Currently, the storage interface provides the following methods for retrieving origin visits:
- origin_visit_get(self, origin: str, page_token: Optional[str] = None, order: ListOrder = ListOrder.ASC, limit: int = 10) -> PagedResult[OriginVisit]: returns all visits of an origin in a paginated way
- origin_visit_status_get(self, origin: str, visit: int, page_token: Optional[str] = None, order: ListOrder = ListOrder.ASC, limit: int = 10) -> PagedResult[OriginVisitStatus]: returns all statuses of a given visit in a paginated way
- origin_visit_status_get_latest(self, origin_url: str, visit: int, allowed_statuses: Optional[List[str]] = None, require_snapshot: bool = False) -> Optional[OriginVisitStatus]: returns the latest status of a given visit
To list all visits of an origin together with their latest statuses (for instance in the visits view of the webapp), the following process must be applied:
- Get all visits by calling origin_visit_get
- For each visit, get its latest status by calling origin_visit_status_get_latest
The second step is a real bottleneck when an origin has many visits: it sends one query to the storage per visit, so the number of queries grows linearly with the number of visits.
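The two-step process above can be sketched as follows, to make the query count concrete. This is only an illustration: FakeStorage, the simplified dataclasses, the integer dates and the page token format are hypothetical stand-ins, not the real storage implementation, and only the method names come from the interface listed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OriginVisit:
    origin: str
    visit: int


@dataclass
class OriginVisitStatus:
    origin: str
    visit: int
    date: int  # simplified: an integer timestamp instead of a datetime
    status: str


@dataclass
class PagedResult:
    results: list
    next_page_token: Optional[str] = None


class FakeStorage:
    """Hypothetical in-memory stand-in that counts the queries it receives."""

    def __init__(self, statuses: List[OriginVisitStatus]):
        self.statuses = statuses
        self.queries = 0

    def origin_visit_get(self, origin, page_token=None, limit=10):
        self.queries += 1
        # Assumption: the page token is the last visit id already returned.
        after = int(page_token) if page_token else 0
        visits = sorted(
            {s.visit for s in self.statuses if s.origin == origin and s.visit > after}
        )
        page = visits[:limit]
        token = str(page[-1]) if len(visits) > limit else None
        return PagedResult([OriginVisit(origin, v) for v in page], token)

    def origin_visit_status_get_latest(self, origin_url, visit):
        self.queries += 1
        candidates = [
            s for s in self.statuses if s.origin == origin_url and s.visit == visit
        ]
        return max(candidates, key=lambda s: s.date, default=None)


def list_visits_with_latest_status(storage, origin):
    """Current approach: one query per page of visits, plus one query per visit."""
    rows = []
    page_token = None
    while True:
        page = storage.origin_visit_get(origin, page_token=page_token, limit=10)
        for visit in page.results:
            latest = storage.origin_visit_status_get_latest(origin, visit.visit)
            rows.append((visit, latest))
        page_token = page.next_page_token
        if page_token is None:
            break
    return rows


ORIGIN = "https://example.org/repo"  # hypothetical origin URL
storage = FakeStorage(
    [
        OriginVisitStatus(ORIGIN, visit, date, status)
        for visit in range(1, 26)
        for date, status in ((visit * 10, "created"), (visit * 10 + 1, "full"))
    ]
)
rows = list_visits_with_latest_status(storage, ORIGIN)
print(len(rows), storage.queries)  # prints "25 28": 3 page queries + 25 status queries
```

With 25 visits and pages of 10, the sketch issues 3 queries for the visits and 25 more for their latest statuses.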
We should add a new method to the storage interface to efficiently retrieve the latest statuses of origin visits.
The signature could be the following:
origin_visit_status_latest_get_all(self, origin: str, page_token: Optional[str] = None, order: ListOrder = ListOrder.ASC, limit: int = 10) -> PagedResult[OriginVisitStatus]
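As a rough illustration of the intended semantics, here is a minimal in-memory sketch of what such a method could return. It is not a proposed implementation: the dataclasses are simplified stand-ins, the function takes a plain list of statuses instead of self, and using the last returned visit id as the page token is an assumption. A real backend would presumably do the per-visit selection in a single database query.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class ListOrder(Enum):
    ASC = "asc"
    DESC = "desc"


@dataclass
class OriginVisitStatus:
    origin: str
    visit: int
    date: int  # simplified: an integer timestamp instead of a datetime
    status: str


@dataclass
class PagedResult:
    results: list
    next_page_token: Optional[str] = None


def origin_visit_status_latest_get_all(
    statuses: List[OriginVisitStatus],  # stand-in for the storage backend (self)
    origin: str,
    page_token: Optional[str] = None,
    order: ListOrder = ListOrder.ASC,
    limit: int = 10,
) -> PagedResult:
    # Keep only the most recent status of each visit of the origin.
    latest: Dict[int, OriginVisitStatus] = {}
    for s in statuses:
        if s.origin != origin:
            continue
        current = latest.get(s.visit)
        if current is None or s.date > current.date:
            latest[s.visit] = s

    # Paginate on visit ids; the token is the last visit id already returned
    # (an assumption made for this sketch).
    reverse = order == ListOrder.DESC
    visit_ids = sorted(latest, reverse=reverse)
    if page_token is not None:
        last = int(page_token)
        visit_ids = [v for v in visit_ids if (v < last if reverse else v > last)]
    page = visit_ids[:limit]
    next_token = str(page[-1]) if len(visit_ids) > limit else None
    return PagedResult([latest[v] for v in page], next_token)


ORIGIN = "https://example.org/repo"  # hypothetical origin URL
all_statuses = [
    OriginVisitStatus(ORIGIN, visit, date, status)
    for visit in range(1, 26)
    for date, status in ((visit * 10, "created"), (visit * 10 + 1, "full"))
]

# Listing every visit with its latest status now takes one call per page.
calls = 0
collected = []
token = None
while True:
    page = origin_visit_status_latest_get_all(all_statuses, ORIGIN, page_token=token)
    calls += 1
    collected.extend(page.results)
    token = page.next_page_token
    if token is None:
        break
```

In this sketch, listing 25 visits with the default limit of 10 takes 3 calls, one per page, regardless of how many statuses each visit has.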