- improve the current Grafana dashboard
- how to quantify the lag, i.e. which metrics to use?
  - already available:
    - number of origins never visited
    - number of origins with known changes
    - sum of the two previous numbers: number of origins in the loading queue
  - simple to add to the current metrics generation (`swh.scheduler.update_metrics`):
    - earliest origin that we know of and have not loaded, a coarse-grained lag estimator: `min(first_seen) where last_visit is null`
    - last listing date (are the listers working properly?): `max(last_seen)`
  - could be added in the scheduler journal client:
    - histogram of first-listing-to-first-archival durations: how much time did origins spend in the archival queue?
      - needs adaptations in the swh.scheduler model, and an analysis of whether the forges provide the information at all
    - histogram of creation-to-first-archival durations: how old (date of creation in the forge) are the origins we are archiving now?
      - needs a new "creation time" field (distinct from the first listing time) in the lister table
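The point-in-time gauges listed above (queue sizes, coarse lag, last listing date) could be computed along these lines. This is a minimal sketch over an in-memory list, assuming a hypothetical `ListedOrigin` record and a hypothetical `lag_gauges` helper; neither is the actual swh.scheduler model nor the `swh.scheduler.update_metrics` code. It also assumes that an origin "with known changes" is one whose forge-reported `last_update` is later than our `last_visit`. In the scheduler itself these would be SQL aggregates such as `min(first_seen) where last_visit is null`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass
class ListedOrigin:
    """Hypothetical, simplified listed-origin record (not the swh.scheduler model)."""

    url: str
    first_seen: datetime  # when a lister first reported the origin
    last_seen: datetime  # when a lister last reported the origin
    last_update: Optional[datetime]  # last upstream change reported by the forge
    last_visit: Optional[datetime]  # last archival visit; None if never visited


def lag_gauges(origins: Iterable[ListedOrigin], now: datetime) -> dict:
    origins = list(origins)
    never_visited = [o for o in origins if o.last_visit is None]
    # Assumption: "known changes" = forge-reported update newer than our last visit.
    known_changes = [
        o
        for o in origins
        if o.last_visit is not None
        and o.last_update is not None
        and o.last_update > o.last_visit
    ]
    # Coarse lag: min(first_seen) where last_visit is null, expressed as an age.
    oldest_unloaded = min((o.first_seen for o in never_visited), default=None)
    # Last listing date: max(last_seen); a stale value hints at a broken lister.
    last_listing = max((o.last_seen for o in origins), default=None)
    return {
        "never_visited": len(never_visited),
        "known_changes": len(known_changes),
        "loading_queue": len(never_visited) + len(known_changes),
        "coarse_lag": now - oldest_unloaded if oldest_unloaded is not None else timedelta(0),
        "last_listing": last_listing,
    }
```

The coarse lag is only an upper-bound style signal: it follows the single oldest unloaded origin, so one stuck origin can dominate it, which is why the histograms are listed as finer-grained follow-ups.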
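Both duration histograms could share one bucketing routine. The sketch below takes pairs of (start, first archival) timestamps, where start is `first_seen` for the first-listing-to-first-archival histogram, or the proposed forge creation time for the creation-to-first-archival one. The function name `duration_histogram` and the bucket bounds (1 hour up to 1 year) are illustrative assumptions. It emits cumulative buckets in the style of Prometheus histograms, where each bucket counts the observations less than or equal to its upper bound.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Sequence, Tuple

# Illustrative bucket upper bounds, in seconds: 1 hour, 1 day, 1 week, 30 days, 365 days.
DEFAULT_BOUNDS = tuple(
    int(d.total_seconds())
    for d in (
        timedelta(hours=1),
        timedelta(days=1),
        timedelta(weeks=1),
        timedelta(days=30),
        timedelta(days=365),
    )
)


def duration_histogram(
    pairs: Iterable[Tuple[datetime, Optional[datetime]]],
    bounds: Sequence[int] = DEFAULT_BOUNDS,
) -> List[Tuple[str, int]]:
    """Cumulative buckets of (first_archival - start) durations, in seconds.

    Origins not archived yet (first_archival is None) are skipped.
    """
    counts = [0] * (len(bounds) + 1)  # extra last slot: the "+Inf" overflow bucket
    for start, first_archival in pairs:
        if first_archival is None:
            continue
        seconds = (first_archival - start).total_seconds()
        # bisect_left finds the first bound >= seconds, so a duration equal
        # to a bound is counted in that bound's bucket ("le" semantics).
        counts[bisect_left(bounds, seconds)] += 1

    # Turn per-bucket counts into cumulative counts, one per upper bound.
    labels = [str(b) for b in bounds] + ["+Inf"]
    result, running = [], 0
    for label, count in zip(labels, counts):
        running += count
        result.append((label, running))
    return result
```

Skipping origins that are not archived yet is deliberate: their final duration is still unknown, and the waiting backlog is already covered by the queue-size and `min(first_seen)` gauges.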
Description
Status | Assigned | Task
---|---|---
Migrated | gitlab-migration | T4080 Minimize archival lag w.r.t. upstream code hosting platforms
Migrated | gitlab-migration | T4130 Quantify and monitor in real-time the lag, especially for major platforms