Quantify and monitor the lag in real time, especially for major platforms
  • improve the current Grafana dashboard
  • how to quantify the lag? which metrics to use?
    • already available:
      • number of origins never visited
      • number of origins with known changes
      • sum of the two previous numbers: number of origins in the loading queue
    • simple to add to the way we currently generate the metrics (swh.scheduler.update_metrics)
      • earliest origin that we know of and have not loaded yet, as a coarse-grained lag estimator: `min(first_seen) where last_visit is null`
      • last listing date (are the listers working properly?) `max(last_seen)`
    • could be added in the scheduler journal client
      • histogram of first-listing-to-first-archival duration measurements: how much time did origins spend in the archival queue?
    • needs changes to the swh.scheduler model, and an analysis of whether the forges provide the information at all
      • histogram of creation-to-first-archival duration measurements: how old (date of creation in the forge) are the origins we're archiving now?
        • needs a new field "creation time" (!= first listing time) in the lister table
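
A minimal sketch of the two "simple to add" estimators and of a duration histogram, to make the proposal concrete. It assumes a simplified, hypothetical `origins` table in an in-memory SQLite database; the table name, the `lag_metrics` and `duration_histogram` helpers, and the bucket bounds are illustrative and are not the actual swh.scheduler schema or API.

```python
import sqlite3
from bisect import bisect_left
from datetime import datetime, timedelta, timezone

# Hypothetical, simplified table: the real swh.scheduler schema differs.
# Timestamps are stored as ISO-8601 strings in UTC, so that the lexical
# min()/max() computed by SQLite matches chronological order.
SCHEMA = """
CREATE TABLE origins (
    url        TEXT PRIMARY KEY,
    first_seen TEXT NOT NULL,  -- first time a lister reported the origin
    last_seen  TEXT NOT NULL,  -- most recent time a lister reported it
    last_visit TEXT            -- NULL until the origin has been loaded
);
"""


def lag_metrics(conn, now):
    """Return both coarse estimators as timedeltas (None if no data)."""
    # Earliest origin we know of and have not loaded yet.
    (oldest,) = conn.execute(
        "SELECT min(first_seen) FROM origins WHERE last_visit IS NULL"
    ).fetchone()
    # Last listing date: a stale value hints that listers are not running.
    (latest,) = conn.execute("SELECT max(last_seen) FROM origins").fetchone()
    return {
        "oldest_unvisited_lag": (
            now - datetime.fromisoformat(oldest) if oldest else None
        ),
        "time_since_last_listing": (
            now - datetime.fromisoformat(latest) if latest else None
        ),
    }


def duration_histogram(durations, upper_bounds):
    """Count durations per bucket (non-cumulative).

    Bucket i holds durations <= upper_bounds[i] and above the previous
    bound; one extra final bucket catches everything above the last bound.
    upper_bounds must be sorted in increasing order.
    """
    counts = [0] * (len(upper_bounds) + 1)
    for duration in durations:
        counts[bisect_left(upper_bounds, duration)] += 1
    return counts
```

In this sketch, the durations passed to `duration_histogram` would be the first-listing-to-first-archival (or creation-to-first-archival) intervals, and the returned values could then be exported as gauges and buckets for the Grafana dashboard.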