
Compute and show ETA for vault tasks
Closed, Migrated

Description

When the requested objects are present in swh-graph, we should be able to approximate the total runtime of vault tasks, as I expect it to be linear in the number of objects of each type.

How to do it:

  1. use data in swh-scheduler's database to get the run time of cooking each root object
  2. use swh-graph to compute the number of objects of each type (cnt + dir + rev should be enough) reachable from that root object
  3. run a linear regression to obtain a model of the runtime as a function of the number of objects of each type
  4. store that model somewhere (vault backend? swh-web?)
  5. every time we get a cooking request, query counts in swh-graph (like in step 2) and use the model to estimate the run time
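
To make steps 3 and 5 concrete, here is a minimal sketch of fitting and using such a model. It is only an illustration: the function names, the sample counts and the runtimes are all made up, and it does not query swh-scheduler or swh-graph. It fits `runtime ~ a*cnt + b*dir + c*rev + d` by ordinary least squares using only the standard library (in practice something like `numpy.linalg.lstsq` would probably be used instead).

```python
# Hypothetical sketch: all names and numbers are illustrative, not real swh data.

def solve(a, b):
    """Solve the square system a.x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(samples):
    """Least-squares fit via the normal equations.

    samples: list of ((n_cnt, n_dir, n_rev), runtime_seconds).
    Returns [per_cnt, per_dir, per_rev, intercept].
    """
    rows = [list(counts) + [1.0] for counts, _ in samples]  # 1.0 = intercept term
    y = [runtime for _, runtime in samples]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def estimate(model, counts):
    """Predicted runtime in seconds for the given (n_cnt, n_dir, n_rev)."""
    return sum(w * n for w, n in zip(model, counts)) + model[-1]


# Made-up training data standing in for steps 1 and 2:
# (counts reachable from a root object, observed cooking time in seconds).
SAMPLES = [
    ((1000, 200, 50), 8.5),
    ((5000, 800, 300), 22.0),
    ((20000, 5000, 1000), 80.0),
    ((3000, 1500, 900), 27.5),
    ((12000, 1000, 4000), 74.0),
    ((800, 100, 2000), 27.1),
]

model = fit(SAMPLES)                               # step 3
prediction = estimate(model, (10000, 2000, 500))   # step 5, counts from swh-graph
```

The model is just four numbers, so storing it (step 4) could be as simple as a configuration value or a single database row, refreshed whenever the regression is re-run.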

As a first approximation, we could skip steps 3 and 4. This might be good enough, as the git-bare cooker fetches objects somewhat homogeneously (i.e. a batch of revs, then a batch of dirs, then a batch of contents, then revs again, ...).
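
One possible reading of this shortcut, sketched below under that assumption: without a fitted model, the ETA is extrapolated from the progress so far, treating all object types as equally costly. The function name and its parameters are hypothetical; `total` would come from the swh-graph counts of step 2, and `done` from the cooker's own progress.

```python
# Hypothetical sketch: assumes the cooker can report how many objects it has
# processed so far, and that objects are processed at a roughly constant rate.

def eta_seconds(elapsed, done, total):
    """Estimated remaining time in seconds, or None if no progress was made yet.

    elapsed: seconds since the cooking task started
    done:    number of objects (cnt + dir + rev) processed so far
    total:   number of objects reachable from the root, as counted by swh-graph
    """
    if done <= 0:
        return None  # nothing to extrapolate from
    remaining = max(total - done, 0)  # the graph may lag behind the archive
    return elapsed * remaining / done
```

For example, `eta_seconds(60.0, 250, 1000)` extrapolates from 250 objects in 60 seconds to 180 more seconds for the remaining 750.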

This would be a great UX improvement, as some gitfast/git-bare tasks can be really long.

Event Timeline

vlorentz renamed this task from Add ETA for to vault tasks to Compute and show ETA for to vault tasks. Sep 3 2021, 3:32 PM
vlorentz renamed this task from Compute and show ETA for to vault tasks to Compute and show ETA for vault tasks.
vlorentz triaged this task as Normal priority.
vlorentz created this task.
vlorentz updated the task description.

(marking T2220 as dependency, because we need an up-to-date graph to show an ETA on objects loaded in the last ~year)