
loader git: load revisions in topological order
Closed, Migrated

Description

Right now, the git loader loads revision objects in the order in which they arrive in the packfile sent by upstream. If the loader is interrupted, for whatever reason, in the middle of adding revisions, we end up with a set of loaded revisions with no guarantee that their parents have been added as well.

This means that we cannot use a simple global lookup of revision objects to reduce the effort of loading "undeclared" forks: we cannot consider that any revision currently present in the SWH archive is complete.

If the git loader sorted revision objects in topological order (parents first) before adding them, and if we could ensure that the revision objects already in the archive were properly loaded, including their parents, then we could use a global lookup of revision objects to reduce the size of the packfiles received from the server (instead of restricting ourselves to earlier snapshots of the same origin).
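As a rough illustration of the ordering discussed above, here is a minimal sketch using the standard library's `graphlib` (Python 3.9+). The function name and the `parents_by_revision` mapping are hypothetical and are not the loader's actual API; the sketch only assumes that each revision id can be mapped to the ids of its parents.

```python
from graphlib import TopologicalSorter


def sort_revisions_topologically(parents_by_revision):
    """Return revision ids ordered so that every parent precedes its children.

    `parents_by_revision` (hypothetical input) maps a revision id to the ids
    of its parents. Parents that are not keys of the mapping, for instance
    because they are not part of the packfile, are left out of the result.
    """
    sorter = TopologicalSorter()
    for revision, parents in parents_by_revision.items():
        # Only keep edges towards parents that are part of this batch.
        in_batch = [p for p in parents if p in parents_by_revision]
        sorter.add(revision, *in_batch)
    # static_order() yields predecessors (here: parents) before successors.
    return list(sorter.static_order())


# Example: "d" is a merge of "b" and "c", which both descend from "a";
# "z" is a parent that is not part of this batch.
packfile_order = {"d": ["b", "c"], "b": ["a"], "c": ["a"], "a": ["z"]}
ordered = sort_revisions_topologically(packfile_order)
```

With this ordering, any prefix of `ordered` is closed under the parent relation within the batch, so an interrupted load would not leave a revision whose in-batch parents are missing.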

Event Timeline

olasd created this task.
olasd removed a parent task: T3653: Stabilize loader git.

(I've removed T3653 as parent because this is a somewhat longer-term endeavour: not the topological sorting itself, but making sure that (most) existing revisions aren't dangling before we can rely on this topological guarantee.)

effort: medium

  • this ensures that objects loaded in the archive are self-consistent
  • but this increases the processing needed to load git repositories (i.e. loading will be slower)