
Use "fork" relationships to speed up initial load of large repositories
Open, Low, Public

Description

(I'm writing this task just so that I don't forget the idea, but I don't expect it to be actionable in the short term)

To work incrementally, VCS loaders fetch the last snapshot of the origin, which gives them a set of "heads". They pass these heads to the origin, so the origin can detect which revisions it doesn't need to send.
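As a rough illustration of why known heads save work, here is a toy model of the negotiation. It is not the loader's actual code: the graph layout and the names `ancestors` and `revisions_to_send` are hypothetical, and a real server does this over its own object store.

```python
from typing import Dict, List, Set

# Toy commit graph: revision id -> list of parent revision ids.
# Every revision is assumed to appear as a key (hypothetical data model).
Graph = Dict[str, List[str]]


def ancestors(graph: Graph, heads: Set[str]) -> Set[str]:
    """Return the given heads plus everything reachable through parent links."""
    seen: Set[str] = set()
    # Heads the server has never heard of are simply ignored.
    stack = [head for head in heads if head in graph]
    while stack:
        rev = stack.pop()
        if rev in seen:
            continue
        seen.add(rev)
        stack.extend(graph[rev])
    return seen


def revisions_to_send(graph: Graph, wanted: Set[str], known: Set[str]) -> Set[str]:
    """Revisions the server must send: wanted history minus what the client has."""
    return ancestors(graph, wanted) - ancestors(graph, known)


if __name__ == "__main__":
    history = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
    # With a previous snapshot pointing at "b", only the newer revisions are sent.
    print(sorted(revisions_to_send(history, {"d"}, {"b"})))
    # With no snapshot at all, the whole history is sent.
    print(sorted(revisions_to_send(history, {"d"}, set())))
```

With an empty set of known heads, the result is the full history, which is the situation described for first-time loads.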

Unfortunately, when someone forks a large repository (such as https://github.com/chromium/chromium) and we see the fork for the first time, we don't have that snapshot, so the server has to send all revisions, and we then discard almost all of them because they are already in the archive.

However, if we could detect that a new repository is a fork (from extrinsic metadata, from heuristics based on repository names, ...), we could fetch the snapshot of the original repository and use it as the base to load the fork incrementally.
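A minimal sketch of what this could look like, under loud assumptions: the archive is modelled as a plain dict from origin URL to a set of head ids, the extrinsic metadata is a dict with a hypothetical `parent_url` key, and the name heuristic is simply "same host, same repository name". None of these names come from the actual loader code.

```python
from typing import Dict, Optional, Set
from urllib.parse import urlparse


def repo_name(url: str) -> str:
    """Last path component of an origin URL, e.g. 'chromium'."""
    return urlparse(url).path.rstrip("/").split("/")[-1]


def guess_parent_origin(
    origin_url: str, metadata: Dict[str, str], known_origins: Set[str]
) -> Optional[str]:
    """Guess which already-archived origin this one was forked from."""
    # 1. Extrinsic metadata ("parent_url" is a hypothetical key).
    parent = metadata.get("parent_url")
    if parent in known_origins:
        return parent
    # 2. Name heuristic: same host and same repository name.
    target = urlparse(origin_url)
    for candidate in sorted(known_origins):
        if candidate == origin_url:
            continue
        same_host = urlparse(candidate).netloc == target.netloc
        if same_host and repo_name(candidate) == repo_name(origin_url):
            return candidate
    return None


def base_heads(
    origin_url: str, metadata: Dict[str, str], snapshots: Dict[str, Set[str]]
) -> Set[str]:
    """Heads to offer the server: own snapshot, else the guessed parent's."""
    if origin_url in snapshots:
        return snapshots[origin_url]
    parent = guess_parent_origin(origin_url, metadata, set(snapshots))
    return snapshots[parent] if parent is not None else set()
```

For example, with `snapshots = {"https://github.com/chromium/chromium": {"h1", "h2"}}`, calling `base_heads("https://github.com/alice/chromium", {}, snapshots)` returns the parent's heads through the name heuristic. A wrong guess should presumably cost only efficiency rather than correctness, as long as the server ignores heads it does not know about.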

Related Objects

Status      Assigned    Task
Resolved    ardumont
Open        vlorentz
Open        None
Resolved    olasd
Resolved    None
Resolved    ardumont
Resolved    vlorentz
Resolved    vlorentz
Resolved    vlorentz
Resolved    vlorentz
Resolved    vlorentz
Resolved    vlorentz
Resolved    vlorentz
Resolved    vlorentz
Open        None
Resolved    vlorentz
Resolved    vlorentz
Resolved    vlorentz
Resolved    ardumont
Resolved    ardumont
Resolved    ardumont
Resolved    ardumont
Resolved    ardumont
Resolved    ardumont
Resolved    ardumont

Event Timeline

vlorentz triaged this task as Normal priority. Apr 19 2021, 1:49 PM
vlorentz lowered the priority of this task from Normal to Low.
vlorentz created this task.