Page MenuHomeSoftware Heritage

Experiment with generation numbers to improve revisions walk performance
Closed, MigratedEdits Locked

Description

In order to reduce the number of commits to walk for answering the question: "Is commit B reachable from A ?",
current git version uses a simple optimization based on the association of generation numbers [1, 2, 3] to commits.
A commit's generation number is its height in the history graph, as measured from the farthest root.

This technique could also be used to increase the performance of some operations requiring to walk a revisions graph
stored in the Software Heritage archive.

That task is here to track the experiments on the subject.

[1] https://www.spinics.net/lists/git/msg161165.html
[2] https://stackoverflow.com/questions/6702821/git-commit-generation-numbers
[3] https://devblogs.microsoft.com/devops/supercharging-the-git-commit-graph-iii-generations/

Event Timeline

anlambert triaged this task as Normal priority.Apr 2 2019, 11:50 AM
anlambert created this task.

@vlorentz mentioned this idea in the context of T3655 (git loader global deduplication).

His idea is that we retrieve (what we think are) the objects that are pointed at by refs in the origin, from the archive; if these revisions have a generation number, then we know that their full history is available in SWH, and we can use them as "bases" for an incremental git packfile.