Software Heritage

loader git: enable global deduplication of head branches before fetching them
Open, Normal, Public

Description

This task tracks the effort to (re-)enable global deduplication of revisions in the git loader, in order to reduce the amount of data downloaded from upstream repositories (and needlessly converted by workers).

  • first, enabling partial global deduplication through extid mappings for snapshot heads (heads for which we know a complete load of the history has already been done): T3635
  • then, assessing whether "just" doing a global lookup for objects of any type is worthwhile: T3656, plus T3654 to avoid creating new "history holes"
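As a rough illustration of the first step, here is a minimal sketch of how a loader could use known heads to shrink a fetch. It assumes that an extid mapping exists only for heads whose full history has already been archived, so such heads can be sent to the remote as "have" lines while only unknown heads are requested as "wants". The `plan_fetch` function, the in-memory `extid_map` dict and the short placeholder commit ids are hypothetical stand-ins, not the actual swh-loader-git or swh-storage API.

```python
from typing import Dict, List, Tuple


def plan_fetch(
    remote_refs: Dict[str, str], extid_map: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Split the heads advertised by a remote into (wants, haves).

    remote_refs: branch name -> git commit id advertised by the remote.
    extid_map: hypothetical mapping git commit id -> archived revision id,
        assumed to contain only heads whose complete history is archived.
    """
    wants = set()
    haves = set()
    for _branch, commit_id in remote_refs.items():
        if commit_id in extid_map:
            # Already fully archived: advertise it so the remote can omit
            # everything reachable from it from the pack file.
            haves.add(commit_id)
        else:
            # Unknown head: its history must actually be downloaded.
            wants.add(commit_id)
    # Several branches may point at the same commit, hence the sets.
    return sorted(wants), sorted(haves)


if __name__ == "__main__":
    refs = {
        "refs/heads/main": "c1",
        "refs/heads/dev": "c2",
        "refs/tags/v1.0": "c1",
    }
    known = {"c1": "rev-of-c1"}  # placeholder values
    print(plan_fetch(refs, known))
```

With the placeholder data above, `c1` ends up in the haves list and only `c2` is requested. When the wants list is empty, every head is already archived and the fetch could in principle be skipped entirely, which is the saving this task is after.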