
Use all base snapshots in determine_wants()
Closed, Public

Authored by vlorentz on May 13 2022, 3:23 PM.

Details

Summary

Before this commit, determine_wants() used the origin's last snapshot
if any, or the closest parent's snapshot if not.

However, we noticed that many of the repositories that are very slow to
load are forks that were already visited, but whose owners rebased them
on the parent since the last visit, potentially adding many commits to
the origin.

With this commit, determine_wants() considers all base snapshots (the
origin's and its parents'), so we do not needlessly fetch these new
commits when we already loaded them from the parent.
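
To illustrate the idea, here is a minimal sketch, not the loader's actual code. It assumes a simplified model in which a snapshot is a plain dict mapping branch names to target object ids; the helper collect_known_targets(), the determine_wants() signature, and the sample ids are all hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Simplified stand-in for a snapshot: branch name -> target object id.
Snapshot = Dict[bytes, bytes]


def collect_known_targets(base_snapshots: Iterable[Snapshot]) -> Set[bytes]:
    """Union the branch targets of every base snapshot (hypothetical helper)."""
    known: Set[bytes] = set()
    for snapshot in base_snapshots:
        known.update(snapshot.values())
    return known


def determine_wants(
    remote_refs: Dict[bytes, bytes], base_snapshots: Iterable[Snapshot]
) -> List[bytes]:
    """Return the remote targets that no base snapshot already points to."""
    known = collect_known_targets(base_snapshots)
    wanted = {target for target in remote_refs.values() if target not in known}
    return sorted(wanted)


# A fork that was visited before, then rebased on its parent's main branch.
fork_snapshot = {b"refs/heads/main": b"a1"}
parent_snapshot = {b"refs/heads/main": b"b2"}
remote_refs = {b"refs/heads/main": b"b2", b"refs/heads/feature": b"c3"}

# Using only the fork's own last snapshot, the parent's head is fetched again.
only_origin = determine_wants(remote_refs, [fork_snapshot])

# Using all base snapshots, only the genuinely new target remains.
all_bases = determine_wants(remote_refs, [fork_snapshot, parent_snapshot])
```

In this toy case, only_origin contains both b"b2" and b"c3", while all_bases contains only b"c3": the parent's head is already known, so the commits leading to it need not be requested.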

Resolves T4219#84994.

Test Plan

The existing tests cover this, because I simply replaced code that
made snapshot selection too specific.

Diff Detail

Repository
rDLDG Git loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D7831 (id=28273)

Rebasing onto 2d4bd789ad...

Current branch diff-target is up to date.
Changes applied before test
commit 9b47b24b98c21018ed433424b88ed4a6f9d84f37
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri May 13 15:12:48 2022 +0200

    Use all base snapshots in determine_wants()
    
    Before this commit, determine_wants() used the origin's last snapshot
    if any, or the closest parent's snapshot if not.
    
    However, we noticed that many repositories that are very slow to load
    are forks that were already visited, but their owner rebased it on the
    parent since the last visit, causing potentially many commits to be
    added to the origin.
    
    This ensures we do not needlessly fetch these new commits when we
    already loaded the parent.

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/211/ for more details.

This revision is now accepted and ready to land. May 13 2022, 3:57 PM