Page MenuHomeSoftware Heritage

Only use the previous (base) snapshot as reference for incremental loads
ClosedPublic

Authored by olasd on May 27 2020, 3:17 PM.

Details

Summary

The old behavior would find any object with a matching sha1_git from the archive
to use as a reference point in the generated snapshot. While there's no known
sha1_git collisions in the wild, this behavior is inconsistent with other
loaders.

This removes the only reason to keep the object_find_by_sha1_git api in storage.

Test Plan

tox tests unchanged

Diff Detail

Repository
rDLDG Git loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

olasd created this revision.May 27 2020, 3:17 PM

Build is green

Patch application report for D3181 (id=11314)

Rebasing onto b96849a6a2...

Current branch diff-target is up to date.
Changes applied before test
commit 6d0711f562c68fc503cc4d332a8653a67921ddb4
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed May 27 14:58:15 2020 +0200

    Only use the previous (base) snapshot as reference for incremental loads
    
    The old behavior would find any object with a matching sha1_git from the archive
    to use as a reference point in the generated snapshot. While there's no known
    sha1_git collisions in the wild, this behavior is inconsistent with other
    loaders.
    
    This removes the only reason to keep the `object_find_by_sha1_git` api in storage.

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/6/ for more details.

Looks good ;)

swh/loader/git/loader.py
440

curious me, how is this happening?

ardumont accepted this revision.May 27 2020, 3:37 PM

(forgot to select)

This revision is now accepted and ready to land.May 27 2020, 3:37 PM
olasd updated this revision to Diff 11319.May 27 2020, 5:24 PM

Clarify comments in get_snapshot

olasd marked an inline comment as done.May 27 2020, 5:25 PM
olasd added inline comments.
swh/loader/git/loader.py
440

This is really the crux of this function. So I've made the comments (very) verbose, I hope it's clearer now.

Build is green

Patch application report for D3181 (id=11319)

Rebasing onto b96849a6a2...

Current branch diff-target is up to date.
Changes applied before test
commit 6605ee3f047796c4b3f4b418348229fbd6beac9c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed May 27 14:58:15 2020 +0200

    Only use the previous (base) snapshot as reference for incremental loads
    
    The old behavior would find any object with a matching sha1_git from the archive
    to use as a reference point in the generated snapshot. While there's no known
    sha1_git collisions in the wild, this behavior is inconsistent with other
    loaders.
    
    This removes the only reason to keep the `object_find_by_sha1_git` api in storage.

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/11/ for more details.

ardumont added inline comments.May 27 2020, 5:28 PM
swh/loader/git/loader.py
440

yes, thanks.

ardumont added inline comments.May 27 2020, 5:37 PM
swh/loader/git/loader.py
440

thanks a lot even ;)

ardumont accepted this revision.May 27 2020, 5:37 PM
This revision was automatically updated to reflect the committed changes.
olasd marked an inline comment as done.