Page MenuHomeSoftware Heritage

package.loader: Lookup packages from the ExtID storage
ClosedPublic

Authored by vlorentz on Mar 25 2021, 7:22 PM.

Details

Summary

To check which packages are already downloaded.

For now, this lookup is done in addition to checking the artifacts
from the last snapshot's revisions' metadata, because we did not start
writing ExtIDs yet.
But the ExtID lookup will eventually replace the artifact-based lookup.

This will finally allow us to drop the 'metadata' field of Revision
objects.

Depends on D5348

Resolves T3140

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5349 (id=19161)

Could not rebase; Attempt merge onto 6d3545e47b...

Updating 6d3545e..effa790
Fast-forward
 .pre-commit-config.yaml                          |   1 +
 swh/loader/package/debian/loader.py              |   4 +-
 swh/loader/package/debian/tests/test_debian.py   |  36 ++--
 swh/loader/package/loader.py                     | 184 +++++++++++++----
 swh/loader/package/nixguix/tests/test_nixguix.py |   6 +-
 swh/loader/package/tests/test_loader.py          | 247 +++++++++++++++++++++--
 6 files changed, 409 insertions(+), 69 deletions(-)
Changes applied before test
commit effa79036ef92ebba38fd6907292f79085701a2f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Mar 25 19:17:17 2021 +0100

    package.loader: Lookup packages from the ExtID storage
    
    To check which packages are already downloaded.
    
    For now, this lookup is done in addition to checking the artifacts
    from the last snapshot's revisions' metadata, because we did not start
    writing ExtIDs yet.
    But the ExtID lookup will eventually replace the artifact-based lookup.
    
    This will finally allow us to drop the 'metadata' field of Revision
    objects.

commit 9ad94ddc3d99f9d702ae6411e0723c0080548c82
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Mar 25 16:19:45 2021 +0100

    test_resolve_revision_from_artifacts: Use the right type for PartialExtID.
    
    We used a string instead of a tuple. It doesn't matter much because they
    are only compared with each other, but let's not intentionally use
    the wrong types when we don't need to.

commit 6e04a21df73e3d208b50159e78b52b1d6443ea83
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Mar 25 14:52:37 2021 +0100

    test_loader: Use a spec'ed mock, to follow the new style guide

commit 2f9b8a8b1eecb578cdcbe9046262b6087a6c15de
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Mar 25 10:47:03 2021 +0100

    Rename resolve_revision_from with _artifacts suffix, and fix its documentation
    
    A future commit will introduce resolve_revision_from_extid, so this commit
    preemptively renames it to avoid any confusion.

commit 78430078cd2647e09cc9a0c09a9e6179d4a0d8b1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Mar 23 16:14:42 2021 +0100

    package.loader: Unnest loops in PackageLoader.load()
    
    In a future commit, we will need to go through all the PackageInfo
    objects before running the loop, so we can get their ExtID and
    fetch them from the storage.
    
    So, we need to fetch them all before running the load loop,
    using this listcomp.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/443/ for more details.

Resolves T3140

Note that it's only true for the package loaders...
We also need to do it for other (d)vcs loaders (mercurial, svn, git)
prior to resolving the task.

Did not read entirely the tests yet... some remarks inline.

swh/loader/package/loader.py
293–296
356
368

Log this one as warning (or whatever is the right logging level).

Build is green

Patch application report for D5349 (id=19168)

Could not rebase; Attempt merge onto 2f9b8a8b1e...

Updating 2f9b8a8..2ab982d
Fast-forward
 .pre-commit-config.yaml                 |   1 +
 swh/loader/package/loader.py            | 118 +++++++++++++++-
 swh/loader/package/tests/test_loader.py | 232 ++++++++++++++++++++++++++++++--
 3 files changed, 338 insertions(+), 13 deletions(-)
Changes applied before test
commit 2ab982dbbc669eee8a4969e53fe80d4fe4678ce6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Mar 25 19:17:17 2021 +0100

    package.loader: Lookup packages from the ExtID storage
    
    To check which packages are already downloaded.
    
    For now, this lookup is done in addition to checking the artifacts
    from the last snapshot's revisions' metadata, because we did not start
    writing ExtIDs yet.
    But the ExtID lookup will eventually replace the artifact-based lookup.
    
    This will finally allow us to drop the 'metadata' field of Revision
    objects.

commit e590b4258262e78449d84724e25824af07b96c47
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Mar 25 16:19:45 2021 +0100

    test_resolve_revision_from_artifacts: Use the right type for PartialExtID.
    
    We used a string instead of a tuple. It doesn't matter much because they
    are only compared with each other, but let's not intentionally use
    the wrong types when we don't need to.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/445/ for more details.

Build is green

Patch application report for D5349 (id=19172)

Rebasing onto e590b42582...

Current branch diff-target is up to date.
Changes applied before test
commit 0898142247e3ab5ef2d1a85994f96bc11c05b28e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Mar 25 19:17:17 2021 +0100

    package.loader: Lookup packages from the ExtID storage
    
    To check which packages are already downloaded.
    
    For now, this lookup is done in addition to checking the artifacts
    from the last snapshot's revisions' metadata, because we did not start
    writing ExtIDs yet.
    But the ExtID lookup will eventually replace the artifact-based lookup.
    
    This will finally allow us to drop the 'metadata' field of Revision
    objects.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/446/ for more details.

Build is green

Patch application report for D5349 (id=19173)

Rebasing onto e590b42582...

Current branch diff-target is up to date.
Changes applied before test
commit 0ca0b0df46cc7c3de29c8563882d52cdef88edf7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Mar 25 19:17:17 2021 +0100

    package.loader: Lookup packages from the ExtID storage
    
    To check which packages are already downloaded.
    
    For now, this lookup is done in addition to checking the artifacts
    from the last snapshot's revisions' metadata, because we did not start
    writing ExtIDs yet.
    But the ExtID lookup will eventually replace the artifact-based lookup.
    
    This will finally allow us to drop the 'metadata' field of Revision
    objects.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/447/ for more details.

swh/loader/package/tests/test_loader.py
140

you kept that pattern here :)

160

ok, re-read the code indeed, you whitelist only the revisions from the previous snapshot.

Is that only revision that are targetted by snapshot though?

nvm, you said so earlier "for now, yes".

Resolves T3140

Note that it's only true for the package loaders...
We also need to do it for other (d)vcs loaders (mercurial, svn, git)
prior to resolving the task.

Have you noticed my first remark about ^?


Nonetheless, lgtm.

I would not resolve the task just yet though (or rename the task to specifiically address the package loaders).

This revision is now accepted and ready to land.Mar 26 2021, 11:40 AM
vlorentz added inline comments.
swh/loader/package/tests/test_loader.py
160

what?

remove return_extid pattern

swh/loader/package/tests/test_loader.py
160
# We only support revisions for now.
# Note that this case should never be reached unless there is a
# collision between a revision hash and some non-revision object's
# hash, but better safe than sorry.

Have you noticed my first remark about ^?

yes

This revision was landed with ongoing or failed builds.Mar 26 2021, 11:44 AM
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D5349 (id=19174)

Rebasing onto e590b42582...

Current branch diff-target is up to date.
Changes applied before test
commit a32f687151025289568fd0f410cf93df6cb304e0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Mar 25 19:17:17 2021 +0100

    package.loader: Lookup packages from the ExtID storage
    
    To check which packages are already downloaded.
    
    For now, this lookup is done in addition to checking the artifacts
    from the last snapshot's revisions' metadata, because we did not start
    writing ExtIDs yet.
    But the ExtID lookup will eventually replace the artifact-based lookup.
    
    This will finally allow us to drop the 'metadata' field of Revision
    objects.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/448/ for more details.