Software Heritage

migrate_extrinsic_metadata: guess PyPI origins.
Closed · Public

Authored by vlorentz on Sep 16 2020, 10:44 AM.

Details

Summary

This works by guessing the package name from the original_artifact data,
then building an origin that would match the package name, and then
checking whether the revision can be reached from that origin.

Depends on D3958 and D3927.
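As a rough illustration of the approach described in the summary, here is a minimal self-contained sketch. It does not use the swh.storage API. Instead, the archive is modelled as plain dicts: the hypothetical `origin_heads` maps an origin URL to its snapshot branch targets, and `parents` maps a revision id to its parent ids. Guessing the name from an sdist filename with a regex is an assumption, since the actual patch may read other original_artifact fields. The `https://pypi.org/project/<name>/` origin form is also assumed.

```python
import re
from collections import deque
from typing import Dict, Iterable, List, Optional

# Assumed sdist naming "<name>-<version>.<ext>", where the version starts with a digit.
_SDIST_RE = re.compile(r"^(?P<name>.+?)-\d[^-]*\.(?:tar\.gz|tar\.bz2|tgz|zip)$")


def guess_pypi_package_name(filename: str) -> Optional[str]:
    """Guess a PyPI project name from an sdist filename (hypothetical helper)."""
    match = _SDIST_RE.match(filename)
    return match.group("name") if match else None


def revision_reachable(
    target: str, heads: Iterable[str], parents: Dict[str, List[str]]
) -> bool:
    """Breadth-first walk from the branch heads through revision parents."""
    queue = deque(heads)
    seen = set()
    while queue:
        rev = queue.popleft()
        if rev == target:
            return True
        if rev in seen:
            continue
        seen.add(rev)
        queue.extend(parents.get(rev, []))
    return False


def guess_pypi_origin(
    filename: str,
    revision_id: str,
    origin_heads: Dict[str, List[str]],
    parents: Dict[str, List[str]],
) -> Optional[str]:
    """Return the guessed origin URL if the revision is reachable from it, else None."""
    name = guess_pypi_package_name(filename)
    if name is None:
        return None
    origin = f"https://pypi.org/project/{name}/"  # assumed origin URL format
    heads = origin_heads.get(origin)
    if heads is None:  # no such origin in this toy archive
        return None
    return origin if revision_reachable(revision_id, heads, parents) else None
```

The reachability check is what makes the guess safe. A name that happens to map to an existing origin is only accepted if the revision actually belongs to that origin's history.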

Diff Detail

Repository: rDSTO Storage manager
Lint: Automatic diff as part of commit; lint not applicable.
Unit: Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D3959 (id=13938)

Could not rebase; Attempt merge onto 3b781a8a52...

Merge made by the 'recursive' strategy.
 swh/storage/migrate_extrinsic_metadata.py          | 121 +++++--
 .../migrate_extrinsic_metadata/test_debian.py      | 301 ++++++++++++++++-
 .../tests/migrate_extrinsic_metadata/test_pypi.py  | 364 ++++++++++++++-------
 3 files changed, 638 insertions(+), 148 deletions(-)
Changes applied before test
commit c7e05eb03e659680042dd86b9b80f6d41367e041
Merge: 3b781a8a a69bb3b7
Author: Jenkins user <jenkins@localhost>
Date:   Wed Sep 16 08:48:25 2020 +0000

    Merge branch 'diff-target' into HEAD

commit a69bb3b76a6ef3225d7b74cc6d1c23112b9fee70
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 16 10:42:19 2020 +0200

    migrate_extrinsic_metadata: guess PyPI origins.
    
    This works by guessing the package name from the original_artifact data,
    then building an origin that would match the package name, then filtering
    checking if the revision can be reached from it.

commit 8e8c7ee79a832e59a20ff894299b58024addd967
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 16 09:50:08 2020 +0200

    migrate_extrinsic_metadata.test_pypi: use the in-memory storage instead of mocks
    
    in a future commit, migrating pypi revisions will become more interactive with
    the storage, so it's easier to have a real one instead of a mock.

commit f6943400ff48ba840ab604747252ad4197fab5d9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Sep 14 10:51:48 2020 +0200

    migrate_extrinsic_metadata.test_debian: use the in-memory storage instead of mocks
    
    in tests that need to read in the storage.
    
    Using mocks just makes it more complicated, and we decided not to do that
    a while ago.

commit 7a0467972fcd0c05ff16d806b18261aba8624288
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Sep 11 14:16:21 2020 +0200

    migrate_extrinsic_metadata: fix crash on dangling branch.

commit 7969d368966c43ebfd51b2901c827217b0712dd5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Sep 11 13:59:55 2020 +0200

    migrate_extrinsic_metadata: fix crash when a Debian revision is missing.
    
    https://forge.softwareheritage.org/T997

commit 265fc387f7b3d5f1a55d136b74fa2ee9b9f11f58
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 10 14:23:12 2020 +0200

    migrate_extrinsic_metadata: guess Debian origins.
    
    This works by guessing the package name from the original_artifact data,
    then building origins that would match the package name, then filtering
    out origins by checking if the revision can be reached from them.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/934/ for more details.
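The Debian commit in the log above describes a variant of the same idea. Several candidate origins are built, and only those from which the revision is reachable are kept. Two other commits fix crashes on dangling branches and on missing revisions.

Below is a minimal sketch of that filtering, again over a toy dict-based archive rather than swh.storage. Several parts are assumptions made for illustration:
- the origin URL templates;
- the helper names;
- taking the source package name from a `<name>_<version>.dsc` filename.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional

# Hypothetical candidate origin URL templates for a Debian source package.
DEBIAN_ORIGIN_TEMPLATES = [
    "deb://Debian/packages/{name}",
    "deb://Debian-Security/packages/{name}",
    "http://snapshot.debian.org/package/{name}/",
]


def guess_debian_package_name(dsc_filename: str) -> Optional[str]:
    """Source package name from '<name>_<version>.dsc' (assumed input field)."""
    if not dsc_filename.endswith(".dsc") or "_" not in dsc_filename:
        return None
    return dsc_filename.split("_", 1)[0]


def reachable(
    target: str, heads: Iterable[Optional[str]], parents: Dict[str, List[str]]
) -> bool:
    # A None head models a dangling snapshot branch: skip it instead of crashing.
    queue = deque(head for head in heads if head is not None)
    seen = set()
    while queue:
        rev = queue.popleft()
        if rev == target:
            return True
        if rev in seen:
            continue
        seen.add(rev)
        # A revision missing from `parents` is absent from the toy archive: dead end.
        queue.extend(parents.get(rev, []))
    return False


def guess_debian_origins(
    dsc_filename: str,
    revision_id: str,
    origin_heads: Dict[str, List[Optional[str]]],
    parents: Dict[str, List[str]],
) -> List[str]:
    """Keep only the candidate origins from which the revision is reachable."""
    name = guess_debian_package_name(dsc_filename)
    if name is None:
        return []
    candidates = [t.format(name=name) for t in DEBIAN_ORIGIN_TEMPLATES]
    return [
        origin
        for origin in candidates
        if origin in origin_heads
        and reachable(revision_id, origin_heads[origin], parents)
    ]
```

Unlike the single PyPI guess, this returns a list. Zero, one, or several candidate origins may survive the reachability filter.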

ardumont added a subscriber: ardumont.

lgtm

Inline comments on swh/storage/tests/migrate_extrinsic_metadata/test_pypi.py:
Line 384: "very*"
Line 385: "or has"

This revision is now accepted and ready to land. (Sep 16 2020, 4:48 PM)
This revision was landed with ongoing or failed builds. (Sep 16 2020, 5:02 PM)
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D3959 (id=13974)

Rebasing onto f008a597fd...

First, rewinding head to replay your work on top of it...
Fast-forwarded diff-target to base-revision-944-D3959.
Changes applied before test

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/944/ for more details.