
migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames.
Closed, Public

Authored by vlorentz on Sep 18 2020, 5:50 PM.

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D3998 (id=14090)

Rebasing onto b0027abc34...

Current branch diff-target is up to date.
Changes applied before test
commit 801301a037d8befeb1a82e4695cd3e0149f6becd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Sep 18 17:50:26 2020 +0200

    migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/952/ for more details.

I'm not sure I understand what's at stake here, but it really looks odd.

@douardda Older versions of the PyPI loader didn't specify the origin in the extrinsic metadata, so we have to guess the origins in order to set the origin context on their metadata.

Filenames are the best way I found, but they include the human-specified version, which can have any format, so it takes a lot of guesswork to tell the package name (used in the origin computation) and the version (useless) apart.
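To illustrate the kind of guesswork described above, here is a minimal sketch, not the code from this diff. The function name `guess_pypi_origin`, the extension list, the "version starts at the first dash followed by a digit" rule, and the `https://pypi.org/project/<name>/` origin URL shape are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical list of archive extensions; the real script may handle more.
_ARCHIVE_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tgz", ".zip")

# Heuristic (an assumption): the version starts at the first "-" that is
# followed by a digit, optionally prefixed with "v". Everything before it
# is treated as the package name.
_NAME_VERSION_RE = re.compile(r"^(?P<name>.+?)-(?P<version>v?\d.*)$")


def guess_pypi_origin(filename: str) -> Optional[str]:
    """Guess a PyPI origin URL from an archive filename, or return None."""
    for ext in _ARCHIVE_EXTENSIONS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            break
    else:
        # Unknown extension: refuse to guess.
        return None

    match = _NAME_VERSION_RE.match(stem)
    if match is None:
        # No recognizable version suffix: refuse to guess.
        return None

    # Assumed origin URL shape for PyPI projects.
    return f"https://pypi.org/project/{match.group('name')}/"
```

A heuristic like this is wrong whenever a package name itself contains a dash followed by a digit, since the split then happens too early, which is why each newly supported filename pattern needs its own test case.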

ardumont added a subscriber: ardumont.

What's at stake is being able to migrate very old metadata, from the revisions the PyPI
loader used to create, into something relatively decent and correct in the metadata storage...

As pypi/npm/... do not really enforce consistency in project naming, the data is kinda inconsistent...

I don't think we can do better than what val suggests here (if we ever want to
migrate that data that is ¯\_(ツ)_/¯ )

so lgtm

swh/storage/migrate_extrinsic_metadata.py, line 120

you missed the datahaven entry in the tests.

This revision is now accepted and ready to land. (Sep 23 2020, 2:33 PM)
This revision was landed with ongoing or failed builds. (Sep 26 2020, 8:05 AM)
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D3998 (id=14304)

Rebasing onto 0adb8fc387...

Current branch diff-target is up to date.
Changes applied before test
commit c812c79a739c27f0f9cd9dd08e21261fb8c59b80
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Sep 18 17:50:26 2020 +0200

    migrate_extrinsic_metadata: add support for guessing the origin of more PyPI packages from filenames.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/984/ for more details.