
cpan: Collect extrinsic metadata for each module release
Closed · Public

Authored by anlambert on Oct 10 2022, 5:04 PM.

Details

Summary

Fetch extrinsic metadata from the URLs provided by the lister and store
them as release extrinsic metadata.

Also store origin artifacts JSON data provided by the lister as release
extrinsic metadata.

Related to T2833

Depends on D8651
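
A minimal sketch of how a per-release metadata URL could be derived, assuming the endpoint shape implied by the test data file names in this diff (https_fastapi.metacpan.org/v1_release_JJORE_Internals-CountObjects-0.05). Later revisions of this diff describe computing the URL from lister-provided metadata instead of receiving it directly. The function names and parameters below are hypothetical illustrations, not the loader's actual code.

```python
from urllib.parse import quote, urlsplit

# Host and API version are read off the test data directory name in this diff.
METACPAN_API = "https://fastapi.metacpan.org/v1"


def release_metadata_url(author: str, distribution: str, version: str) -> str:
    """Build the assumed MetaCPAN release endpoint URL (hypothetical helper)."""
    release = f"{distribution}-{version}"
    return f"{METACPAN_API}/release/{quote(author)}/{quote(release)}"


def fixture_path(url: str) -> str:
    """Map a URL to a test data path using the naming seen in the file list:
    '<scheme>_<host>/<path with "/" replaced by "_">' (hypothetical helper)."""
    parts = urlsplit(url)
    return f"{parts.scheme}_{parts.netloc}/{parts.path.lstrip('/').replace('/', '_')}"


url = release_metadata_url("JJORE", "Internals-CountObjects", "0.05")
print(url)
print(fixture_path(url))
```

With the author and release from the failing test, this yields a URL whose fixture path matches one of the data files created by the diff.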

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Branch
cpan-extrinsic-metadata
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 32216
Build 50459: Phabricator diff pipeline on jenkins
Build 50458: arc lint + arc unit

Unit Tests: Failed

Time    Test
592 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.loader.package.cpan.tests.test_cpan::test_cpan_loader_load_multiple_version
        cpan_loader = <swh.loader.package.cpan.loader.CpanLoader object at 0x7f6195efcfd0>
        head_release_original_artifacts_metadata = b'[{"url": "https://cpan.metacpan.org/authors/id/J/JJ/JJORE/Internals-CountObjects-0.05.tar.gz", "filename": "CountObj....tar.gz", "length": 632, "checksums": {"sha256": "e0ecf6ab4873fa55ff74da22a3c4ae0ab6a1409635c9cd2d6059abbb32be3a6a"}}]'
        head_release_extrinsic_metadata = b'{\n "release" : {\n "provides" : "Internals::CountObjects",\n "distribution" : "Internals-CountObjects",...a256" : "bbf65021207a7a51c8f8475bc25c4735f49d62744a75d33595e9720731b2b02f"\n },\n "took" : 2,\n "total" : 1\n}\n'
2 ms    Jenkins > .tox.py3.lib.python3.7.site-packages.swh.loader.core.tests.test_converters::test_content_for_storage_data
4 ms    Jenkins > .tox.py3.lib.python3.7.site-packages.swh.loader.core.tests.test_converters::test_content_for_storage_path
1 ms    Jenkins > .tox.py3.lib.python3.7.site-packages.swh.loader.core.tests.test_converters::test_content_for_storage_too_long
1 ms    Jenkins > .tox.py3.lib.python3.7.site-packages.swh.loader.core.tests.test_converters::test_prepare_contents
Full test results: 1 Failed · 273 Passed

Event Timeline

Build has FAILED

Patch application report for D8652 (id=31236)

Could not rebase; Attempt merge onto 028b7c04b9...

Updating 028b7c0..a38d40a
Fast-forward
 swh/loader/package/cpan/loader.py                  | 175 +++++++++-----------
 swh/loader/package/cpan/tests/data/fake_cpan.sh    |  86 ----------
 .../v1_release_JJORE_Internals-CountObjects-0.01   |  89 +++++++++++
 .../v1_release_JJORE_Internals-CountObjects-0.05   | 109 +++++++++++++
 .../v1_release_versions_Internals-CountObjects     |  26 ---
 swh/loader/package/cpan/tests/test_cpan.py         | 176 +++++++++++++++++----
 swh/loader/package/cpan/tests/test_tasks.py        |   8 +-
 7 files changed, 427 insertions(+), 242 deletions(-)
 delete mode 100644 swh/loader/package/cpan/tests/data/fake_cpan.sh
 create mode 100644 swh/loader/package/cpan/tests/data/https_fastapi.metacpan.org/v1_release_JJORE_Internals-CountObjects-0.01
 create mode 100644 swh/loader/package/cpan/tests/data/https_fastapi.metacpan.org/v1_release_JJORE_Internals-CountObjects-0.05
 delete mode 100644 swh/loader/package/cpan/tests/data/https_fastapi.metacpan.org/v1_release_versions_Internals-CountObjects
Changes applied before test
commit a38d40a6c5dbdee0db0b1d889c36f30d5451bcc9
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Oct 10 16:51:23 2022 +0200

    cpan: Collect extrinsic metadata for each module release
    
    Fetch extrinsic metadata from the URLs provided by the lister and store
    them as release extrinsic metadata.
    
    Also store origin artifacts JSON data provided by the lister as release
    extrinsic metadata.
    
    Related to T2833

commit 3028b7894270e2e0fd67c49afba44bd03fdb1e20
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Oct 10 13:32:05 2022 +0200

    cpan: Do not parse intrinsic metadata for getting module author
    
    Parsing perl module metadata files trigger a lot of errors due to badly
    formatted JSON or YAML and module author info is already provided by
    the cpan lister as extra loader arguments so remove that no longer
    needed metadata parsing step.
    
    Related to T2833

commit 819f9d2702c193497a9ed99b17d58192aeb4ab9b
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Sep 29 20:36:38 2022 +0200

    cpan: Align loader implementation with latest lister improvements
    
    Artifacts info for a package are now provided as loader arguments so
    no need to query metacpan Web API anymore to get list of versions
    and their related info.
    
    Related to T2833

commit e53a1e17aad238e38690a8a42b3672d106e179ae
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Fri Oct 7 15:33:35 2022 +0200

    cpan: Remove module description from release message
    
    Module description is not related to a particular release so we
    should not add it in release message.

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/984/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/984/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Oct 10 2022, 5:08 PM)
Harbormaster failed remote builds in B32189: Diff 31236!

Build has FAILED

Patch application report for D8652 (id=31257)

Could not rebase; Attempt merge onto 4cb85e153e...

Updating 4cb85e1..794cbe3
Fast-forward
 swh/loader/package/cpan/loader.py                  | 182 ++++++++++-----------
 swh/loader/package/cpan/tests/data/fake_cpan.sh    |  86 ----------
 .../v1_release_JJORE_Internals-CountObjects-0.01   |  89 ++++++++++
 .../v1_release_JJORE_Internals-CountObjects-0.05   | 109 ++++++++++++
 .../v1_release_versions_Internals-CountObjects     |  26 ---
 swh/loader/package/cpan/tests/test_cpan.py         | 179 ++++++++++++++++----
 swh/loader/package/cpan/tests/test_tasks.py        |  14 +-
 7 files changed, 443 insertions(+), 242 deletions(-)
 delete mode 100644 swh/loader/package/cpan/tests/data/fake_cpan.sh
 create mode 100644 swh/loader/package/cpan/tests/data/https_fastapi.metacpan.org/v1_release_JJORE_Internals-CountObjects-0.01
 create mode 100644 swh/loader/package/cpan/tests/data/https_fastapi.metacpan.org/v1_release_JJORE_Internals-CountObjects-0.05
 delete mode 100644 swh/loader/package/cpan/tests/data/https_fastapi.metacpan.org/v1_release_versions_Internals-CountObjects
Changes applied before test
commit 794cbe3e4b16af886384c4c1cb7a3bcba80d81be
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Oct 10 16:51:23 2022 +0200

    cpan: Collect extrinsic metadata for each module release
    
    Fetch extrinsic metadata by computing URLs from the metadata provided
    by the lister and store them as release extrinsic metadata.
    
    Also store origin artifacts JSON data provided by the lister as release
    extrinsic metadata.
    
    Related to T2833

commit 7b929606a78f38b48ffc6b966d74bc0d7aea8ce3
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Oct 10 13:32:05 2022 +0200

    cpan: Do not parse intrinsic metadata for getting module author
    
    Parsing perl module metadata files trigger a lot of errors due to badly
    formatted JSON or YAML and module author info is already provided by
    the cpan lister as extra loader arguments so remove that no longer
    needed metadata parsing step.
    
    Related to T2833

commit a13e3e6f35bcabf856664ad7f116b17ca5a3daaf
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Sep 29 20:36:38 2022 +0200

    cpan: Align loader implementation with latest lister improvements
    
    Artifacts info for a package are now provided as loader arguments so
    no need to query metacpan Web API anymore to get list of versions
    and their related info.
    
    Related to T2833

commit e17ee9e08e84105710852bcc32c81bf149e55d4c
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Fri Oct 7 15:33:35 2022 +0200

    cpan: Remove module description from release message
    
    Module description is not related to a particular release so we
    should not add it in release message.

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/992/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/992/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Oct 11 2022, 2:45 PM)
Harbormaster failed remote builds in B32209: Diff 31257!

Temporarily add pytest verbose output to debug the test failure on Jenkins.

Build has FAILED

Patch application report for D8652 (id=31264)

Could not rebase; Attempt merge onto 4cb85e153e...

Updating 4cb85e1..1544a78
Fast-forward
 swh/loader/package/cpan/loader.py                  | 182 ++++++++++-----------
 swh/loader/package/cpan/tests/data/fake_cpan.sh    |  86 ----------
 .../v1_release_JJORE_Internals-CountObjects-0.01   |  89 ++++++++++
 .../v1_release_JJORE_Internals-CountObjects-0.05   | 109 ++++++++++++
 .../v1_release_versions_Internals-CountObjects     |  26 ---
 swh/loader/package/cpan/tests/test_cpan.py         | 179 ++++++++++++++++----
 swh/loader/package/cpan/tests/test_tasks.py        |  14 +-
 tox.ini                                            |   2 +-
 8 files changed, 444 insertions(+), 243 deletions(-)
 delete mode 100644 swh/loader/package/cpan/tests/data/fake_cpan.sh
 create mode 100644 swh/loader/package/cpan/tests/data/https_fastapi.metacpan.org/v1_release_JJORE_Internals-CountObjects-0.01
 create mode 100644 swh/loader/package/cpan/tests/data/https_fastapi.metacpan.org/v1_release_JJORE_Internals-CountObjects-0.05
 delete mode 100644 swh/loader/package/cpan/tests/data/https_fastapi.metacpan.org/v1_release_versions_Internals-CountObjects
Changes applied before test
commit 1544a789d52a9b24f18b6e4d66c043d88e9eddea
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Oct 10 16:51:23 2022 +0200

    cpan: Collect extrinsic metadata for each module release
    
    Fetch extrinsic metadata by computing URLs from the metadata provided
    by the lister and store them as release extrinsic metadata.
    
    Also store origin artifacts JSON data provided by the lister as release
    extrinsic metadata.
    
    Related to T2833

commit 7b929606a78f38b48ffc6b966d74bc0d7aea8ce3
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Oct 10 13:32:05 2022 +0200

    cpan: Do not parse intrinsic metadata for getting module author
    
    Parsing perl module metadata files trigger a lot of errors due to badly
    formatted JSON or YAML and module author info is already provided by
    the cpan lister as extra loader arguments so remove that no longer
    needed metadata parsing step.
    
    Related to T2833

commit a13e3e6f35bcabf856664ad7f116b17ca5a3daaf
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Sep 29 20:36:38 2022 +0200

    cpan: Align loader implementation with latest lister improvements
    
    Artifacts info for a package are now provided as loader arguments so
    no need to query metacpan Web API anymore to get list of versions
    and their related info.
    
    Related to T2833

commit e17ee9e08e84105710852bcc32c81bf149e55d4c
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Fri Oct 7 15:33:35 2022 +0200

    cpan: Remove module description from release message
    
    Module description is not related to a particular release so we
    should not add it in release message.

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/993/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/993/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Oct 11 2022, 3:45 PM)
Harbormaster failed remote builds in B32216: Diff 31264!

Remove original artifacts extrinsic metadata as it is redundant with
what the base package loader is doing.

Build is green

Patch application report for D8652 (id=31267)

Could not rebase; Attempt merge onto 4cb85e153e...

Updating 4cb85e1..3f1da84
Fast-forward
 swh/loader/package/cpan/loader.py                  | 178 ++++++++++-----------
 swh/loader/package/cpan/tests/data/fake_cpan.sh    |  86 ----------
 .../v1_release_JJORE_Internals-CountObjects-0.01   |  89 +++++++++++
 .../v1_release_JJORE_Internals-CountObjects-0.05   | 109 +++++++++++++
 .../v1_release_versions_Internals-CountObjects     |  26 ---
 swh/loader/package/cpan/tests/test_cpan.py         | 166 +++++++++++++++----
 swh/loader/package/cpan/tests/test_tasks.py        |  14 +-
 7 files changed, 425 insertions(+), 243 deletions(-)
 delete mode 100644 swh/loader/package/cpan/tests/data/fake_cpan.sh
 create mode 100644 swh/loader/package/cpan/tests/data/https_fastapi.metacpan.org/v1_release_JJORE_Internals-CountObjects-0.01
 create mode 100644 swh/loader/package/cpan/tests/data/https_fastapi.metacpan.org/v1_release_JJORE_Internals-CountObjects-0.05
 delete mode 100644 swh/loader/package/cpan/tests/data/https_fastapi.metacpan.org/v1_release_versions_Internals-CountObjects
Changes applied before test
commit 3f1da84f426dba2ea5f1823a2a564b1e82d738ce
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Oct 10 16:51:23 2022 +0200

    cpan: Collect extrinsic metadata for each module release
    
    Fetch extrinsic metadata by computing URLs from the metadata provided
    by the lister and store them as release extrinsic metadata.
    
    Also store origin artifacts JSON data provided by the lister as release
    extrinsic metadata.
    
    Related to T2833

commit 7b929606a78f38b48ffc6b966d74bc0d7aea8ce3
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Oct 10 13:32:05 2022 +0200

    cpan: Do not parse intrinsic metadata for getting module author
    
    Parsing perl module metadata files trigger a lot of errors due to badly
    formatted JSON or YAML and module author info is already provided by
    the cpan lister as extra loader arguments so remove that no longer
    needed metadata parsing step.
    
    Related to T2833

commit a13e3e6f35bcabf856664ad7f116b17ca5a3daaf
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Sep 29 20:36:38 2022 +0200

    cpan: Align loader implementation with latest lister improvements
    
    Artifacts info for a package are now provided as loader arguments so
    no need to query metacpan Web API anymore to get list of versions
    and their related info.
    
    Related to T2833

commit e17ee9e08e84105710852bcc32c81bf149e55d4c
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Fri Oct 7 15:33:35 2022 +0200

    cpan: Remove module description from release message
    
    Module description is not related to a particular release so we
    should not add it in release message.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/994/ for more details.

vlorentz added inline comments.
swh/loader/package/cpan/loader.py, line 146:

Shouldn't it be cpan-release-json?

Update: s/cpan-module-json/cpan-release-json/
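
To illustrate the rename, here is a minimal sketch of tagging fetched bytes with a format label. The metadata is collected per release, so the label names the release rather than the module. ExtrinsicMetadata and wrap_release_metadata are hypothetical stand-ins for illustration only, not the loader's real types or functions.

```python
from dataclasses import dataclass

# Format label agreed on in the inline comment above.
CPAN_RELEASE_FORMAT = "cpan-release-json"


@dataclass(frozen=True)
class ExtrinsicMetadata:
    """Hypothetical stand-in: raw metadata bytes plus a format label."""

    format: str
    metadata: bytes


def wrap_release_metadata(raw: bytes) -> ExtrinsicMetadata:
    """Attach the release-level format label to raw JSON bytes, unparsed."""
    return ExtrinsicMetadata(format=CPAN_RELEASE_FORMAT, metadata=raw)


record = wrap_release_metadata(b'{"total": 1}')
print(record.format, len(record.metadata))
```

Keeping the payload as raw bytes, as in this sketch, avoids re-serialising the JSON and so preserves exactly what the remote API returned.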

Build is green

Patch application report for D8652 (id=31371)

Could not rebase; Attempt merge onto a13e3e6f35...

Updating a13e3e6..8596331
Fast-forward
 swh/loader/package/cpan/loader.py                  |  99 ++++++++++---------
 .../v1_release_JJORE_Internals-CountObjects-0.01   |  89 +++++++++++++++++
 .../v1_release_JJORE_Internals-CountObjects-0.05   | 109 +++++++++++++++++++++
 swh/loader/package/cpan/tests/test_cpan.py         |  55 ++++++++++-
 4 files changed, 302 insertions(+), 50 deletions(-)
 create mode 100644 swh/loader/package/cpan/tests/data/https_fastapi.metacpan.org/v1_release_JJORE_Internals-CountObjects-0.01
 create mode 100644 swh/loader/package/cpan/tests/data/https_fastapi.metacpan.org/v1_release_JJORE_Internals-CountObjects-0.05
Changes applied before test
commit 85963318aab6ab530b059567f6ec8872689fc669
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Oct 10 16:51:23 2022 +0200

    cpan: Collect extrinsic metadata for each module release
    
    Fetch extrinsic metadata by computing URLs from the metadata provided
    by the lister and store them as release extrinsic metadata.
    
    Related to T2833

commit 7b929606a78f38b48ffc6b966d74bc0d7aea8ce3
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Oct 10 13:32:05 2022 +0200

    cpan: Do not parse intrinsic metadata for getting module author
    
    Parsing perl module metadata files trigger a lot of errors due to badly
    formatted JSON or YAML and module author info is already provided by
    the cpan lister as extra loader arguments so remove that no longer
    needed metadata parsing step.
    
    Related to T2833

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/999/ for more details.

This revision is now accepted and ready to land. (Oct 17 2022, 2:56 PM)