
Cpan: List Perl module origins from cpan.org
ClosedPublic

Authored by franckbret on Sep 27 2022, 9:00 AM.

Details

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D8542 (id=30802)

Rebasing onto fd1a4244a0...

Current branch diff-target is up to date.
Changes applied before test
commit 8f56a4ad1ca32135f2d73b69e1680fcb75454f07
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Sep 27 08:57:48 2022 +0200

    Cpan: List Perl module origins from cpan.org
    
    Related T2833

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/704/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/704/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Sep 27 2022, 9:05 AM)
Harbormaster failed remote builds in B31779: Diff 30802!

Fix docstring bad target name

Build has FAILED

Patch application report for D8542 (id=30804)

Rebasing onto fd1a4244a0...

Current branch diff-target is up to date.
Changes applied before test
commit f86ee10dd484812b32270cfbc55da8c8e3400b7c
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Sep 27 08:57:48 2022 +0200

    Cpan: List Perl module origins from cpan.org
    
    Related T2833

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/705/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/705/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Sep 27 2022, 9:24 AM)
Harbormaster failed remote builds in B31781: Diff 30804!

Fix docstring bad target name (forgot one in the previous commit)

Build is green

Patch application report for D8542 (id=30805)

Rebasing onto fd1a4244a0...

Current branch diff-target is up to date.
Changes applied before test
commit c6cf96b718a3d3a352e15e3fc0d7057b044a9a94
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Sep 27 08:57:48 2022 +0200

    Cpan: List Perl module origins from cpan.org
    
    Related T2833

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/706/ for more details.

This revision is now accepted and ready to land. (Sep 27 2022, 11:00 AM)

@franckbret, have you considered using the https://fastapi.metacpan.org/v1/release/_search endpoint of the MetaCPAN Elasticsearch API?

It seems to list all CPAN releases with dates, links to tarballs and checksums. You could build a list of artifacts for each package, as in the crates loader,
and pass them as loader arguments.
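
To make the suggestion concrete, here is a minimal sketch of grouping flat release records into one artifact list per package. The input field names (distribution, version, date, download_url, checksum_sha256) are the ones the endpoint appears to expose; the function name `group_artifacts` and the output keys (`url`, `checksums`) are hypothetical and not taken from the crates loader.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_artifacts(
    releases: Iterable[Dict[str, Any]]
) -> Dict[str, List[Dict[str, Any]]]:
    """Group flat release records by distribution (package) name.

    Hypothetical helper: the output shape is an assumption, chosen only
    to illustrate "a list of artifacts for each package".
    """
    artifacts: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for release in releases:
        artifacts[release["distribution"]].append(
            {
                "version": release["version"],
                "date": release["date"],
                "url": release["download_url"],
                "checksums": {"sha256": release["checksum_sha256"]},
            }
        )
    # Return a plain dict so that missing packages raise KeyError.
    return dict(artifacts)
```

Each value could then be handed to the loader as its artifact list, with the most recent `date` among them serving as a last-update hint for the origin.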

This revision was landed with ongoing or failed builds. (Sep 27 2022, 2:32 PM)
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D8542 (id=30825)

Rebasing onto 6696a8424a...

Current branch diff-target is up to date.
Changes applied before test
commit a4aec3894e3c08f9eb88f20432b893137399b307
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Sep 27 08:57:48 2022 +0200

    Cpan: List Perl module origins from cpan.org
    
    Related T2833

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/708/ for more details.

Thanks for the review.

Yes, I checked this one too. While exploring CPAN I found a lot of inconsistencies when parsing the data, so I went for a simpler approach.
The main drawback of /release/ is that it returns a lot of useless data and a large number of records. The idea is that the loader will fetch the related versions via https://fastapi.metacpan.org/v1/release/versions/{pkgname}

Let's talk about this next week.

The https://fastapi.metacpan.org/v1/release/versions/{pkgname} endpoint does not return tarball checksums, which is an issue for the loader. We are also
missing the last_update date in the lister output.

You can filter the fields returned by the https://fastapi.metacpan.org/v1/release/_search endpoint so that it only returns those we are interested in,
see https://fastapi.metacpan.org/v1/release/_search?fields=download_url,checksum_sha256,distribution,date,version&size=1000&scroll=1m for instance.
We then need to scroll through all the results to get every package release.