
cpan: Fix module version extraction for some edge cases
Closed, Public

Authored by anlambert on Oct 10 2022, 4:26 PM.

Details

Summary

The CPAN API can return versions that are not of type str, but int or
float instead.

A version equal to 0 means that CPAN failed to parse the version, so in
that case we try to extract it from the release name.

Otherwise, the version is converted to str.

Related to T2833

Depends on D8615
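A minimal Python sketch of the version handling described in the summary. The helper name get_module_version and its parameters are hypothetical, and it assumes a release name of the form "<distribution>-<version>"; it is not the lister's actual code.

```python
from typing import Union


def get_module_version(
    distribution: str, release_name: str, version: Union[str, int, float]
) -> str:
    """Return a str version for a CPAN release (hypothetical helper).

    Assumes release names look like "<distribution>-<version>".
    A version equal to 0 (int 0 or float 0.0) is treated as
    "CPAN failed to parse the version", so the version is recovered
    from the release name instead.
    """
    if version == 0:
        prefix = f"{distribution}-"
        if release_name.startswith(prefix):
            # Strip only the leading distribution name.
            return release_name[len(prefix):]
        # Release name does not follow the assumed format: keep it whole.
        return release_name
    # int and float versions are converted to str; str versions pass through.
    return str(version)
```

One caveat of the str conversion: a float such as 1.10 is rendered as "1.1", so trailing zeros already lost in a numeric value cannot be recovered this way.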

Diff Detail

Repository: rDLS Listers
Branch: cpan-fix-version-parsing
Lint: No Linters Available
Unit: No Unit Test Coverage
Build Status: Buildable 32215
  Build 50457: Phabricator diff pipeline on jenkins (Jenkins console, Jenkins)
  Build 50456: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8648 (id=31231)

Could not rebase; Attempt merge onto 108816f232...

Updating 108816f..8d26db1
Fast-forward
 swh/lister/cpan/__init__.py                        |   8 +-
 swh/lister/cpan/lister.py                          | 144 ++++++++++--
 ...TU1MTQ1NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw== |  50 -----
 ...NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw==_visit1 |  16 --
 .../v1__search_scroll_page1                        | 247 +++++++++++++++++++++
 .../v1__search_scroll_page2                        |  39 ++++
 .../v1__search_scroll_page3                        |  85 +++++++
 .../v1__search_scroll_page4                        | 131 +++++++++++
 ...ibution__search,fields=name,size=1000,scroll=1m |  52 -----
 .../https_fastapi.metacpan.org/v1_release__search  | 246 ++++++++++++++++++++
 swh/lister/cpan/tests/test_lister.py               | 166 ++++++++++++--
 11 files changed, 1025 insertions(+), 159 deletions(-)
 delete mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll,scroll=1m,scroll_id=cXVlcnlUaGVuRmV0Y2g7Mzs5NTU1MTQ1NTk6eXptdmszQUNUam1XbVJjRjRkRk9Udzs5NTQ5NjQ5NjI6ZHZIZWxCb3BUZi1Cb3NwRDB5NmRQUTs5NTU1MTQ1NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw==
 delete mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll,scroll=1m,scroll_id=cXVlcnlUaGVuRmV0Y2g7Mzs5NTU1MTQ1NTk6eXptdmszQUNUam1XbVJjRjRkRk9Udzs5NTQ5NjQ5NjI6ZHZIZWxCb3BUZi1Cb3NwRDB5NmRQUTs5NTU1MTQ1NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw==_visit1
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page1
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page2
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page3
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page4
 delete mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1_distribution__search,fields=name,size=1000,scroll=1m
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1_release__search
Changes applied before test
commit 8d26db1cf78bddfb005addd2bc41fdca44fc19f4
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Oct 10 15:55:54 2022 +0200

    cpan: Fix module version extraction for some edge cases
    
    CPAN API can return versions that are not of str type: either
    int or float.
    
    When version equals 0, it means that version failed to be parsed
    by CPAN so we try to extract it from release name in that case.
    
    Otherwise we ensure to convert the version to str type.
    
    Related to T2833

commit 2177ac9f5a08c2bd276f494b2aa4c8f0d4239e65
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Sep 27 16:34:38 2022 +0200

    cpan: Improve listing process by querying the metacpan release endpoint
    
    Instead of querying the metacpan distribution endpoint to list origins,
    prefer to use the release endpoint instead enabling to list all artifacts
    associated to CPAN packages by scrolling results.
    
    Compared to previous implementation, it enables to compute a last_update
    date for all CPAN packages but also to obtain artifact sha256 checksums
    that will be used by the CPAN loader to check downloads integrity.
    
    As the multiple versions of a module are spread across multiple pages
    from the CPAN API, origins are sent to the scheduler once all pages
    processed, it is also faster to proceed that way.
    
    Also compute extrinsic metadata URL for each perl module versions in
    order for the cpan loader to query it.
    
    Related to T2833

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/775/ for more details.
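The approach described in the second commit (scrolling through release results, then emitting origins only once every page has been processed, with a last_update date and sha256 checksums per artifact) can be sketched in Python as below. This is an illustration under assumptions, not the lister's implementation: the function name aggregate_releases is hypothetical, pages are modeled as plain lists of dicts, and the field names "distribution", "version", "date", "download_url" and "checksum_sha256" are assumed for each release hit.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any, Dict, Iterable, List


def aggregate_releases(
    pages: Iterable[List[Dict[str, Any]]]
) -> Dict[str, Dict[str, Any]]:
    """Group release hits from all scroll pages by distribution (hypothetical)."""
    artifacts: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    last_update: Dict[str, datetime] = {}
    for page in pages:
        for release in page:
            dist = release["distribution"]
            # Versions of one distribution may appear on different pages,
            # so accumulate them instead of emitting per page.
            artifacts[dist].append(
                {
                    "version": release["version"],
                    "url": release["download_url"],
                    "sha256": release["checksum_sha256"],
                }
            )
            # last_update is the most recent release date seen so far.
            date = datetime.fromisoformat(release["date"])
            if dist not in last_update or date > last_update[dist]:
                last_update[dist] = date
    # Only now, with every page consumed, is each distribution complete.
    return {
        dist: {"artifacts": arts, "last_update": last_update[dist]}
        for dist, arts in artifacts.items()
    }
```

Buffering everything until the last page trades memory for correctness: a distribution is never sent with only part of its versions.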

vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
swh/lister/cpan/lister.py, line 31:

Or module_version = release_name.replace(prefix, "", 1), to avoid accidentally replacing the prefix more than once.

swh/lister/cpan/lister.py, lines 32–33:

Redundant.
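To illustrate the reviewer's point about the count argument of str.replace, here is a small Python comparison. The release name "Foo-1.0-Foo-patch" and the helper names are hypothetical, chosen only so that the prefix occurs twice.

```python
def strip_all(release_name: str, prefix: str) -> str:
    # Removes every occurrence of prefix, including ones inside the version.
    return release_name.replace(prefix, "")


def strip_first(release_name: str, prefix: str) -> str:
    # Removes only the first occurrence (the reviewer's suggestion).
    return release_name.replace(prefix, "", 1)


def strip_leading(release_name: str, prefix: str) -> str:
    # Removes prefix only when the name actually starts with it.
    if release_name.startswith(prefix):
        return release_name[len(prefix):]
    return release_name


name = "Foo-1.0-Foo-patch"  # hypothetical release name
print(strip_all(name, "Foo-"))      # 1.0-patch
print(strip_first(name, "Foo-"))    # 1.0-Foo-patch
print(strip_leading(name, "Foo-"))  # 1.0-Foo-patch
```

Note that a count of 1 still removes the first match wherever it occurs, so strip_first("X-Foo-1.0", "Foo-") returns "X-1.0"; on Python 3.9+, str.removeprefix behaves like strip_leading.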

This revision is now accepted and ready to land. (Oct 11 2022, 9:55 AM)

Build has FAILED

Patch application report for D8648 (id=31253)

Could not rebase; Attempt merge onto 108816f232...

Updating 108816f..2777809
Fast-forward
 swh/lister/cpan/__init__.py                        |   8 +-
 swh/lister/cpan/lister.py                          | 158 ++++++++++---
 ...TU1MTQ1NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw== |  50 -----
 ...NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw==_visit1 |  16 --
 .../v1__search_scroll_page1                        | 247 +++++++++++++++++++++
 .../v1__search_scroll_page2                        |  39 ++++
 .../v1__search_scroll_page3                        |  85 +++++++
 .../v1__search_scroll_page4                        | 131 +++++++++++
 ...ibution__search,fields=name,size=1000,scroll=1m |  52 -----
 .../https_fastapi.metacpan.org/v1_release__search  | 246 ++++++++++++++++++++
 swh/lister/cpan/tests/test_lister.py               | 165 ++++++++++++--
 11 files changed, 1037 insertions(+), 160 deletions(-)
 delete mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll,scroll=1m,scroll_id=cXVlcnlUaGVuRmV0Y2g7Mzs5NTU1MTQ1NTk6eXptdmszQUNUam1XbVJjRjRkRk9Udzs5NTQ5NjQ5NjI6ZHZIZWxCb3BUZi1Cb3NwRDB5NmRQUTs5NTU1MTQ1NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw==
 delete mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll,scroll=1m,scroll_id=cXVlcnlUaGVuRmV0Y2g7Mzs5NTU1MTQ1NTk6eXptdmszQUNUam1XbVJjRjRkRk9Udzs5NTQ5NjQ5NjI6ZHZIZWxCb3BUZi1Cb3NwRDB5NmRQUTs5NTU1MTQ1NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw==_visit1
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page1
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page2
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page3
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page4
 delete mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1_distribution__search,fields=name,size=1000,scroll=1m
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1_release__search
Changes applied before test
commit 27778090c535fa473ea08bd6f5a9e0a491de573a
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Oct 10 15:55:54 2022 +0200

    cpan: Fix module version extraction for some edge cases
    
    CPAN API can return versions that are not of str type: either
    int or float.
    
    When version equals 0, it means that version failed to be parsed
    by CPAN so we try to extract it from release name in that case.
    
    Otherwise we ensure to convert the version to str type.
    
    Related to T2833

commit 5042a43e31c091d186a7e38c36df0235f6cd65e7
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Sep 27 16:34:38 2022 +0200

    cpan: Improve listing process by querying the metacpan release endpoint
    
    Instead of querying the metacpan distribution endpoint to list origins,
    prefer to use the release endpoint instead enabling to list all artifacts
    associated to CPAN packages by scrolling results.
    
    Compared to previous implementation, it enables to compute a last_update
    date for all CPAN packages but also to obtain artifact sha256 checksums
    that will be used by the CPAN loader to check downloads integrity.
    
    As the multiple versions of a module are spread across multiple pages
    from the CPAN API, origins are sent to the scheduler once all pages
    processed, it is also faster to proceed that way.
    
    Related to T2833

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/778/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/778/console

Build has FAILED

Patch application report for D8648 (id=31259)

Could not rebase; Attempt merge onto 108816f232...

Updating 108816f..729d9b6
Fast-forward
 swh/lister/cpan/__init__.py                        |   8 +-
 swh/lister/cpan/lister.py                          | 158 ++++++++++---
 ...TU1MTQ1NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw== |  50 -----
 ...NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw==_visit1 |  16 --
 .../v1__search_scroll_page1                        | 247 +++++++++++++++++++++
 .../v1__search_scroll_page2                        |  39 ++++
 .../v1__search_scroll_page3                        |  85 +++++++
 .../v1__search_scroll_page4                        | 131 +++++++++++
 ...ibution__search,fields=name,size=1000,scroll=1m |  52 -----
 .../https_fastapi.metacpan.org/v1_release__search  | 246 ++++++++++++++++++++
 swh/lister/cpan/tests/test_lister.py               | 165 ++++++++++++--
 11 files changed, 1037 insertions(+), 160 deletions(-)
 delete mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll,scroll=1m,scroll_id=cXVlcnlUaGVuRmV0Y2g7Mzs5NTU1MTQ1NTk6eXptdmszQUNUam1XbVJjRjRkRk9Udzs5NTQ5NjQ5NjI6ZHZIZWxCb3BUZi1Cb3NwRDB5NmRQUTs5NTU1MTQ1NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw==
 delete mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll,scroll=1m,scroll_id=cXVlcnlUaGVuRmV0Y2g7Mzs5NTU1MTQ1NTk6eXptdmszQUNUam1XbVJjRjRkRk9Udzs5NTQ5NjQ5NjI6ZHZIZWxCb3BUZi1Cb3NwRDB5NmRQUTs5NTU1MTQ1NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw==_visit1
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page1
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page2
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page3
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page4
 delete mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1_distribution__search,fields=name,size=1000,scroll=1m
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1_release__search
Changes applied before test
commit 729d9b64da81df1ef2d81034b96a16b16a8d9544
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Oct 10 15:55:54 2022 +0200

    cpan: Fix module version extraction for some edge cases
    
    CPAN API can return versions that are not of str type: either
    int or float.
    
    When version equals 0, it means that version failed to be parsed
    by CPAN so we try to extract it from release name in that case.
    
    Otherwise we ensure to convert the version to str type.
    
    Related to T2833

commit 5121157ce326d32411e32f9f984f9a1f6e8710ae
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Sep 27 16:34:38 2022 +0200

    cpan: Improve listing process by querying the metacpan release endpoint
    
    Instead of querying the metacpan distribution endpoint to list origins,
    prefer to use the release endpoint instead enabling to list all artifacts
    associated to CPAN packages by scrolling results.
    
    Compared to previous implementation, it enables to compute a last_update
    date for all CPAN packages but also to obtain artifact sha256 checksums
    that will be used by the CPAN loader to check downloads integrity.
    
    As the multiple versions of a module are spread across multiple pages
    from the CPAN API, origins are sent to the scheduler once all pages
    processed, it is also faster to proceed that way.
    
    Related to T2833

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/780/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/780/console

Build is green

Patch application report for D8648 (id=31261)

Could not rebase; Attempt merge onto 108816f232...

Updating 108816f..a64077d
Fast-forward
 swh/lister/cpan/__init__.py                        |   8 +-
 swh/lister/cpan/lister.py                          | 159 ++++++++++---
 ...TU1MTQ1NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw== |  50 -----
 ...NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw==_visit1 |  16 --
 .../v1__search_scroll_page1                        | 247 +++++++++++++++++++++
 .../v1__search_scroll_page2                        |  39 ++++
 .../v1__search_scroll_page3                        |  85 +++++++
 .../v1__search_scroll_page4                        | 131 +++++++++++
 ...ibution__search,fields=name,size=1000,scroll=1m |  52 -----
 .../https_fastapi.metacpan.org/v1_release__search  | 246 ++++++++++++++++++++
 swh/lister/cpan/tests/test_lister.py               | 165 ++++++++++++--
 11 files changed, 1038 insertions(+), 160 deletions(-)
 delete mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll,scroll=1m,scroll_id=cXVlcnlUaGVuRmV0Y2g7Mzs5NTU1MTQ1NTk6eXptdmszQUNUam1XbVJjRjRkRk9Udzs5NTQ5NjQ5NjI6ZHZIZWxCb3BUZi1Cb3NwRDB5NmRQUTs5NTU1MTQ1NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw==
 delete mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll,scroll=1m,scroll_id=cXVlcnlUaGVuRmV0Y2g7Mzs5NTU1MTQ1NTk6eXptdmszQUNUam1XbVJjRjRkRk9Udzs5NTQ5NjQ5NjI6ZHZIZWxCb3BUZi1Cb3NwRDB5NmRQUTs5NTU1MTQ1NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw==_visit1
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page1
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page2
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page3
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page4
 delete mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1_distribution__search,fields=name,size=1000,scroll=1m
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1_release__search
Changes applied before test
commit a64077d2251605f4aced9c2a35d649d8b7a56ef7
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Oct 10 15:55:54 2022 +0200

    cpan: Fix module version extraction for some edge cases
    
    CPAN API can return versions that are not of str type: either
    int or float.
    
    When version equals 0, it means that version failed to be parsed
    by CPAN so we try to extract it from release name in that case.
    
    Otherwise we ensure to convert the version to str type.
    
    Related to T2833

commit e09a31c4c0072ff93453215aa772a7cfcabec5f1
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Sep 27 16:34:38 2022 +0200

    cpan: Improve listing process by querying the metacpan release endpoint
    
    Instead of querying the metacpan distribution endpoint to list origins,
    prefer to use the release endpoint instead enabling to list all artifacts
    associated to CPAN packages by scrolling results.
    
    Compared to previous implementation, it enables to compute a last_update
    date for all CPAN packages but also to obtain artifact sha256 checksums
    that will be used by the CPAN loader to check downloads integrity.
    
    As the multiple versions of a module are spread across multiple pages
    from the CPAN API, origins are sent to the scheduler once all pages
    processed, it is also faster to proceed that way.
    
    Related to T2833

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/782/ for more details.

Build is green

Patch application report for D8648 (id=31263)

Could not rebase; Attempt merge onto 108816f232...

Updating 108816f..05cd1de
Fast-forward
 swh/lister/cpan/__init__.py                        |   8 +-
 swh/lister/cpan/lister.py                          | 158 ++++++++++---
 ...TU1MTQ1NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw== |  50 -----
 ...NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw==_visit1 |  16 --
 .../v1__search_scroll_page1                        | 247 +++++++++++++++++++++
 .../v1__search_scroll_page2                        |  39 ++++
 .../v1__search_scroll_page3                        |  85 +++++++
 .../v1__search_scroll_page4                        | 131 +++++++++++
 ...ibution__search,fields=name,size=1000,scroll=1m |  52 -----
 .../https_fastapi.metacpan.org/v1_release__search  | 246 ++++++++++++++++++++
 swh/lister/cpan/tests/test_lister.py               | 165 ++++++++++++--
 11 files changed, 1037 insertions(+), 160 deletions(-)
 delete mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll,scroll=1m,scroll_id=cXVlcnlUaGVuRmV0Y2g7Mzs5NTU1MTQ1NTk6eXptdmszQUNUam1XbVJjRjRkRk9Udzs5NTQ5NjQ5NjI6ZHZIZWxCb3BUZi1Cb3NwRDB5NmRQUTs5NTU1MTQ1NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw==
 delete mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll,scroll=1m,scroll_id=cXVlcnlUaGVuRmV0Y2g7Mzs5NTU1MTQ1NTk6eXptdmszQUNUam1XbVJjRjRkRk9Udzs5NTQ5NjQ5NjI6ZHZIZWxCb3BUZi1Cb3NwRDB5NmRQUTs5NTU1MTQ1NjA6eXptdmszQUNUam1XbVJjRjRkRk9UdzswOw==_visit1
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page1
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page2
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page3
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1__search_scroll_page4
 delete mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1_distribution__search,fields=name,size=1000,scroll=1m
 create mode 100644 swh/lister/cpan/tests/data/https_fastapi.metacpan.org/v1_release__search
Changes applied before test
commit 05cd1de1cde7ed26ca46d970e4635ba142af9031
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Oct 10 15:55:54 2022 +0200

    cpan: Fix module version extraction for some edge cases
    
    CPAN API can return versions that are not of str type: either
    int or float.
    
    When version equals 0, it means that version failed to be parsed
    by CPAN so we try to extract it from release name in that case.
    
    Otherwise we ensure to convert the version to str type.
    
    Related to T2833

commit f57b8f3a2c49080ae9bc11217b8d6ef4ed8c564e
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Sep 27 16:34:38 2022 +0200

    cpan: Improve listing process by querying the metacpan release endpoint
    
    Instead of querying the metacpan distribution endpoint to list origins,
    prefer to use the release endpoint instead enabling to list all artifacts
    associated to CPAN packages by scrolling results.
    
    Compared to previous implementation, it enables to compute a last_update
    date for all CPAN packages but also to obtain artifact sha256 checksums
    that will be used by the CPAN loader to check downloads integrity.
    
    As the multiple versions of a module are spread across multiple pages
    from the CPAN API, origins are sent to the scheduler once all pages
    processed, it is also faster to proceed that way.
    
    Related to T2833

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/784/ for more details.