
pypi/loader: Filter out sdist archives not of interest
Closed, Public

Authored by anlambert on Sep 15 2021, 3:39 PM.

Details

Summary

Some PyPI origins declare sdist archives that cannot be extracted
by swh.core.tarball.uncompress and whose content does not match
the standard sdist layout.

This is notably the case for sdist files whose extensions are
.deb, .egg, .rpm or .whl.

As those artifacts are not worth archiving and cause errors
while loading PyPI origins, filter them out of the sdist files
to process.

Related to T3575
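A minimal Python sketch of this kind of extension-based filtering follows. It is not the code from this revision, whose diff is not shown on this page. It assumes release file entries shaped like those returned by the PyPI JSON API, each with packagetype and filename keys. The names IGNORED_SDIST_EXTENSIONS and filter_sdists are hypothetical, and matching extensions case-insensitively is an assumption.

```python
from typing import Any, Dict, List

# Hypothetical constant: the extensions listed in the summary as
# declared "sdist" files that are not real source distributions.
IGNORED_SDIST_EXTENSIONS = (".deb", ".egg", ".rpm", ".whl")


def filter_sdists(release_files: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only sdist entries whose filename has no ignored extension.

    Assumes each entry follows the PyPI JSON API layout, with at least
    "packagetype" and "filename" keys.
    """
    return [
        f
        for f in release_files
        if f["packagetype"] == "sdist"
        # str.endswith accepts a tuple and matches if any suffix matches;
        # lower() makes the check case-insensitive (an assumption).
        and not f["filename"].lower().endswith(IGNORED_SDIST_EXTENSIONS)
    ]


files = [
    {"packagetype": "sdist", "filename": "foo-1.0.tar.gz"},
    {"packagetype": "sdist", "filename": "foo-1.0-py3-none-any.whl"},
    {"packagetype": "sdist", "filename": "foo-1.0.rpm"},
    {"packagetype": "bdist_wheel", "filename": "foo-1.0-py3-none-any.whl"},
]
print([f["filename"] for f in filter_sdists(files)])  # ['foo-1.0.tar.gz']
```

Filtering on the declared filename keeps the check cheap, because unwanted artifacts are skipped before any download or extraction is attempted.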

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6270 (id=22700)

Rebasing onto d5e54a5eea...

Current branch diff-target is up to date.
Changes applied before test
commit bb0116c30e9c998d24f66760df6ff295223c65ae
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Sep 14 17:39:03 2021 +0200

    pypi/loader: Filter out sdist archives not of interest
    
    Some PyPI origins declare sdist archives that cannot be extracted
    by swh.core.tarball.uncompress and their content do not match
    standard sdist layout.
    
    This is notably the case for sdist files whose extensions are
    .deb, .egg, .rpm or .whl.
    
    As those artifacts are not of interest to archive and generate
    errors while loading PyPI origins, filter them out from the
    sdist files to process.
    
    Related to T3575

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/554/ for more details.

Build is green

Patch application report for D6270 (id=22705)

Rebasing onto d5e54a5eea...

Current branch diff-target is up to date.
Changes applied before test
commit d667a10217085bc7630a21f895b52aa8e758c6fe
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Sep 14 17:39:03 2021 +0200

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/555/ for more details.

Remove no longer needed call to any()

Build is green

Patch application report for D6270 (id=22707)

Rebasing onto d5e54a5eea...

Current branch diff-target is up to date.
Changes applied before test
commit 732999842159cc9f6efb01cf81e54afd8cf7be6e
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Sep 14 17:39:03 2021 +0200

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/556/ for more details.

olasd added a subscriber: olasd.

Thanks

This revision is now accepted and ready to land. Sep 16 2021, 9:46 AM