Page MenuHomeSoftware Heritage

nixguix: Improve tarball detection
ClosedPublic

Authored by ardumont on Oct 4 2022, 8:44 PM.

Details

Summary

Current runs demos that most content files are still tarballs [1]

Without this, some tarballs hidden within query parameters are not detected. This does
some extra effort to detect those to avoid sending noise to loaders.

Related to T3781

[1] Related to P1475

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D8619 (id=31121)

Rebasing onto 944d4b5b60...

Current branch diff-target is up to date.
Changes applied before test
commit ab2cf62a1008b2dd73c647f746d6e81e3ea93631
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Oct 4 20:42:20 2022 +0200

    nixguix: Better detection of urls targeting tarballs vs files
    
    Related to T3781

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/750/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/750/console

Harbormaster returned this revision to the author for changes because remote builds failed.Oct 4 2022, 8:50 PM
Harbormaster failed remote builds in B32090: Diff 31121!
ardumont retitled this revision from nixguix: Better detection of urls targeting tarballs vs files to nixguix: Improve tarball detection.Oct 4 2022, 9:01 PM
ardumont edited the summary of this revision. (Show Details)

Build is green

Patch application report for D8619 (id=31122)

Rebasing onto 944d4b5b60...

Current branch diff-target is up to date.
Changes applied before test
commit c4c6df39e142648e066887f0ce6289974c99417d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Oct 4 20:42:20 2022 +0200

    nixguix: Improve tarball detection
    
    Without this, some tarballs hidden within query parameters are not detected. This does
    some extra effort to detect those to avoid sending noise to loaders.
    
    Related to T3781

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/751/ for more details.

create a new exception class instead of IndexError, it's unclear that the rest of the function is not actually meant to raise that exception.

This revision now requires changes to proceed.Oct 4 2022, 11:15 PM

create a new exception class instead of IndexError, it's unclear that the rest of the function is not actually meant to raise that exception.

I can tell you it's not supposed to raise outside of this function.
Otherwise, i would have documented it in the main docstring of the function.

Adapt according to suggestion

Build is green

Patch application report for D8619 (id=31126)

Rebasing onto 944d4b5b60...

Current branch diff-target is up to date.
Changes applied before test
commit 2fbd66778f3286c77149796339426aeccd11c1d0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Oct 4 20:42:20 2022 +0200

    nixguix: Improve tarball detection
    
    Without this, some tarballs hidden within query parameters are not detected. This does
    some extra effort to detect those to avoid sending noise to loaders.
    
    Related to T3781

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/753/ for more details.

create a new exception class instead of IndexError, it's unclear that the rest of the function is not actually meant to raise that exception.

I can tell you it's not supposed to raise outside of this function.
Otherwise, i would have documented it in the main docstring of the function.

Adapted as requested nonetheless.

anlambert added inline comments.
swh/lister/nixguix/lister.py
161–174

You should check values of all query parameters I think as you cannot guess their names.

You can also make that code shorter the following way:

return any(path.endswith(tuple(TARBALL_EXTENSIONS) for path in (urlparsed.path,) + tuple(parse_qsl(urlparsed.query).keys())))
swh/lister/nixguix/lister.py
161–174

ok, looks like a better approach.

I'm gonna land as is for now.

And do another diff to simplify all this (i have another diff already stacked on this and another incoming diff and I'd like to avoid having to spend my time rebasing stuff ;)

This revision is now accepted and ready to land.Oct 5 2022, 11:07 AM
This revision was automatically updated to reflect the committed changes.