nixguix: Improve is_tarball detection pattern
ClosedPublic

Authored by ardumont on Oct 5 2022, 11:53 AM.

Details

Summary

This now includes all query parameter values as paths to check. When a path has an
extension, it is pattern matched against the known tarball extensions. When no extension
is detected, the behavior is unchanged: it falls back to a HEAD request on the url to get
more information on the file.

Prior to this commit, this only looked at the values of a hard-coded list of keys (file,
f, name, url) identified through docker runs. The new approach should decrease future
misdetections when new, unknown "keys" show up in the wild.
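
As an illustration of the approach described above, here is a minimal sketch. It is not
the lister's actual code: the names looks_like_tarball, candidate_paths and
TARBALL_EXTENSIONS, as well as the extension list itself, are assumptions made for this
example. Returning None stands for "no extension found, fall back to a HEAD request".

```python
from pathlib import PurePosixPath
from typing import List, Optional
from urllib.parse import parse_qsl, urlparse

# Hypothetical, non-exhaustive set of extensions treated as tarballs/archives.
TARBALL_EXTENSIONS = {"tar", "tgz", "tbz2", "txz", "gz", "bz2", "xz", "zip"}


def candidate_paths(url: str) -> List[str]:
    """Return the url path followed by every query parameter value,
    whatever the parameter key is."""
    parsed = urlparse(url)
    return [parsed.path] + [value for _, value in parse_qsl(parsed.query)]


def looks_like_tarball(url: str) -> Optional[bool]:
    """Return True if any candidate path has a tarball extension, False if
    extensions were found but none is a tarball one, and None if no candidate
    has an extension (the caller would then fall back to a HEAD request)."""
    saw_extension = False
    for path in candidate_paths(url):
        suffixes = [s.lstrip(".").lower() for s in PurePosixPath(path).suffixes]
        if not suffixes:
            continue
        saw_extension = True
        if any(suffix in TARBALL_EXTENSIONS for suffix in suffixes):
            return True
    return False if saw_extension else None
```

With this sketch, a url such as https://example.org/get?archive=foo.zip is detected even
though "archive" is not one of the previously hard-coded keys. Note that a version-like
value such as "v1.2" counts as having an extension here, which a real implementation may
want to handle differently.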

Related to T3781

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont edited the summary of this revision.

Build is green

Patch application report for D8626 (id=31138)

Rebasing onto 2ee103e2bc...

Current branch diff-target is up to date.
Changes applied before test
commit 202a571dee7c14fba37729104e707e85f2d25cef
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Oct 5 11:52:43 2022 +0200

    nixguix: Refactor is_tarball to simplify
    
    Related to T3781

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/756/ for more details.

ardumont retitled this revision from "nixguix: Refactor is_tarball to simplify" to "nixguix: Improve is_tarball detection pattern". Oct 5 2022, 12:01 PM
ardumont edited the summary of this revision.

really amend commit message ¯\_(ツ)_/¯

Build is green

Patch application report for D8626 (id=31142)

Rebasing onto 2ee103e2bc...

Current branch diff-target is up to date.
Changes applied before test
commit 3da112ce060a2de9178cf62b79eba4558c928f0e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Oct 5 11:52:43 2022 +0200

    nixguix: Improve is_tarball detection pattern
    
    This actually includes all query param values as paths to check. It then checks for file
    pattern matching against "tarball" patterns. When no extension is detected, it's doing
    as before, fallbacks to head query the url to have more information on the file.
    
    Prior to this commit, this only looked over a hard-coded list of keys (file, f, name,
    url) detected through docker runs. This way of doing it should decrease future
    misdetections.
    
    Related to T3781

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/757/ for more details.

Build is green

Patch application report for D8626 (id=31143)

Rebasing onto 2ee103e2bc...

Current branch diff-target is up to date.
Changes applied before test
commit f2377c283ac542a5b492a9d75ccce6d86b07c54a
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Oct 5 11:52:43 2022 +0200

    nixguix: Improve is_tarball detection pattern
    
    This actually includes all query param values as paths to check. When paths have
    extensions, it then pattern matches against tarballs if any. When no extension is
    detected, it's doing as before, fallbacks to head query the url to have more information
    on the file.
    
    Prior to this commit, this only looked over a hard-coded list of values (for hard-coded
    keys: file, f, name, url) detected through docker runs. This way of doing it should
    decrease future misdetections (when new unknown "keys" show up in the wild).
    
    Related to T3781

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/758/ for more details.

This revision is now accepted and ready to land. Oct 5 2022, 12:10 PM