
nixguix: Allow lister to ignore specific extensions
ClosedPublic

Authored by ardumont on Oct 25 2022, 10:41 AM.

Details

Summary

The list of ignored extensions can be extended through configuration. It defaults to some
binary formats already encountered during docker runs.

Related to T3781
Depends on D8761
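
As a rough illustration of the behaviour described in the summary (not the code from this diff), extension-based filtering could look like the sketch below. The names `DEFAULT_EXTENSIONS_TO_IGNORE`, `url_extension` and `is_ignored`, as well as the listed extensions, are assumptions made for the example.

```python
from pathlib import PurePosixPath
from typing import Sequence
from urllib.parse import urlparse

# Hypothetical defaults: illustrative binary formats, not the list from the diff.
DEFAULT_EXTENSIONS_TO_IGNORE = ("exe", "iso", "whl", "gem")


def url_extension(url: str) -> str:
    """Return the lowercased last extension of the URL path, without the dot."""
    path = PurePosixPath(urlparse(url).path)
    return path.suffix.lstrip(".").lower()


def is_ignored(
    url: str, extensions_to_ignore: Sequence[str] = DEFAULT_EXTENSIONS_TO_IGNORE
) -> bool:
    """Tell whether the URL points at a file whose extension is ignored."""
    return url_extension(url) in extensions_to_ignore
```

Only the last extension is inspected here, so a `.tar.gz` tarball is judged on `gz`; the query string is dropped before the check.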

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8763 (id=31591)

Could not rebase; Attempt merge onto 31eb5f637f...

Updating 31eb5f6..eea39c1
Fast-forward
 swh/lister/nixguix/lister.py                       |  36 ++++++-
 swh/lister/nixguix/tests/data/sources-failure.json | 110 +++++++++++++++++++++
 swh/lister/nixguix/tests/data/sources-success.json |  94 +++++++++++++++++-
 swh/lister/nixguix/tests/test_lister.py            |  30 +++++-
 4 files changed, 264 insertions(+), 6 deletions(-)
Changes applied before test
commit eea39c1877b754a1653a786ad91b723c99ad4e2b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Oct 25 10:39:01 2022 +0200

    nixguix: Allow lister configuration to ignore specific extensions
    
    Related to T3781

commit 08c87b67f04585d5036ff85e1886fbbb518d9ec4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Oct 25 10:07:43 2022 +0200

    nixguix/test: Add all supported tarball extensions to test manifest
    
    Next step is to add some extensions filtering so might as well harden the test dataset
    first.
    
    Related to T3781

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/822/ for more details.

anlambert added a subscriber: anlambert.

LGTM, one thought posted as inline comment.

swh/lister/nixguix/tests/test_lister.py
291–298

I think those could be set as default in the lister constructor as we are not interested in archiving binary files. You could then extend that default list through configuration if we encounter other extensions not related to source code archives.
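
A minimal sketch of that suggestion, assuming a constructor parameter named `extensions_to_ignore`. The class name, parameter name and default values are hypothetical and only illustrate the "defaults plus configured extras" idea:

```python
from typing import List, Optional

# Hypothetical defaults; the actual list lives in the diff, not here.
DEFAULT_EXTENSIONS_TO_IGNORE = ["exe", "iso"]


class ExampleLister:
    """Toy stand-in for a lister; only the extension handling is shown."""

    def __init__(self, extensions_to_ignore: Optional[List[str]] = None):
        # Build a fresh list so the module-level default is never mutated.
        self.extensions_to_ignore = list(DEFAULT_EXTENSIONS_TO_IGNORE)
        if extensions_to_ignore:
            # Configured values extend the defaults instead of replacing them.
            self.extensions_to_ignore.extend(extensions_to_ignore)
```

With this shape, `ExampleLister()` uses the defaults only, while `ExampleLister(extensions_to_ignore=["foo"])` ignores the defaults plus `foo`.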

This revision is now accepted and ready to land. Oct 25 2022, 10:58 AM
swh/lister/nixguix/tests/test_lister.py
291–298

Yes, it was on my mind initially, but I was unsure whether we wanted to hard-code it immediately.
I'll adapt accordingly.

Note that I've also locally refactored the filtering a bit to reuse some code.
Will push the update soon.

As usual, thx.

ardumont retitled this revision from "nixguix: Allow lister configuration to ignore specific extensions" to "nixguix: Allow lister to ignore specific extensions". Oct 25 2022, 11:37 AM
ardumont edited the summary of this revision.
  • Rebase
  • Refactor common code on extension detection
  • Adapt docstring and commit message
  • Default extensions to ignore
  • Adapt constructor to extend the extensions to ignore if any are passed in
ardumont added inline comments.
swh/lister/nixguix/tests/test_lister.py
291–298

done

Rename the parameter to a better name

Build was aborted

Patch application report for D8763 (id=31595)

Could not rebase; Attempt merge onto 4f6b3f3f09...

Updating 4f6b3f3..8c5c1b7
Fast-forward
 swh/lister/nixguix/lister.py                       |  65 ++++++++++--
 swh/lister/nixguix/tests/data/sources-failure.json | 117 +++++++++++++++++++++
 swh/lister/nixguix/tests/data/sources-success.json |  94 ++++++++++++++++-
 swh/lister/nixguix/tests/test_lister.py            |  32 +++++-
 4 files changed, 295 insertions(+), 13 deletions(-)
Changes applied before test
commit 8c5c1b7ab8e282935073eb782546d60b8c89c76b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Oct 25 10:39:01 2022 +0200

    nixguix: Allow lister to ignore specific extensions
    
    Those extensions can be extended through configuration. They default to some binary
    format already encountered during docker runs.
    
    Related to T3781

commit d96a39d5b0569ae82bc15ccb09e25aa16e53e72c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Oct 25 10:07:43 2022 +0200

    nixguix/test: Add all supported tarball extensions to test manifest
    
    Next step is to add some extensions filtering so might as well harden the test dataset
    first.
    
    Related to T3781

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/825/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/825/console

Build has FAILED

Patch application report for D8763 (id=31596)

Could not rebase; Attempt merge onto 4f6b3f3f09...

Updating 4f6b3f3..eddf45b
Fast-forward
 swh/lister/nixguix/lister.py                       |  69 ++++++++++--
 swh/lister/nixguix/tests/data/sources-failure.json | 117 +++++++++++++++++++++
 swh/lister/nixguix/tests/data/sources-success.json |  94 ++++++++++++++++-
 swh/lister/nixguix/tests/test_lister.py            |  32 +++++-
 4 files changed, 299 insertions(+), 13 deletions(-)
Changes applied before test
commit eddf45b619b05ef299ee4cc3864401e89215f957
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Oct 25 10:39:01 2022 +0200

    nixguix: Allow lister to ignore specific extensions
    
    Those extensions can be extended through configuration. They default to some binary
    format already encountered during docker runs.
    
    Related to T3781

commit d96a39d5b0569ae82bc15ccb09e25aa16e53e72c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Oct 25 10:07:43 2022 +0200

    nixguix/test: Add all supported tarball extensions to test manifest
    
    Next step is to add some extensions filtering so might as well harden the test dataset
    first.
    
    Related to T3781

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/826/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/826/console

  • Add missing test
  • Fix another test

Build is green

Patch application report for D8763 (id=31598)

Could not rebase; Attempt merge onto 4f6b3f3f09...

Updating 4f6b3f3..77c2b87
Fast-forward
 swh/lister/nixguix/lister.py                       |  75 ++++++++++---
 swh/lister/nixguix/tests/data/sources-failure.json | 117 +++++++++++++++++++++
 swh/lister/nixguix/tests/data/sources-success.json |  94 ++++++++++++++++-
 swh/lister/nixguix/tests/test_lister.py            |  48 ++++++++-
 4 files changed, 316 insertions(+), 18 deletions(-)
Changes applied before test
commit 77c2b875dc9a3290fb345e7658e4d57ca38dece3
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Oct 25 10:39:01 2022 +0200

    nixguix: Allow lister to ignore specific extensions
    
    Those extensions can be extended through configuration. They default to some binary
    format already encountered during docker runs.
    
    Related to T3781

commit d96a39d5b0569ae82bc15ccb09e25aa16e53e72c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Oct 25 10:07:43 2022 +0200

    nixguix/test: Add all supported tarball extensions to test manifest
    
    Next step is to add some extensions filtering so might as well harden the test dataset
    first.
    
    Related to T3781

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/827/ for more details.

swh/lister/nixguix/lister.py
39

You could merge your comments into a single one, something like: "By default, ignore binary files and archives containing binaries".

300

Why a copy here?

ardumont added inline comments.
swh/lister/nixguix/lister.py
300

¯\_(ツ)_/¯

ardumont marked an inline comment as done.

Adapt according to last review (merge variable comment, drop spurious .copy)

Build is green

Patch application report for D8763 (id=31600)

Could not rebase; Attempt merge onto 4f6b3f3f09...

Updating 4f6b3f3..ca4ab7f
Fast-forward
 swh/lister/nixguix/lister.py                       |  74 ++++++++++---
 swh/lister/nixguix/tests/data/sources-failure.json | 117 +++++++++++++++++++++
 swh/lister/nixguix/tests/data/sources-success.json |  94 ++++++++++++++++-
 swh/lister/nixguix/tests/test_lister.py            |  48 ++++++++-
 4 files changed, 315 insertions(+), 18 deletions(-)
Changes applied before test
commit ca4ab7f277dc51efc62dbb5c6866dc424d117d6a
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Oct 25 10:39:01 2022 +0200

    nixguix: Allow lister to ignore specific extensions
    
    Those extensions can be extended through configuration. They default to some binary
    format already encountered during docker runs.
    
    Related to T3781

commit d96a39d5b0569ae82bc15ccb09e25aa16e53e72c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Oct 25 10:07:43 2022 +0200

    nixguix/test: Add all supported tarball extensions to test manifest
    
    Next step is to add some extensions filtering so might as well harden the test dataset
    first.
    
    Related to T3781

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/828/ for more details.