
nixguix: Randomize order of listed origins
Closed, Public

Authored by ardumont on Oct 4 2022, 10:59 AM.

Details

Summary

The end goal is to ingest the origins sparsely. Randomizing their order avoids hitting
the same servers at around the same time for origins that are colocated in the upstream
manifest (especially file or tarball origins).

Related to T3781
Depends on D8341

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8605 (id=31084)

Could not rebase; Attempt merge onto fa1205c4df...

Updating fa1205c..41eef0a
Fast-forward
 requirements-swh.txt                               |   2 +-
 setup.py                                           |   1 +
 swh/lister/__init__.py                             |  22 ++
 swh/lister/gnu/tree.py                             |  21 +-
 swh/lister/nixguix/__init__.py                     |  38 +++
 swh/lister/nixguix/lister.py                       | 374 +++++++++++++++++++++
 swh/lister/nixguix/tasks.py                        |  14 +
 swh/lister/nixguix/tests/__init__.py               |   0
 .../nixguix/tests/data/guix-swh_sources.json       |  19 ++
 .../nixguix/tests/data/nixpkgs-swh_sources.json    |  52 +++
 swh/lister/nixguix/tests/test_lister.py            | 244 ++++++++++++++
 swh/lister/nixguix/tests/test_tasks.py             |  27 ++
 swh/lister/tests/test_cli.py                       |   4 +
 13 files changed, 800 insertions(+), 18 deletions(-)
 create mode 100644 swh/lister/nixguix/__init__.py
 create mode 100644 swh/lister/nixguix/lister.py
 create mode 100644 swh/lister/nixguix/tasks.py
 create mode 100644 swh/lister/nixguix/tests/__init__.py
 create mode 100644 swh/lister/nixguix/tests/data/guix-swh_sources.json
 create mode 100644 swh/lister/nixguix/tests/data/nixpkgs-swh_sources.json
 create mode 100644 swh/lister/nixguix/tests/test_lister.py
 create mode 100644 swh/lister/nixguix/tests/test_tasks.py
Changes applied before test
commit 41eef0a80a64d896c8c556d39b7730c8d53b5669
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Oct 4 10:57:32 2022 +0200

    nixguix: Randomize listed origins to ingest sparsely origins
    
    Especially the first time around, that would avoid hitting the various servers around
    the same time for grouped origins (especially file or tarball).
    
    Related to T3781

commit 94b6dbea0a7f602be0711a3bb1f9bb9e16fc48ce
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Oct 1 16:41:48 2022 +0200

    nixguix: Document lister
    
    Related to T3781

commit 6d2e7aa17808e39ba9f493b65d662d0ddef5796c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Oct 1 16:12:46 2022 +0200

    nixguix: Register task
    
    Related to T3781

commit fbfdf88ea4fe79c4846ecd48f2a1322f5d3995fc
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Aug 30 11:17:33 2022 +0200

    nixguix: Add lister
    
    Related to T3781

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/734/ for more details.

swh/lister/nixguix/lister.py
375

I did not push it into lister.pattern directly, but I wondered...

lol, nothing...

I forgot the main gist of it

random.shuffle(batch_origins)
yield batch_origins

and you don't need grouper() for that
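
For reference, a minimal sketch of what that suggestion amounts to, assuming each batch is a plain Python list. The function name `iter_shuffled_batches` and the sample data are hypothetical and not taken from the lister code. Note that this only mixes origins within a batch, not across batches.

```python
import random
from typing import Iterable, Iterator, List, Optional


def iter_shuffled_batches(
    batches: Iterable[List[str]], rng: Optional[random.Random] = None
) -> Iterator[List[str]]:
    """Yield each batch with its elements in random order.

    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    for batch_origins in batches:
        # Copy first so the caller's list is left untouched.
        batch_origins = list(batch_origins)
        # shuffle() reorders the list in place and returns None.
        rng.shuffle(batch_origins)
        yield batch_origins


# Illustrative data: two batches of placeholder origin names.
pages = [["a", "b", "c"], ["d", "e"]]
shuffled_pages = list(iter_shuffled_batches(pages, random.Random(0)))
```

Passing a seeded `random.Random` keeps the order reproducible, which may help in tests.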

by the way, you should reword the commit/diff title to mention the order is randomized, not origins themselves

ardumont retitled this revision from "nixguix: Randomize listed origins to ingest sparsely origins" to "nixguix: Randomize order of listed origins". Oct 4 2022, 11:30 AM
ardumont edited the summary of this revision.

Actually randomize and do it at the manifest reading step.
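
A rough sketch of shuffling at the manifest reading step, under assumptions: the manifest is taken to be a JSON document with a top-level "sources" list, and the function name `read_sources_shuffled`, the entry fields and the example.org URLs are hypothetical, not the actual lister code. Shuffling the whole list once, before any batching, spreads colocated entries across the entire run.

```python
import json
import random
from typing import Any, Dict, List, Optional

# Illustrative manifest: the first two entries are colocated on the same host.
RAW_MANIFEST = """
{
  "sources": [
    {"type": "url", "urls": ["https://example.org/a-1.0.tar.gz"]},
    {"type": "url", "urls": ["https://example.org/b-2.0.tar.gz"]},
    {"type": "url", "urls": ["https://other.example.org/c-3.0.tar.gz"]}
  ]
}
"""


def read_sources_shuffled(
    raw_manifest: str, rng: Optional[random.Random] = None
) -> List[Dict[str, Any]]:
    """Parse the manifest and return its sources in random order.

    Hypothetical helper; the "sources" key is an assumption.
    """
    rng = rng or random.Random()
    manifest = json.loads(raw_manifest)
    sources = list(manifest.get("sources", []))
    # Shuffle once, in place, before any downstream batching.
    rng.shuffle(sources)
    return sources


shuffled = read_sources_shuffled(RAW_MANIFEST, random.Random(42))
```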

> and you don't need grouper() for that

Totally. Amended.

> by the way, you should reword the commit/diff title to mention the order is randomized, not origins themselves

Indeed. Fixed.

Thanks.

Build is green

Patch application report for D8605 (id=31086)

Could not rebase; Attempt merge onto fa1205c4df...

Updating fa1205c..cdeb7f5
Fast-forward
 requirements-swh.txt                               |   2 +-
 setup.py                                           |   1 +
 swh/lister/__init__.py                             |  22 ++
 swh/lister/gnu/tree.py                             |  21 +-
 swh/lister/nixguix/__init__.py                     |  38 +++
 swh/lister/nixguix/lister.py                       | 374 +++++++++++++++++++++
 swh/lister/nixguix/tasks.py                        |  14 +
 swh/lister/nixguix/tests/__init__.py               |   0
 .../nixguix/tests/data/guix-swh_sources.json       |  19 ++
 .../nixguix/tests/data/nixpkgs-swh_sources.json    |  52 +++
 swh/lister/nixguix/tests/test_lister.py            | 244 ++++++++++++++
 swh/lister/nixguix/tests/test_tasks.py             |  27 ++
 swh/lister/tests/test_cli.py                       |   4 +
 13 files changed, 800 insertions(+), 18 deletions(-)
 create mode 100644 swh/lister/nixguix/__init__.py
 create mode 100644 swh/lister/nixguix/lister.py
 create mode 100644 swh/lister/nixguix/tasks.py
 create mode 100644 swh/lister/nixguix/tests/__init__.py
 create mode 100644 swh/lister/nixguix/tests/data/guix-swh_sources.json
 create mode 100644 swh/lister/nixguix/tests/data/nixpkgs-swh_sources.json
 create mode 100644 swh/lister/nixguix/tests/test_lister.py
 create mode 100644 swh/lister/nixguix/tests/test_tasks.py
Changes applied before test
commit cdeb7f59ed72ebda661cab55a14556ece8905d64
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Oct 4 10:57:32 2022 +0200

    nixguix: Randomize order of listed origins
    
    The end goal is to ingest sparsely the origins, that would avoid hitting the various
    servers around the same time for colocated origins in the upstream manifest (especially
    file or tarball).
    
    Related to T3781

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/735/ for more details.

vlorentz added inline comments.
swh/lister/nixguix/lister.py
267

More readable, IMO.

(and remove the import)

This revision is now accepted and ready to land. Oct 4 2022, 11:47 AM

Adapt according to review.
That makes sense, plus I forgot I already had that import, so fine!

Thanks again

Build is green

Patch application report for D8605 (id=31087)

Could not rebase; Attempt merge onto fa1205c4df...

Updating fa1205c..1b4fe51
Fast-forward
 requirements-swh.txt                               |   2 +-
 setup.py                                           |   1 +
 swh/lister/__init__.py                             |  22 ++
 swh/lister/gnu/tree.py                             |  21 +-
 swh/lister/nixguix/__init__.py                     |  38 +++
 swh/lister/nixguix/lister.py                       | 373 +++++++++++++++++++++
 swh/lister/nixguix/tasks.py                        |  14 +
 swh/lister/nixguix/tests/__init__.py               |   0
 .../nixguix/tests/data/guix-swh_sources.json       |  19 ++
 .../nixguix/tests/data/nixpkgs-swh_sources.json    |  52 +++
 swh/lister/nixguix/tests/test_lister.py            | 244 ++++++++++++++
 swh/lister/nixguix/tests/test_tasks.py             |  27 ++
 swh/lister/tests/test_cli.py                       |   4 +
 13 files changed, 799 insertions(+), 18 deletions(-)
 create mode 100644 swh/lister/nixguix/__init__.py
 create mode 100644 swh/lister/nixguix/lister.py
 create mode 100644 swh/lister/nixguix/tasks.py
 create mode 100644 swh/lister/nixguix/tests/__init__.py
 create mode 100644 swh/lister/nixguix/tests/data/guix-swh_sources.json
 create mode 100644 swh/lister/nixguix/tests/data/nixpkgs-swh_sources.json
 create mode 100644 swh/lister/nixguix/tests/test_lister.py
 create mode 100644 swh/lister/nixguix/tests/test_tasks.py
Changes applied before test
commit 1b4fe51f62c706a9ef77b8eea74e111bb8be3542
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Oct 4 10:57:32 2022 +0200

    nixguix: Randomize order of listed origins
    
    The end goal is to ingest sparsely the origins, that would avoid hitting the various
    servers around the same time for colocated origins in the upstream manifest (especially
    file or tarball).
    
    Related to T3781

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/736/ for more details.