
Add built-in page and origin count limit to listers
Closed, Public

Authored by olasd on Dec 5 2022, 2:21 PM.

Details

Summary

This will allow more automation of the staging "add forge now" process:
for known-good listers, we can limit the number of origins being
processed and reduce the number of manual steps taken for each instance.

Test Plan

New tests added.

Diff Detail

Repository: rDLS Listers
Branch: detached-head
Lint: No Linters Available
Unit: No Unit Test Coverage
Build Status: Buildable 33079
  Build 51852: Phabricator diff pipeline on jenkins
  Build 51851: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8915 (id=32123)

Could not rebase; Attempt merge onto f4aafe026b...

Updating f4aafe0..851b8e1
Fast-forward
 swh/lister/pattern.py            | 22 +++++++++++++++++++-
 swh/lister/rubygems/lister.py    | 18 ++++++++++++++++-
 swh/lister/tests/test_pattern.py | 43 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+), 2 deletions(-)
Changes applied before test
commit 851b8e180e3841eebc5f1440b92add24f816d051
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Dec 5 14:20:19 2022 +0100

    Add built-in page and origin count limit to listers
    
    This will allow more automation of the staging add forge now process:
    for known-good listers, we can limit the number of origins being
    processed and reduce the amount of manual steps taken for each instance.

commit a66e24bfa2af7de8cacf4c4525f0eb5103aae0eb
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Dec 5 13:43:32 2022 +0100

    Ignore psqlrc when loading the rubygems database dump
    
    The SQL dump contains ownership instructions that can't be run if you
    don't have the right users in your database clusters. When someone has a
    psqlrc with ON_ERROR_STOP, this fails the load of the dump.
    
    Use the opportunity to trigger an exception when psql returns a non-zero
    exit code, rather than continue with an empty/inconsistent database.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/869/ for more details.
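
The second commit in the report above (a66e24b) describes two behaviors: running psql without the user's psqlrc, and failing loudly on a non-zero exit code. A minimal sketch of that idea follows. It is not the code from swh/lister/rubygems/lister.py; the helper names `psql_load_command` and `run_checked` and the example arguments are hypothetical.

```python
import subprocess
from typing import List


def psql_load_command(dump_path: str, dbname: str) -> List[str]:
    """Build a psql invocation that ignores any user psqlrc (hypothetical helper)."""
    return [
        "psql",
        "--no-psqlrc",  # skip the psqlrc, so a user's ON_ERROR_STOP is not applied
        "--dbname", dbname,
        "--file", dump_path,
    ]


def run_checked(command: List[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(command, check=True)
```

With `check=True`, a failed load raises instead of letting the caller continue with an empty or inconsistent database. A caller would run something like `run_checked(psql_load_command("dump.sql", "rubygems"))`, where both arguments are placeholders.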

olasd requested review of this revision. Dec 5 2022, 2:27 PM
ardumont added inline comments.
swh/lister/tests/test_pattern.py, line 259

I'm surprised here: I would have expected 10 (the number of pages the lister provides) * max_origins_per_pages. What did I miss?

lgtm

Just something I don't get in the test, as per my inline question.

This revision is now accepted and ready to land. Dec 5 2022, 2:31 PM
swh/lister/tests/test_pattern.py, line 259

The number of origins per page is clamped to, well, the number of origins per page that the unlimited lister returns. So a limit of 100 origins per page will still yield 10 origins per page.
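
To make the clamping concrete, here is a minimal sketch of the limiting logic. It is not the actual code in swh/lister/pattern.py; the function `limit_listing`, its parameter names, and the example URLs are assumptions for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def limit_listing(
    pages: Iterable[List[str]],
    max_pages: Optional[int] = None,
    max_origins_per_page: Optional[int] = None,
) -> Iterator[List[str]]:
    """Yield at most max_pages pages, each truncated to max_origins_per_page."""
    # islice with a stop of None passes every page through unchanged.
    for page in islice(pages, max_pages):
        # Slicing with None keeps the whole page, and a limit larger than the
        # page length is a no-op: the page cannot grow beyond what it holds.
        yield page[:max_origins_per_page]


# Hypothetical unlimited lister output: 10 pages of 10 origins each.
all_pages = [
    [f"https://example.org/page{p}/origin{o}" for o in range(10)]
    for p in range(10)
]

limited = list(limit_listing(all_pages, max_pages=2, max_origins_per_page=100))
total_origins = sum(len(page) for page in limited)  # 2 pages of 10 origins each
```

Here a limit of 100 origins per page still leaves 10 origins on each page, so the total is the number of pages kept times the actual page size, not times the limit.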

(this reminds me that we should test the default value too)

Add explicit test for default behavior

Build is green

Patch application report for D8915 (id=32126)

Could not rebase; Attempt merge onto f4aafe026b...

Updating f4aafe0..b815737
Fast-forward
 swh/lister/pattern.py            | 22 ++++++++++++-
 swh/lister/rubygems/lister.py    | 18 ++++++++++-
 swh/lister/tests/test_pattern.py | 67 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 105 insertions(+), 2 deletions(-)
Changes applied before test
commit b815737054acdbebf0d71036af3b713e866beed6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Dec 5 14:20:19 2022 +0100

    Add built-in page and origin count limit to listers
    
    This will allow more automation of the staging add forge now process:
    for known-good listers, we can limit the number of origins being
    processed and reduce the amount of manual steps taken for each instance.

commit a66e24bfa2af7de8cacf4c4525f0eb5103aae0eb
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Dec 5 13:43:32 2022 +0100

    Ignore psqlrc when loading the rubygems database dump
    
    The SQL dump contains ownership instructions that can't be run if you
    don't have the right users in your database clusters. When someone has a
    psqlrc with ON_ERROR_STOP, this fails the load of the dump.
    
    Use the opportunity to trigger an exception when psql returns a non-zero
    exit code, rather than continue with an empty/inconsistent database.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/871/ for more details.