conda: Yield listed origins after all artifacts in a page are processed
Closed · Public

Authored by anlambert on Oct 19 2022, 4:08 PM.

Details

Summary

swh-scheduler will deduplicate listed origins according to their URL
and visit type but not according to their extra loader arguments.

Previously, listed origins were yielded after each processed artifact
in a page, so some package version info could be lost in the
deduplication process.

So ensure listed origins are yielded only once all artifacts in a page
have been processed.
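
The effect described above can be illustrated with a small, self-contained sketch. It is not the lister's actual code: the `Origin` tuple, the `origin_url` scheme, the `versions` argument name, and the "last record wins" deduplication rule are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for a listed origin:
# (url, visit type, extra loader arguments).
Origin = Tuple[str, str, Dict[str, List[str]]]
# Hypothetical artifact record: (package name, version).
Artifact = Tuple[str, str]


def origin_url(package: str) -> str:
    # Illustrative URL scheme, not necessarily the one used by the conda lister.
    return f"https://example.org/conda/{package}"


def yield_per_artifact(page: Iterable[Artifact]) -> Iterator[Origin]:
    """Old behaviour: one origin per artifact, each carrying a single version."""
    for package, version in page:
        yield origin_url(package), "conda", {"versions": [version]}


def yield_per_page(page: Iterable[Artifact]) -> Iterator[Origin]:
    """New behaviour: collect all versions of a package, then yield once."""
    versions: Dict[str, List[str]] = defaultdict(list)
    for package, version in page:
        versions[package].append(version)
    for package, package_versions in versions.items():
        yield origin_url(package), "conda", {"versions": package_versions}


def deduplicate(origins: Iterable[Origin]) -> Dict[Tuple[str, str], Origin]:
    """Toy deduplication keyed on (url, visit type).

    Assumption: the last record seen for a key replaces earlier ones.
    """
    deduplicated: Dict[Tuple[str, str], Origin] = {}
    for origin in origins:
        url, visit_type, _ = origin
        deduplicated[(url, visit_type)] = origin
    return deduplicated


page = [("numpy", "1.23.3"), ("numpy", "1.23.4"), ("scipy", "1.9.2")]
lossy = deduplicate(yield_per_artifact(page))
complete = deduplicate(yield_per_page(page))
```

In this sketch both strategies end up with the same two origins, but only the per-page strategy keeps every version of a package in the extra loader arguments.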

Diff Detail

Repository
rDLS Listers
Branch
conda-yield-origins-after-page-processing
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 32522
Build 50934: Phabricator diff pipeline on jenkins
Build 50933: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8747 (id=31527)

Rebasing onto 0baaf68cff...

Current branch diff-target is up to date.
Changes applied before test
commit f001bc1bc1695af7d21cd8fad33d00ded51ccfcd
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Oct 19 15:59:25 2022 +0200

    conda: Yield listed origins after all artifacts in a page processed
    
    swh-scheduler will deduplicate listed origins according to their URL
    and visit type but not according to their extra loader arguments.
    
    Previously, listed origins were yielded after each processed artifact
    in a page so we could lose some package version info due to the
    deduplication process.
    
    So ensure to yield listed origins once all artifacts in a page have
    been processed.

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/803/ for more details.

I believe the commit title lacks a verb, doesn't it? ("conda: Yield listed origins after all artifacts in a page are processed" or something similar?)

Shouldn't this fix come with a test of some sort?

anlambert retitled this revision from "conda: Yield listed origins after all artifacts in a page processed" to "conda: Yield listed origins after all artifacts in a page are processed". Oct 20 2022, 3:39 PM

Build is green

Patch application report for D8747 (id=31531)

Rebasing onto 0baaf68cff...

Current branch diff-target is up to date.
Changes applied before test
commit fc22f7539a7fda5294ae8135d7c08d86b4ad0870
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Oct 19 15:59:25 2022 +0200

    conda: Yield listed origins after all artifacts in a page are processed
    

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/805/ for more details.

vlorentz added a subscriber: vlorentz.

Could you add docstrings to the tests? I don't understand what they are expected to test.

swh/lister/conda/lister.py
119
120–121
swh/lister/conda/tests/test_lister.py
115

Please compare the contents of the lists; checking the length alone is not very robust and doesn't give meaningful messages on failure.

This revision now requires changes to proceed. Oct 24 2022, 10:40 AM
swh/lister/conda/lister.py
119

Nice, I did not know that keyword argument existed for max.
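
The comment does not say which keyword argument is meant. For reference, Python's built-in `max` accepts two keyword-only arguments, `key` and `default`; the sketch below shows both on made-up artifact records (the field names are illustrative, not taken from the diff).

```python
from datetime import datetime, timezone

# Hypothetical artifact records; field names are illustrative only.
artifacts = [
    {"version": "1.0.0", "date": datetime(2022, 1, 5, tzinfo=timezone.utc)},
    {"version": "1.1.0", "date": datetime(2022, 6, 1, tzinfo=timezone.utc)},
]

# `key` selects the value used for the comparison.
latest = max(artifacts, key=lambda artifact: artifact["date"])

# `default` is returned instead of raising ValueError on an empty iterable.
last_update = max((artifact["date"] for artifact in []), default=None)
```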

swh/lister/conda/tests/test_lister.py
115

The two lists do not hold the same type, so I cannot compare them directly. The purpose of that test is to check that the number of origins sent to the scheduler is not greater than the number of expected origins. The data sent to the scheduler is already checked in the previous tests.

swh/lister/conda/tests/test_lister.py
115

I will modify the test to check for origin URLs instead.
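
A URL-based check along those lines could look like the following sketch. The helper name `assert_same_origin_urls` is hypothetical and the code does not depend on swh-lister or swh-scheduler; it only shows how comparing URL multisets catches duplicated origins and produces a readable failure message.

```python
from collections import Counter
from typing import Iterable


def assert_same_origin_urls(
    expected_urls: Iterable[str], recorded_urls: Iterable[str]
) -> None:
    """Fail with a descriptive message if the two URL multisets differ."""
    expected = Counter(expected_urls)
    recorded = Counter(recorded_urls)
    # Counter subtraction keeps only positive counts, so duplicates show up too.
    missing = sorted((expected - recorded).elements())
    unexpected = sorted((recorded - expected).elements())
    assert not missing and not unexpected, (
        f"missing origin URLs: {missing}; "
        f"unexpected or duplicated origin URLs: {unexpected}"
    )
```

Unlike a length comparison, a failure here names the offending URLs, and the order in which origins were recorded does not matter.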

Build is green

Patch application report for D8747 (id=31568)

Rebasing onto 8a82bbf95f...

Current branch diff-target is up to date.
Changes applied before test
commit 018fc641bfb2c578be762f1867c33101687bfc03
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Oct 19 15:59:25 2022 +0200

    conda: Yield listed origins after all artifacts in a page are processed
    

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/811/ for more details.

This revision is now accepted and ready to land. Oct 24 2022, 6:26 PM

Build is green

Patch application report for D8747 (id=31592)

Rebasing onto 31eb5f637f...

Current branch diff-target is up to date.
Changes applied before test
commit 4f6b3f3f09b8a8ab8a5e3679b3a8b936dbd21640
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Oct 19 15:59:25 2022 +0200

    conda: Yield listed origins after all artifacts in a page are processed
    

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/823/ for more details.