Page MenuHomeSoftware Heritage

npm, pypi: Split original_artifact parsing out of artifact_to_revision_id
ClosedPublic

Authored by vlorentz on Mar 19 2021, 2:21 PM.

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5287 (id=18934)

Could not rebase; Attempt merge onto 132522e42e...

Merge made by the 'recursive' strategy.
 swh/loader/package/debian/loader.py              | 53 +++++++++++++----------
 swh/loader/package/debian/tests/test_debian.py   | 55 ++++++++++++++++++++++++
 swh/loader/package/nixguix/tests/test_nixguix.py | 33 ++++++++------
 swh/loader/package/npm/loader.py                 | 26 ++++++-----
 swh/loader/package/pypi/loader.py                | 31 +++++++------
 5 files changed, 140 insertions(+), 58 deletions(-)
Changes applied before test
commit b7267fa75d20e392e176033928d56a6d89f92aef
Merge: 132522e 9dc175c
Author: Jenkins user <jenkins@localhost>
Date:   Fri Mar 19 13:25:08 2021 +0000

    Merge branch 'diff-target' into HEAD

commit 9dc175c2aef95b0e5a2a83a71ec63e2c1f075c7b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Mar 17 09:32:37 2021 +0100

    npm, pypi: Split original_artifact parsing out of artifact_to_revision_id
    
    We will need it independently in a future commit

commit 93c9aa8e014c19b02887e6c05a671ad7430235dd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Mar 18 16:18:40 2021 +0100

    debian: Split original_artifact parsing out of resolve_revision_from
    
    We will need it independently in a future commit

commit 9a4991b391e6b825f9dee3008a410650d434ca6d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Mar 17 10:02:21 2021 +0100

    debian: Make resolve_revision_from use the sha256 of the .dsc
    
    Instead of the sha256 + name + ... of all the files of a package.
    
    This will be needed to transition to ExtID, as we can't reasonably write
    this large set in the ExtID storage; and the sha256 of the .dsc is good
    enough, as the .dsc contains hashes and names of other files.

commit 455e05837dc2fb59b4adcf01df27c6f7d64781dd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Mar 16 14:44:33 2021 +0100

    test_load_nixguix_one_common_artifact_from_other_loader: Test all log lines, not just the last one
    
    It's stricter and more readable.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/405/ for more details.

ardumont added inline comments.
swh/loader/package/pypi/loader.py
207

Why did the filtering disappeared?

I see you are aligning with the npm implementation and the pypi one (slightly differently).

Are you sure this this is the way to go?
Why not the other around, having the filtering in both implementation instead?

swh/loader/package/pypi/loader.py
207

It did not disappear. The filtering is done by the caller instead.

swh/loader/package/pypi/loader.py
207

In the prior implementation for that loop (from line 208 of this version), prior to this diff, we iterated over all values.
Now you are returning the first element of that loop.
You never filter on the remaining elements...

swh/loader/package/pypi/loader.py
207

oh, right; the issue isn't filtering, but that I'm only returning the first element.

swh/loader/package/pypi/loader.py
207

yes.

And I'm merely pointing this "discrepancy" out.

request changes for the sake of the remark you agreed upon and not let this rot too
much.

This revision is now accepted and ready to land.Mar 23 2021, 9:16 AM
This revision now requires changes to proceed.Mar 23 2021, 9:17 AM

I checked, and there isn't any revision with more than one known artifact:

select origin.url, revision.metadata -> 'original_artifact' from revision inner join snapshot_branch on (revision.id=snapshot_branch.target) inner join snapshot_branches on (snapshot_branch.object_id=snapshot_branches.branch_id) inner join snapshot on (snapshot_branches.snapshot_id=snapshot.object_id) inner join origin_visit_status on (origin_visit_status.snapshot=snapshot.id) inner join origin on (origin.id=origin_visit_status.origin) where origin.url > 'https://pypi.org' and origin.url < 'https://pypi.orz' and revision.metadata -> 'original_artifact' -> 1 is not null

I checked, and there isn't any revision with more than one known artifact:

Thanks.

Ok then.

This revision is now accepted and ready to land.Mar 23 2021, 9:27 AM

no wait, it doesn't mean we should keep that for loop!

oops, bad comment. I added a check that the length is 1, and explicitly take the first argument, so please check it's still fine with you

Build has FAILED

Patch application report for D5287 (id=18998)

Could not rebase; Attempt merge onto 132522e42e...

Updating 132522e..6da3061
Fast-forward
 swh/loader/package/debian/loader.py              | 53 +++++++++++++----------
 swh/loader/package/debian/tests/test_debian.py   | 55 ++++++++++++++++++++++++
 swh/loader/package/nixguix/tests/test_nixguix.py | 33 ++++++++------
 swh/loader/package/npm/loader.py                 | 26 ++++++-----
 swh/loader/package/pypi/loader.py                | 37 ++++++++++------
 swh/loader/package/pypi/tests/test_pypi.py       | 11 +++++
 6 files changed, 157 insertions(+), 58 deletions(-)
Changes applied before test
commit 6da306159137c309ea3aa987589415c4b8f81de3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Mar 17 09:32:37 2021 +0100

    npm, pypi: Split original_artifact parsing out of artifact_to_revision_id
    
    We will need it independently in a future commit

commit 3190406239d231d62b8583bb1aeb74def1016a08
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Mar 18 16:18:40 2021 +0100

    debian: Split original_artifact parsing out of resolve_revision_from
    
    We will need it independently in a future commit

commit 4960aa38b10b29d249b28b92e78e0308f482e551
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Mar 17 10:02:21 2021 +0100

    debian: Make resolve_revision_from use the sha256 of the .dsc
    
    Instead of the sha256 + name + ... of all the files of a package.
    
    This will be needed to transition to ExtID, as we can't reasonably write
    this large set in the ExtID storage; and the sha256 of the .dsc is good
    enough, as the .dsc contains hashes and names of other files.

commit cf8fc9e9bd19b4f303b33ddeb73dc285d835bfcc
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Mar 16 14:44:33 2021 +0100

    test_load_nixguix_one_common_artifact_from_other_loader: Test all log lines, not just the last one
    
    It's stricter and more readable.

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/413/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/413/console

yes, even better.

Thanks.

This revision is now accepted and ready to land.Mar 23 2021, 12:28 PM

Build has FAILED

Patch application report for D5287 (id=19025)

Could not rebase; Attempt merge onto 9a339e981a...

Updating 9a339e9..749dd28
Fast-forward
 swh/loader/package/debian/loader.py            | 53 ++++++++++++++-----------
 swh/loader/package/debian/tests/test_debian.py | 55 ++++++++++++++++++++++++++
 swh/loader/package/npm/loader.py               | 26 +++++++-----
 swh/loader/package/pypi/loader.py              | 37 +++++++++++------
 swh/loader/package/pypi/tests/test_pypi.py     | 11 ++++++
 5 files changed, 136 insertions(+), 46 deletions(-)
Changes applied before test
commit 749dd287432d116be5f4a69335ef35c244923ca6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Mar 17 09:32:37 2021 +0100

    npm, pypi: Split original_artifact parsing out of artifact_to_revision_id
    
    We will need it independently in a future commit

commit e7ae6364c06a4d1e1863e8c1e948cd3a82ca4286
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Mar 18 16:18:40 2021 +0100

    debian: Split original_artifact parsing out of resolve_revision_from
    
    We will need it independently in a future commit

commit 268f83e943a439958840b4d8cdcff5c9eb16429e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Mar 17 10:02:21 2021 +0100

    debian: Make resolve_revision_from use the sha256 of the .dsc
    
    Instead of the sha256 + name + ... of all the files of a package.
    
    This will be needed to transition to ExtID, as we can't reasonably write
    this large set in the ExtID storage; and the sha256 of the .dsc is good
    enough, as the .dsc contains hashes and names of other files.

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/425/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/425/console

Build has FAILED

Patch application report for D5287 (id=19029)

Could not rebase; Attempt merge onto 9a339e981a...

Updating 9a339e9..dc197af
Fast-forward
 swh/loader/package/debian/loader.py            | 53 ++++++++++++++-----------
 swh/loader/package/debian/tests/test_debian.py | 55 ++++++++++++++++++++++++++
 swh/loader/package/npm/loader.py               | 26 +++++++-----
 swh/loader/package/pypi/loader.py              | 37 +++++++++++------
 swh/loader/package/pypi/tests/test_pypi.py     | 11 ++++++
 5 files changed, 136 insertions(+), 46 deletions(-)
Changes applied before test
commit dc197af60c60f061df3ac35d7e7f94b0a0262140
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Mar 17 09:32:37 2021 +0100

    npm, pypi: Split original_artifact parsing out of artifact_to_revision_id
    
    We will need it independently in a future commit

commit e7ae6364c06a4d1e1863e8c1e948cd3a82ca4286
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Mar 18 16:18:40 2021 +0100

    debian: Split original_artifact parsing out of resolve_revision_from
    
    We will need it independently in a future commit

commit 268f83e943a439958840b4d8cdcff5c9eb16429e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Mar 17 10:02:21 2021 +0100

    debian: Make resolve_revision_from use the sha256 of the .dsc
    
    Instead of the sha256 + name + ... of all the files of a package.
    
    This will be needed to transition to ExtID, as we can't reasonably write
    this large set in the ExtID storage; and the sha256 of the .dsc is good
    enough, as the .dsc contains hashes and names of other files.

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/429/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/429/console

Build is green

Patch application report for D5287 (id=19033)

Could not rebase; Attempt merge onto 9a339e981a...

Updating 9a339e9..f827d04
Fast-forward
 swh/loader/package/debian/loader.py            | 53 ++++++++++++++-----------
 swh/loader/package/debian/tests/test_debian.py | 55 ++++++++++++++++++++++++++
 swh/loader/package/npm/loader.py               | 26 +++++++-----
 swh/loader/package/pypi/loader.py              | 37 +++++++++++------
 swh/loader/package/pypi/tests/test_pypi.py     | 14 +++++++
 5 files changed, 139 insertions(+), 46 deletions(-)
Changes applied before test
commit f827d04cb538038cc38aceb7ee55c5465c444f52
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Mar 17 09:32:37 2021 +0100

    npm, pypi: Split original_artifact parsing out of artifact_to_revision_id
    
    We will need it independently in a future commit

commit e7ae6364c06a4d1e1863e8c1e948cd3a82ca4286
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Mar 18 16:18:40 2021 +0100

    debian: Split original_artifact parsing out of resolve_revision_from
    
    We will need it independently in a future commit

commit 268f83e943a439958840b4d8cdcff5c9eb16429e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Mar 17 10:02:21 2021 +0100

    debian: Make resolve_revision_from use the sha256 of the .dsc
    
    Instead of the sha256 + name + ... of all the files of a package.
    
    This will be needed to transition to ExtID, as we can't reasonably write
    this large set in the ExtID storage; and the sha256 of the .dsc is good
    enough, as the .dsc contains hashes and names of other files.

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/433/ for more details.