Page MenuHomeSoftware Heritage

crates: rework to take advantage of data returned by the crates lister
ClosedPublic

Authored by franckbret on Apr 29 2022, 10:14 AM.

Details

Summary

The loader get 'artifacts' information through lister
'extra_loader_arguments'.

Related to T4104

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D7713 (id=27897)

Rebasing onto 2e27f7c7a6...

Current branch diff-target is up to date.
Changes applied before test
commit 995e2b0fa8c0304bd671c473cfc8650427ce2308
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Apr 29 10:02:20 2022 +0200

    crates: rework to take advantage of data returned by the crates lister
    
    The loader get 'artifacts' information through lister
    'extra_loader_arguments'.
    
    Related to T4104

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/755/ for more details.

Add 'Franck Bret' to contributors

@franckbret

Arf, i reloaded the page (i was reviewing) but now lost the initial content of the diff.
You need to update back the diff with the range of commits you want to use.

arc diff HEAD~2 --update D7713  # assuming you are 2 commits ahead of what you want to diff.

Build is green

Patch application report for D7713 (id=27898)

Could not rebase; Attempt merge onto 2e27f7c7a6...

Updating 2e27f7c..4ca8420
Fast-forward
 CONTRIBUTORS                                   |   1 +
 docs/package-loader-specifications.rst         |   2 +-
 swh/loader/package/crates/loader.py            |  57 ++++++-----
 swh/loader/package/crates/tasks.py             |   6 +-
 swh/loader/package/crates/tests/test_crates.py | 136 ++++++++++++++++++++-----
 swh/loader/package/crates/tests/test_tasks.py  |   5 +-
 6 files changed, 149 insertions(+), 58 deletions(-)
Changes applied before test
commit 4ca8420d2024a4c36cb3bf2d8985ae14789d238c
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Apr 29 10:35:40 2022 +0200

    Add 'Franck Bret' to contributors

commit 995e2b0fa8c0304bd671c473cfc8650427ce2308
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Apr 29 10:02:20 2022 +0200

    crates: rework to take advantage of data returned by the crates lister
    
    The loader get 'artifacts' information through lister
    'extra_loader_arguments'.
    
    Related to T4104

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/756/ for more details.

Build has FAILED

Patch application report for D7713 (id=27899)

Rebasing onto 2e27f7c7a6...

Current branch diff-target is up to date.
Changes applied before test
commit 4ca8420d2024a4c36cb3bf2d8985ae14789d238c
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Apr 29 10:35:40 2022 +0200

    Add 'Franck Bret' to contributors

commit 995e2b0fa8c0304bd671c473cfc8650427ce2308
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Apr 29 10:02:20 2022 +0200

    crates: rework to take advantage of data returned by the crates lister
    
    The loader get 'artifacts' information through lister
    'extra_loader_arguments'.
    
    Related to T4104

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/757/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/757/console

Build has FAILED

Patch application report for D7713 (id=27899)

Rebasing onto 2e27f7c7a6...

Current branch diff-target is up to date.
Changes applied before test
commit 4ca8420d2024a4c36cb3bf2d8985ae14789d238c
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Apr 29 10:35:40 2022 +0200

    Add 'Franck Bret' to contributors

commit 995e2b0fa8c0304bd671c473cfc8650427ce2308
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Apr 29 10:02:20 2022 +0200

    crates: rework to take advantage of data returned by the crates lister
    
    The loader get 'artifacts' information through lister
    'extra_loader_arguments'.
    
    Related to T4104

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/758/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/758/console

swh/loader/package/crates/loader.py
239

You could make that dict of key artifact version, value artifact. ^

That'd simplify some stuff below.

all_versions = sorted(list(self.artifacts.keys()))
# artifact[version] for some value of version...
259

wondering whether that could work (although it's only use twice, one in the package loader and another with this ;)
so ¯\_(ツ)_/¯

269

with my previous suggestion.

294–298

with my previous suggestion.

swh/loader/package/crates/loader.py
300–301

There is some inconsistency here now i think.
This ExtrincitPackageMetadata should be done for the artifact url.

The self.info method is using self.url which is now the origin url (and not the artifact url).
So i think it's missing some refactoring to pass along the url to actually use.

That will then avoid the "strange" filtering command for the crate version just one line after my comment.

swh/loader/package/crates/loader.py
245–251
254–257

to be consistent with my concern about 'inconsistent behavior' below.

300–301

with my previous adaptations.

It's going the right way, thanks.

Still, I think there is some inconsistency that needs lifting.

See my suggestions inline.

(I did not check the tests yet)

This revision now requires changes to proceed.Apr 29 2022, 11:25 AM
franckbret added inline comments.
swh/loader/package/crates/loader.py
239

Sure!

300–301

Not sure for this.. Maybe I miss something so I may be wrong.

The api endpoint is for one origin, it returns generic + information for each version.
Let's say that for crate 'x' we call https//crates.io/api/v1/crates/x
We just need to call it once, no matter if there is one or many versions.

Calling self.info(artifact['url']) will fail because artifact['url'] is the archive to download not an url suitable to make an api call to get extrinsic metadata.

We can sure call a more specific endpoint by adding version to the http api url (something like this https//crates.io/api/v1/crates/x/0.0.1) but we will do as many http api call as existing versions for this crate.

I'll let other have the last word (i'm going afk for a bit ;)

This revision now requires review to proceed.Apr 29 2022, 7:42 PM
swh/loader/package/crates/loader.py
300–301

The api endpoint is for one origin, it returns generic + information for each version.
Let's say that for crate 'x' we call https//crates.io/api/v1/crates/x
We just need to call it once, no matter if there is one or many versions.

ah yeah, thx, that makes sense.

swh/loader/package/crates/loader.py
245–251

nevermind that.

254–257

nvm that too.

vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
swh/loader/package/crates/loader.py
212
This revision is now accepted and ready to land.May 2 2022, 1:49 PM
franckbret marked 3 inline comments as done.

Updating D7713: crates: rework to take advantage of data returned by the crates lister

Some small fixes after @ardumont and @vlorentz review.
Add myself to contributors.

Build is green

Patch application report for D7713 (id=27971)

Rebasing onto c06305e78a...

Current branch diff-target is up to date.
Changes applied before test
commit a097a946c2f2716ff0410192754cf959e1df5ee0
Author: Franck Bret <franck.bret@octobus.net>
Date:   Fri Apr 29 10:02:20 2022 +0200

    crates: rework to take advantage of data returned by the crates lister
    
    The loader get 'artifacts' information through lister
    'extra_loader_arguments'.
    
    Related to T4104

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/772/ for more details.