Related to T1424
Details
- Reviewers
Alphare ardumont - Group Reviewers
Reviewers - Maniphest Tasks
- T1424: Add crates.io (Rust) lister
- Commits
- rDLSfea6fc04aab5: lister: Add new rust crates lister
- Required Signatures
L3 Software Heritage Contributor License Agreement, version 1.0
Diff Detail
- Repository
- rDLS Listers
- Branch
- master
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 27629 Build 43247: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 43246: arc lint + arc unit
Event Timeline
Build has FAILED
Patch application report for D7367 (id=26633)
Rebasing onto fd03941c5f...
Current branch diff-target is up to date.
Changes applied before test
commit fa7869cf9f9a863591d7e85d2659671abbcf6aca Author: Franck Bret <franck.bret@octobus.net> Date: Thu Mar 17 12:13:04 2022 +0100 WIP: Start rust crates lister
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/477/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/477/console
@franckbret Hello, thanks for the diff, you need to "accept" the L3 document (well read it first ;) so we can see the content of your diff.
Cheers,
Build is green
Patch application report for D7367 (id=26663)
Rebasing onto fd03941c5f...
Current branch diff-target is up to date.
Changes applied before test
commit 421ce9b70ebd7f6390c4287bf0290aee384469d9 Author: Franck Bret <franck.bret@octobus.net> Date: Thu Mar 17 12:13:04 2022 +0100 WIP: Start rust crates lister
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/480/ for more details.
Build is green
Patch application report for D7367 (id=26734)
Rebasing onto fd03941c5f...
Current branch diff-target is up to date.
Changes applied before test
commit 060d0308ee75cf598b1b206e880ea029142fa1d5 Author: Franck Bret <franck.bret@octobus.net> Date: Thu Mar 17 12:13:04 2022 +0100 WIP: Start rust crates lister
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/481/ for more details.
Oops, it seems like this got out of draft too fast. arc says --update and --draft are mutually exclusive for some reason so it's not obvious to me how to tell Phabricator that we still want a diff to stay draft even if the CI passes.
Anyway, I'm doing my "pre-review" here then. ;)
swh/lister/crates/__init__.py | ||
---|---|---|
1 | ||
swh/lister/crates/lister.py | ||
28 | Suggesting a slight rewording aside from fixing the typos: It basically fetches https://github.com/rust-lang/crates.io-index.git to a temp directory and then walks through each file to get the crate's info. | |
62–65 | ||
83 | ||
97 | ||
100 | What should we do about the yanked information? It seems like it would be useful to store. @vlorentz may have an opinion | |
124 | last_update serves only as an implementation detail for the SWH scheduler to know when to schedule a new load. We can indeed use the date of the last commit touching the particular file we're looking at. FYI The crates git repo's history is periodically squashed to preserve space, but that does not change anything in that case. git log -1 --pretty="format:%ci" /path/to/file) is the method to get the timestamp. | |
126 | @vlorentz is this the normal way of passing arguments to the loader? I'm wondering because I'm also thinking about using this for the Golang lister, so it's both birds with one stone | |
swh/lister/crates/tests/__init__.py | ||
1 | ||
swh/lister/crates/tests/data/fake_crates_repository_init.sh | ||
5 | ||
swh/lister/crates/tests/test_lister.py | ||
78 | You need to check that scheduler_origins_sorted and expected_origins_sorted are the same length (and not 0), otherwise you may fail to detect missing origins. |
swh/lister/crates/lister.py | ||
---|---|---|
38 | Good question, it depends. Is that something which is voluminous? Anyway, I guess we'd be on the safe side if we could make this configurable. | |
93 | let the logger do the formatting. | |
swh/lister/crates/tests/__init__.py | ||
29 | i know that code from other loader tests i think, maybe we need to move that down so it's commonly used (later). |
Build is green
Patch application report for D7367 (id=26788)
Rebasing onto ff0035a60b...
First, rewinding head to replay your work on top of it... Applying: WIP: Start rust crates lister
Changes applied before test
commit 9a19ce872fa81a390a585ec9b009dfb0f2c699c3 Author: Franck Bret <franck.bret@octobus.net> Date: Thu Mar 17 12:13:04 2022 +0100 WIP: Start rust crates lister
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/482/ for more details.
swh/lister/crates/lister.py | ||
---|---|---|
38 | does that change much in between runs? |
lgtm
We can always evolve the part of the /tmp working folder later.
@franckbret Can you please drop the WIP from the diff description (and the commit message)?
Out of curiosityr, have you make this lister run in docker or something?
swh/lister/crates/lister.py | ||
---|---|---|
38 | it's a git repo with dozens of commits a day, so... yes, but slowly |
Hi, thanks you all for the review.
Yes, I had successfully run the script with Docker.
Will drop the WIP now.
lister: Add new rust crates lister
The Crates lister retrieves crates package for Rust lang.
It basically fetches https://github.com/rust-lang/crates.io-index.git
to a temp directory and then walks through each file to get the
crate's info.
Build is green
Patch application report for D7367 (id=26843)
Rebasing onto ff0035a60b...
First, rewinding head to replay your work on top of it... Applying: lister: Add new rust crates lister
Changes applied before test
commit fe087aa2aa83dfadb824822d84e90fb7a7a18dce Author: Franck Bret <franck.bret@octobus.net> Date: Thu Mar 17 12:13:04 2022 +0100 lister: Add new rust crates lister The Crates lister retrieves crates package for Rust lang. It basically fetches https://github.com/rust-lang/crates.io-index.git to a temp directory and then walks through each file to get the crate's info.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/483/ for more details.
Looks good to me now. There still remains the question of whether we want to store the yanked info for each release somewhere. Off the top of my head it would make the (upcoming) incremental listing/loading less efficient. Maybe that's out of scope for now. :)
Build is green
Patch application report for D7367 (id=26932)
Rebasing onto ff0035a60b...
Current branch diff-target is up to date.
Changes applied before test
commit fea6fc04aab5e0ba54476f5793bc7f3b51de3b63 Author: Franck Bret <franck.bret@octobus.net> Date: Thu Mar 17 12:13:04 2022 +0100 lister: Add new rust crates lister The Crates lister retrieves crates package for Rust lang. It basically fetches https://github.com/rust-lang/crates.io-index.git to a temp directory and then walks through each file to get the crate's info.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/484/ for more details.