Updating D7654: Crates.io lister, create one origin per package instead of per version
Apr 27 2022
Refactor crates.io lister
Apr 26 2022
In D7654#200261, @ardumont wrote: Ok, but what will be the usage of versioned artifacts as extra_loader_arguments?
We don't want to create one origin per version of a package.
We want all versions of a package seen under the same origin (well, at the time of the ingestion). See this thread where we made the mistake for the maven ingestion [1]
[1] https://sympa.inria.fr/sympa/arc/swh-devel/2022-04/msg00043.html
Is that possible at all for the rust ingestion?
Cheers,
In D7654#200194, @ardumont wrote: wait... we are creating one origin for each version of the crate?
Yes it is.
I guess that instead of returning a URL for a version (i.e.: https://static.crates.io/crates/{package_name}/{package_name}-{version}.crate), the lister should return an HTTP API URL (i.e.: https://crates.io/api/v1/crates/{package_name})?
Yes, this one.
And I'd say: return the list of associated versioned artifacts as extra_loader_arguments.
@vlorentz thoughts?
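A minimal sketch of the proposed shape, assuming the lister has already collected (package name, version) pairs: the per-version .crate URLs are grouped into one origin per package, whose URL is the crates.io API URL, and the versioned artifacts are attached as extra_loader_arguments. The helper `group_crates` and the `artifacts` key are hypothetical illustrations, not the actual swh lister data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# URL patterns quoted in the discussion above.
API_URL = "https://crates.io/api/v1/crates/{package_name}"
CRATE_URL = "https://static.crates.io/crates/{package_name}/{package_name}-{version}.crate"


def group_crates(releases: Iterable[Tuple[str, str]]) -> List[Dict]:
    """Group (package_name, version) pairs into one origin per package.

    Hypothetical helper: the "artifacts" key is an assumption made for
    illustration, not the real swh lister API.
    """
    versions: Dict[str, List[str]] = defaultdict(list)
    for package_name, version in releases:
        versions[package_name].append(version)

    origins = []
    for package_name, package_versions in sorted(versions.items()):
        # Every version of the package becomes an artifact of the same origin.
        artifacts = [
            {
                "version": version,
                "url": CRATE_URL.format(package_name=package_name, version=version),
            }
            for version in package_versions
        ]
        origins.append(
            {
                # One origin per package, not one per version.
                "url": API_URL.format(package_name=package_name),
                "extra_loader_arguments": {"artifacts": artifacts},
            }
        )
    return origins
```

With this shape, a loader visiting the API URL gets all known versions at once and can record them under a single origin.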
In D7654#199643, @ardumont wrote: Once @vlorentz's comment is addressed (we should list one origin, the package I guess, and gather all crate versions under that same origin).
Make sure your commit message (one-to-one with the diff description) is one short summary (in imperative form) [1]. For any extra description, use multiple lines.
[1] https://docs.softwareheritage.org/devel/contributing/git-style-guide.html
TIA
In D7654#199617, @vlorentz wrote: wait... we are creating one origin for each version of the crate?
Proposed evolution of the crate lister to be consistent with the loader here:
In D7501#199444, @ardumont wrote: yes, rename to package_name; the other change would have a much wider impact (than you probably want to deal with ;).
It would not. The get_loader function is not used anywhere outside of this cli.py.
right, nvm then.
I was thinking of something else (iirc, there is something similar in the lister or something)
I'd prefer that renaming nonetheless, though; it's more explicit.
Apr 25 2022
fix missing whitespace in loader specification documentation
Add crates package loader specification
Apr 23 2022
Not sure you have seen this comment before, and because it's an important point for going further, I'm reposting it:
Trying another alternative to make mypy happy (I don't get why there is no error in the dev env when running tox -e mypy or mypy -c mypy.ini, but it fails on CI...)
mypy multiline ignore fix
Apr 22 2022
Fixes after ardumont and vlorentz review
In D7501#197889, @franckbret wrote:
In D7501#197872, @franckbret wrote: Some fixes after @Alphare's review. Mainly adds 'name' and 'version' to the loader args, as those are given through the 'extra_loader_arguments' from the lister.
One important thing to note after adding the 'name' and 'version' args is that it does not work with the CLI in the docker environment:
franck@debian-franck:~/workspace/swh-environment/docker$ docker-compose exec swh-loader swh loader run crates "https://static.crates.io/crates/micro-timer/micro-timer-0.4.0.crate" version="0.4.0" name="micro-timer"
Traceback (most recent call last):
  ...
  File "/src/swh-loader-core/swh/loader/cli.py", line 104, in run
    loader = get_loader(type, url=url, storage=conf["storage"], **kw)
TypeError: get_loader() got multiple values for argument 'name'
What do you think about that?
Would it be better to rename 'name' to 'package_name' in the crate lister and loader, or should the CLI argument evolve into something like 'loader_name'?
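The TypeError comes from Python's argument binding: the traceback implies that get_loader's own first parameter is named `name`, so a `name=...` key forwarded through `**kw` collides with the positionally passed loader type. The sketch below reproduces this with a hypothetical, simplified `get_loader` and `run` (not the real swh.loader.cli code) and shows why renaming the crate argument to `package_name` avoids the clash.

```python
from typing import Any, Dict


def get_loader(name: str, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for the CLI helper: `name` selects the loader type."""
    return {"loader": name, **kwargs}


def run(loader_type: str, url: str, **kw: Any) -> Dict[str, Any]:
    # Mirrors the failing call: the loader type is passed positionally,
    # the extra CLI arguments are forwarded through **kw.
    return get_loader(loader_type, url=url, **kw)


url = "https://static.crates.io/crates/micro-timer/micro-timer-0.4.0.crate"

# A 'name' extra argument collides with get_loader's own first parameter.
try:
    run("crates", url, version="0.4.0", name="micro-timer")
except TypeError as exc:
    print(exc)  # get_loader() got multiple values for argument 'name'

# Renaming the crate argument to 'package_name' avoids the clash.
print(run("crates", url, version="0.4.0", package_name="micro-timer"))
```

Renaming the loader argument keeps the CLI untouched, whereas renaming get_loader's parameter (e.g. to 'loader_name') would fix the clash for every loader at once.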
Apr 14 2022
Some fixes after @Alphare's review. Mainly adds 'name' and 'version' to the loader args, as those are given through the 'extra_loader_arguments' from the lister.
Apr 11 2022
Here is a first crates.io loader.
Mar 28 2022
Mar 24 2022
lister: Add new rust crates lister
Hi, thank you all for the review.
Mar 22 2022
Fixes after first code review
Mar 21 2022
Mar 18 2022
Pulling this as draft
Cancelling to move those changes in D7367