user/listers/crates.rst
.. _crates_lister:

Crates lister
=============

The Crates lister lists origins from `Crates.io`_, the Rust community’s crate registry.

Origins are `packages`_ for the `Rust language`_ ecosystem.
Packages follow `layout specifications`_ to be usable with the `Cargo`_ package manager. Each package has a `Cargo.toml`_ manifest file, which holds the metadata needed to describe and build a specific package version.

As of August 2022, `Crates.io`_ lists 89013 package names for a total of 588215 released versions.
Origins retrieving strategy
---------------------------

A JSON HTTP API is available to list packages from crates.io, but we chose a `different strategy`_ in order to keep the number of HTTP calls and the bandwidth used to a bare minimum.

We clone a Git repository which contains a tree of directories whose last child folder name corresponds to the package name and contains a Cargo.toml file with some JSON data describing all existing versions of the package.
It takes a few seconds to clone the repository and browse it to build a full index of existing packages and their related versions.

The lister is incremental: the first time, it clones and browses the repository as previously described, then stores the last seen commit id.
The next time, it retrieves the list of new and changed files since the last seen commit id and returns the new or changed packages with all of their related versions.
Note that all Git-related operations are done with `Dulwich`_, a Python implementation of the Git file formats and protocols.
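
The incremental behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: ``ListerState``, ``paths_to_process`` and the two callables are hypothetical names, not the lister's actual API, and the Git operations (done with Dulwich in the lister) are hidden behind plain functions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class ListerState:
    """Hypothetical state kept between two runs of the lister."""

    last_commit: Optional[str] = None


def paths_to_process(
    state: ListerState,
    head_commit: str,
    list_all_paths: Callable[[], Iterable[str]],
    list_changed_paths: Callable[[str, str], Iterable[str]],
) -> List[str]:
    """Return the files to read for this run, then record the new commit id."""
    if state.last_commit is None:
        # First run: browse the whole cloned repository.
        paths = sorted(list_all_paths())
    else:
        # Later runs: only files added or changed since the last seen commit.
        paths = sorted(set(list_changed_paths(state.last_commit, head_commit)))
    # Remember where we stopped so the next run can be incremental.
    state.last_commit = head_commit
    return paths
```

Keeping the commit id in a small state object means a run that finds no changed files still records the new head commit, so the following run starts from there.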
Page listing
------------

Each page is related to one package.
Each line of a page corresponds to a different version of this package.

The data schema for each line is:
vlorentz: "envelope", actually; "envelop" is a verb. But I don't understand what it is supposed to mean. Why not "schema" or "keys"?

* **name**: Package name
* **version**: Package version
* **crate_file**: Package download URL
* **checksum**: Package download checksum
* **yanked**: Whether the package is yanked or not
* **last_update**: ISO 8601 last update date, computed from the Git commit date of the related Cargo.toml file
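
As an illustration of the schema above, one line could be built like this. It is a minimal sketch, not the lister's code: ``CrateVersionLine`` and ``make_line`` are hypothetical names, and the download URL pattern is an assumption inferred from the ``static.crates.io`` URLs shown in the origin data example.

```python
from datetime import datetime, timezone
from typing import TypedDict


class CrateVersionLine(TypedDict):
    """One line of a page: a single released version of a package."""

    name: str
    version: str
    crate_file: str
    checksum: str
    yanked: bool
    last_update: str


def make_line(
    name: str, version: str, checksum: str, yanked: bool, commit_date: datetime
) -> CrateVersionLine:
    """Assemble a line; the URL pattern is an assumption, see the lead-in."""
    return CrateVersionLine(
        name=name,
        version=version,
        crate_file=f"https://static.crates.io/crates/{name}/{name}-{version}.crate",
        checksum=checksum,
        yanked=yanked,
        # ISO 8601 string derived from the Git commit date.
        last_update=commit_date.isoformat(),
    )
```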
Origins from page
-----------------

The lister yields one origin per page.
The origin URL corresponds to the HTTP API URL for a package, for example "https://crates.io/api/v1/crates/{package}".
Additionally, we add some data to "extra_loader_arguments":
vlorentz: This is an internal link

franckbret: What should be the correct syntax to reference a section in another document? I tried those but it still fails building. <extrinsic-metadata-specification#original-artifacts-json>

vlorentz: just <original-artifacts-json>

franckbret: It still fail. /var/lib/jenkins/workspace/DDOC/build-on-diff/swh-docs/user/listers/crates.rst:47: WARNING: undefined label: original-artifacts-json By the way what is the quick way to check for errors within my venv? I've tried with make docs but it fails, make all too. Looks like it complains about svg files and dia. Do I need to install dia or is it unexpected?

vlorentz: /var/lib/jenkins/workspace/DDOC/build-on-diff/swh-docs/user/listers/crates.rst:47: WARNING: undefined label: original-artifacts-json original-artifacts-json is defined in a different git repo, but this shouldn't be an issue. Hmm..... Jenkins runs tox -e sphinx-dev

vlorentz: OH, I got it. it's because you added this to the user documentation. We could add an inter-sphinx link, but I'd rather you move this to developer documentation instead; this page only contains technical details that aren't useful to end users.

franckbret: Ah, ok. What should be the path of that file?

vlorentz: Make it a module docstring in swh/listers/crates/__init__.py so it shows up on https://docs.softwareheritage.org/devel/apidoc/swh.lister.crates.html which is linked from https://docs.softwareheritage.org/user/listers/index.html

franckbret: Ok, I moved it to module level, see D8206

* **artifacts**: Represents data about the crates to download, following the :ref:`original-artifacts-json specification <original-artifacts-json>`
ardumont: build failure is not happy about this one [1] [2]

[1] 12:19:57 /var/lib/jenkins/workspace/DDOC/build-on-diff/swh-docs/user/listers/crates.rst:48: ERROR: Unknown target name: "original-artifacts-json specification".

[2] https://jenkins.softwareheritage.org/job/DDOC/job/build-on-diff/202/console
* **crates_metadata**: Stores all other interesting attributes that do not belong to artifacts. For now it mainly indicates whether a version is `yanked`_.
Origin data example::

   {
       "url": "https://crates.io/api/v1/crates/rand",
       "artifacts": [
           {
               "checksums": {
                   "sha256": "48a45b46c2a8c38348adb1205b13c3c5eb0174e0c0fec52cc88e9fb1de14c54d",
               },
               "filename": "rand-0.1.1.crate",
               "url": "https://static.crates.io/crates/rand/rand-0.1.1.crate",
               "version": "0.1.1",
           },
           {
               "checksums": {
                   "sha256": "6e229ed392842fa93c1d76018d197b7e1b74250532bafb37b0e1d121a92d4cf7",
               },
               "filename": "rand-0.1.2.crate",
               "url": "https://static.crates.io/crates/rand/rand-0.1.2.crate",
               "version": "0.1.2",
           },
       ],
       "crates_metadata": [
           {
               "version": "0.1.1",
               "yanked": False,
           },
           {
               "version": "0.1.2",
               "yanked": False,
           },
       ],
   }
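
The mapping from a page to the origin data shown above can be sketched as follows. This is a hedged illustration: ``origin_from_page`` is a hypothetical helper, the input lines are assumed to be plain dictionaries following the page line schema, and the checksum is assumed to be a SHA-256 digest as in the example.

```python
from typing import Any, Dict, List


def origin_from_page(page: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build one origin from the lines of a page (all lines share one name)."""
    name = page[0]["name"]
    artifacts = []
    crates_metadata = []
    for line in page:
        artifacts.append(
            {
                # Assumption: the listed checksum is a SHA-256 digest.
                "checksums": {"sha256": line["checksum"]},
                # The filename is the last segment of the download URL.
                "filename": line["crate_file"].rsplit("/", 1)[-1],
                "url": line["crate_file"],
                "version": line["version"],
            }
        )
        # Attributes that do not belong to artifacts, keyed by version.
        crates_metadata.append(
            {"version": line["version"], "yanked": line["yanked"]}
        )
    return {
        "url": f"https://crates.io/api/v1/crates/{name}",
        "artifacts": artifacts,
        "crates_metadata": crates_metadata,
    }
```

Keeping ``artifacts`` and ``crates_metadata`` as two parallel lists sharing the ``version`` key mirrors the example: download data stays separate from the other per-version attributes.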

Running tests
-------------

Activate the virtualenv and run from within the swh-lister directory::

   pytest -s -vv --log-cli-level=DEBUG swh/lister/crates/tests

Testing with Docker
-------------------

Change directory to swh/docker, then launch the docker environment::

   docker-compose up -d

Then connect to the lister::

   docker exec -it docker_swh-lister_1 bash

And run the lister (the output of this listing results in “oneshot” tasks in the scheduler)::

   swh lister run -l crates

.. _Crates.io: https://crates.io
.. _packages: https://doc.rust-lang.org/book/ch07-01-packages-and-crates.html
.. _Rust language: https://www.rust-lang.org/
.. _layout specifications: https://doc.rust-lang.org/cargo/guide/project-layout.html
.. _Cargo: https://doc.rust-lang.org/cargo/guide/why-cargo-exists.html#enter-cargo
.. _Cargo.toml: https://doc.rust-lang.org/cargo/reference/manifest.html
.. _different strategy: https://crates.io/data-access
.. _Dulwich: https://www.dulwich.io/
vlorentz: see above
.. _yanked: https://doc.rust-lang.org/cargo/reference/publishing.html#cargo-yank