
D8199.id29597.diff

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -17,6 +17,7 @@
hooks:
- id: codespell
name: Check source code spelling
+ args: [-L crate]
stages: [commit]
- repo: local
diff --git a/CONTRIBUTORS b/CONTRIBUTORS
--- a/CONTRIBUTORS
+++ b/CONTRIBUTORS
@@ -3,3 +3,4 @@
Loïc Dachary
Matthew Vernon
Thibault Allançon
+Franck Bret
diff --git a/user/listers/crates.rst b/user/listers/crates.rst
--- a/user/listers/crates.rst
+++ b/user/listers/crates.rst
@@ -3,5 +3,112 @@
Crates lister
=============
-.. todo::
- This page is a work in progress.
+The Crates lister lists origins from `Crates.io`_, the Rust community’s crate registry.
+
+Origins are `packages`_ for the `Rust language`_ ecosystem.
+Packages follow a `layout specifications`_ to be usable with the `Cargo`_ package manager and have a `Cargo.toml`_ manifest file, which consists of metadata used to describe and build a specific package version.
+
+As of August 2022, `Crates.io`_ lists 89013 package names for a total of 588215 released versions.
+
+Origins retrieving strategy
+---------------------------
+
+A JSON HTTP API is available to list packages from crates.io, but we chose a `different strategy`_ to keep the number of HTTP calls and the bandwidth used to a bare minimum.
+We clone a Git repository that contains a tree of directories; the name of each last child folder corresponds to a package name, and that folder contains a Cargo.toml file with some JSON data describing all existing versions of the package.
+It takes a few seconds to clone the repository and browse it to build a full index of existing packages and their related versions.
+The lister is incremental: on the first run, it clones and browses the repository as described above, then stores the last seen commit id.
+On subsequent runs, it retrieves the list of files that are new or changed since the last seen commit id and returns the new or changed packages with all of their related versions.
+
+Note that all Git-related operations are done with `Dulwich`_, a Python implementation of the Git file formats and protocols.
+
+Page listing
+------------
+
+Each page is related to one package.
+Each line of a page corresponds to one version of this package.
+
+The data schema for each line is:
+
+* **name**: Package name
+* **version**: Package version
+* **crate_file**: Package download url
+* **checksum**: Package download checksum
+* **yanked**: Whether the package is yanked or not
+* **last_update**: ISO 8601 last update date, computed from the Git commit date of the related Cargo.toml file
+
+Origins from page
+-----------------
+
+The lister yields one origin per page.
+The origin URL corresponds to the HTTP API URL for a package, for example "https://crates.io/api/v1/crates/{package}".
+
+Additionally, we add some data to "extra_loader_arguments":
+
+* **artifacts**: Represent data about the Crates to download, following :ref:`original-artifacts-json specification <extrinsic-metadata-specification#original-artifacts-json>`
+* **crates_metadata**: To store all other interesting attributes that do not belong to artifacts. For now, it mainly indicates when a version is `yanked`_.
+
+Origin data example::
+
+    {
+        "url": "https://crates.io/api/v1/crates/rand",
+        "artifacts": [
+            {
+                "checksums": {
+                    "sha256": "48a45b46c2a8c38348adb1205b13c3c5eb0174e0c0fec52cc88e9fb1de14c54d",
+                },
+                "filename": "rand-0.1.1.crate",
+                "url": "https://static.crates.io/crates/rand/rand-0.1.1.crate",
+                "version": "0.1.1",
+            },
+            {
+                "checksums": {
+                    "sha256": "6e229ed392842fa93c1d76018d197b7e1b74250532bafb37b0e1d121a92d4cf7",
+                },
+                "filename": "rand-0.1.2.crate",
+                "url": "https://static.crates.io/crates/rand/rand-0.1.2.crate",
+                "version": "0.1.2",
+            },
+        ],
+        "crates_metadata": [
+            {
+                "version": "0.1.1",
+                "yanked": False,
+            },
+            {
+                "version": "0.1.2",
+                "yanked": False,
+            },
+        ],
+    }
+
+Running tests
+-------------
+
+Activate the virtualenv and run from within the swh-lister directory:
+
+ pytest -s -vv --log-cli-level=DEBUG swh/lister/crates/tests
+
+Testing with Docker
+-------------------
+
+Change directory to swh/docker then launch the docker environment:
+
+ docker-compose up -d
+
+Then connect to the lister:
+
+ docker exec -it docker_swh-lister_1 bash
+
+And run the lister (the output of this listing results in “oneshot” tasks in the scheduler):
+
+ swh lister run -l crates
+
+.. _Crates.io: https://crates.io
+.. _packages: https://doc.rust-lang.org/book/ch07-01-packages-and-crates.html
+.. _Rust language: https://www.rust-lang.org/
+.. _layout specifications: https://doc.rust-lang.org/cargo/guide/project-layout.html
+.. _Cargo: https://doc.rust-lang.org/cargo/guide/why-cargo-exists.html#enter-cargo
+.. _Cargo.toml: https://doc.rust-lang.org/cargo/reference/manifest.html
+.. _different strategy: https://crates.io/data-access
+.. _Dulwich: https://www.dulwich.io/
+.. _yanked: https://doc.rust-lang.org/cargo/reference/publishing.html#cargo-yank
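
As a reading aid for the "Page listing" and "Origins from page" sections of the added documentation, the mapping from one page to one origin could be sketched roughly as follows. This is not the lister's actual code: the function name `origin_from_page` is hypothetical, the artifact filename pattern `{name}-{version}.crate` is inferred from the documented example, and the `last_update` value in the sample page is a placeholder.

```python
from typing import Any, Dict, List


def origin_from_page(page: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build one origin from a page, i.e. all version lines of one package.

    Hypothetical helper: it only mirrors the documented data schema.
    """
    if not page:
        raise ValueError("a page must contain at least one version line")

    name = page[0]["name"]
    artifacts = []
    crates_metadata = []
    for line in page:
        version = line["version"]
        artifacts.append(
            {
                "checksums": {"sha256": line["checksum"]},
                # Assumption: filename pattern inferred from the documented example.
                "filename": f"{name}-{version}.crate",
                "url": line["crate_file"],
                "version": version,
            }
        )
        # Attributes that do not belong to artifacts, for now the yanked flag.
        crates_metadata.append({"version": version, "yanked": line["yanked"]})

    return {
        "url": f"https://crates.io/api/v1/crates/{name}",
        "artifacts": artifacts,
        "crates_metadata": crates_metadata,
    }


# Sample page with a single version line, using values from the documented example.
page = [
    {
        "name": "rand",
        "version": "0.1.1",
        "crate_file": "https://static.crates.io/crates/rand/rand-0.1.1.crate",
        "checksum": "48a45b46c2a8c38348adb1205b13c3c5eb0174e0c0fec52cc88e9fb1de14c54d",
        "yanked": False,
        "last_update": "2022-08-01T00:00:00+00:00",  # placeholder, not real data
    }
]
origin = origin_from_page(page)
print(origin["url"])
```

In this sketch, `artifacts` and `crates_metadata` are returned next to `url` to match the documented origin example; how they are attached to "extra_loader_arguments" in practice is left to the lister implementation.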
