D8206.id29626.diff
diff --git a/swh/lister/crates/__init__.py b/swh/lister/crates/__init__.py
--- a/swh/lister/crates/__init__.py
+++ b/swh/lister/crates/__init__.py
@@ -3,6 +3,136 @@
# See top-level LICENSE file for more information
+"""
+Crates lister
+=============
+
+The Crates lister lists origins from `Crates.io`_, the Rust community’s crate registry.
+
+Origins are `packages`_ for the `Rust language`_ ecosystem.
+Packages follow `layout specifications`_ to be usable with the `Cargo`_ package manager
+and have a `Cargo.toml`_ manifest file, which holds the metadata needed to describe and
+build a specific package version.
+
+As of August 2022, `Crates.io`_ lists 89013 package names for a total of 588215
+released versions.
+
+Origins retrieving strategy
+---------------------------
+
+A JSON HTTP API exists to list packages from crates.io, but we chose a
+`different strategy`_ to keep the number of HTTP calls and the bandwidth to a minimum.
+We clone a Git repository which contains a tree of directories whose last child
+folder name corresponds to the package name and contains a Cargo.toml file with some
+JSON data describing all existing versions of the package.
+It takes a few seconds to clone the repository and browse it to build a full index of
+existing packages and their versions.
+The lister is incremental: on the first run it clones and browses the repository as
+described above, then stores the last seen commit id.
+On later runs, it retrieves the list of new and changed files since that commit id
+and returns the new or changed packages with all of their versions.
+
+Note that all Git related operations are done with `Dulwich`_, a Python
+implementation of the Git file formats and protocols.
+
+Page listing
+------------
+
+Each page corresponds to one package.
+Each line of a page corresponds to a different version of this package.
+
+The data schema for each line is:
+
+* **name**: Package name
+* **version**: Package version
+* **crate_file**: Package download URL
+* **checksum**: Package download checksum
+* **yanked**: Whether the package is yanked or not
+* **last_update**: ISO 8601 last update date, computed from the Git commit date of the
+  related Cargo.toml file
+
+Origins from page
+-----------------
+
+The lister yields one origin per page.
+The origin URL corresponds to the HTTP API URL of a package, for example
+"https://crates.io/api/v1/crates/{package}".
+
+Additionally, we add some data to "extra_loader_arguments":
+
+* **artifacts**: Represents data about the crates to download, following the
+  :ref:`original-artifacts-json specification <original-artifacts-json>`
+* **crates_metadata**: Stores all other interesting attributes that do not belong
+  to artifacts. For now it mainly indicates whether a version is `yanked`_.
+
+Origin data example::
+
+    {
+        "url": "https://crates.io/api/v1/crates/rand",
+        "artifacts": [
+            {
+                "checksums": {
+                    "sha256": "48a45b46c2a8c38348adb1205b13c3c5eb0174e0c0fec52cc88e9fb1de14c54d",  # noqa: B950
+                },
+                "filename": "rand-0.1.1.crate",
+                "url": "https://static.crates.io/crates/rand/rand-0.1.1.crate",
+                "version": "0.1.1",
+            },
+            {
+                "checksums": {
+                    "sha256": "6e229ed392842fa93c1d76018d197b7e1b74250532bafb37b0e1d121a92d4cf7",  # noqa: B950
+                },
+                "filename": "rand-0.1.2.crate",
+                "url": "https://static.crates.io/crates/rand/rand-0.1.2.crate",
+                "version": "0.1.2",
+            },
+        ],
+        "crates_metadata": [
+            {
+                "version": "0.1.1",
+                "yanked": False,
+            },
+            {
+                "version": "0.1.2",
+                "yanked": False,
+            },
+        ],
+    }
+
+Running tests
+-------------
+
+Activate the virtualenv and run from within the swh-lister directory::
+
+    pytest -s -vv --log-cli-level=DEBUG swh/lister/crates/tests
+
+Testing with Docker
+-------------------
+
+Change directory to swh/docker, then launch the Docker environment::
+
+    docker-compose up -d
+
+Then connect to the lister::
+
+    docker exec -it docker_swh-lister_1 bash
+
+And run the lister (the listing creates “oneshot” tasks in the scheduler)::
+
+    swh lister run -l crates
+
+.. _Crates.io: https://crates.io
+.. _packages: https://doc.rust-lang.org/book/ch07-01-packages-and-crates.html
+.. _Rust language: https://www.rust-lang.org/
+.. _layout specifications: https://doc.rust-lang.org/cargo/guide/project-layout.html
+.. _Cargo: https://doc.rust-lang.org/cargo/guide/why-cargo-exists.html#enter-cargo
+.. _Cargo.toml: https://doc.rust-lang.org/cargo/reference/manifest.html
+.. _different strategy: https://crates.io/data-access
+.. _Dulwich: https://www.dulwich.io/
+.. _yanked: https://doc.rust-lang.org/cargo/reference/publishing.html#cargo-yank
+"""
+
+
def register():
from .lister import CratesLister
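Reviewer note on the "Origins retrieving strategy" section: the incremental step (turning the list of new and changed files into the packages to return) could look roughly like the sketch below. It is only an illustration under the layout the docstring describes, where the last directory of a path is named after the package and holds a Cargo.toml file. The function name `changed_packages` and the sample paths are hypothetical, and the Git side (obtaining the changed paths with Dulwich) is deliberately left out.

```python
from pathlib import PurePosixPath


def changed_packages(changed_paths):
    """Return the sorted, de-duplicated package names touched by a change set.

    Assumption: each package lives in a directory named after it, and that
    directory contains a Cargo.toml file (as stated in the docstring).
    """
    names = set()
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # Only Cargo.toml files that sit inside a named directory count.
        if path.name == "Cargo.toml" and path.parent.name:
            names.add(path.parent.name)
    return sorted(names)


# Hypothetical change set; the directory prefixes are made up for illustration.
print(changed_packages(["ra/nd/rand/Cargo.toml", "README.md"]))
```

Files other than Cargo.toml are ignored, so unrelated changes in the repository do not produce pages.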
D8206: crates: Add a developer documentation at module level
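Reviewer note on the "Page listing" and "Origins from page" sections: the mapping from one page (one line per package version) to the origin data shown in the docstring example can be sketched as below. This is a minimal illustration, not the lister's actual code. The function name `page_to_origin` is hypothetical, and two points are assumptions read off the example: the checksum is stored under "sha256", and the artifact filename is the last path component of the download URL.

```python
def page_to_origin(page):
    """Build origin data from a page, given as a list of per-version dicts.

    Each dict uses the documented schema: name, version, crate_file,
    checksum, yanked, last_update.
    """
    name = page[0]["name"]
    artifacts = []
    crates_metadata = []
    for line in page:
        artifacts.append(
            {
                # Assumption: the checksum is a SHA-256 digest, as in the example.
                "checksums": {"sha256": line["checksum"]},
                # Assumption: the filename is the last component of the URL.
                "filename": line["crate_file"].rsplit("/", 1)[-1],
                "url": line["crate_file"],
                "version": line["version"],
            }
        )
        crates_metadata.append(
            {"version": line["version"], "yanked": line["yanked"]}
        )
    return {
        "url": f"https://crates.io/api/v1/crates/{name}",
        "artifacts": artifacts,
        "crates_metadata": crates_metadata,
    }
```

Called with the two `rand` versions from the docstring example, this returns a dictionary with the same "url", "artifacts" and "crates_metadata" entries as that example.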