diff --git a/swh/lister/crates/__init__.py b/swh/lister/crates/__init__.py
index 8cb6519..c4ca72c 100644
--- a/swh/lister/crates/__init__.py
+++ b/swh/lister/crates/__init__.py
@@ -1,142 +1,142 @@
# Copyright (C) 2022 the Software Heritage developers
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
"""
Crates lister
=============
The Crates lister lists origins from `Crates.io`_, the Rust community’s crate registry.
Origins are `packages`_ for the `Rust language`_ ecosystem.
Packages follow the `layout specifications`_ to be usable with the `Cargo`_ package manager
and have a `Cargo.toml`_ manifest file which consists of metadata to describe and build
a specific package version.
As of August 2022 `Crates.io`_ lists 89013 package names for a total of 588215 released
versions.
Origins retrieving strategy
---------------------------
A JSON HTTP API exists to list packages from crates.io, but we chose a `different strategy`_
in order to keep the number of HTTP calls and the bandwidth used to a bare minimum.
We clone a git repository which contains a tree of directories whose last child folder
name corresponds to the package name and contains a Cargo.toml file with some json data
to describe all existing versions of the package.
It takes a few seconds to clone the repository and browse it to build a full index of
existing packages and their related versions.
The lister is incremental: on the first run it clones and browses the repository as
previously described, then stores the last seen commit id.
On subsequent runs, it retrieves the list of new and changed files since that commit id
and returns the new or changed packages with all of their related versions.
Note that all Git related operations are done with `Dulwich`_, a Python
implementation of the Git file formats and protocols.
Page listing
------------
Each page is related to one package.
Each line of a page corresponds to one version of this package.
The data schema for each line is:
* **name**: Package name
* **version**: Package version
* **crate_file**: Package download URL
* **checksum**: Package download checksum
* **yanked**: Whether the package is yanked or not
* **last_update**: ISO 8601 last update date, computed from the Git commit date of the
related Cargo.toml file
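As a rough illustration of this schema, one line of a page could be modelled as a
typed dictionary. This is only a sketch under assumptions: ``PageLine`` and
``make_page_line`` are hypothetical names, not part of the lister, and the download
URL pattern is inferred from the origin data example in this document.

```python
from datetime import datetime, timezone
from typing import TypedDict


class PageLine(TypedDict):
    # Hypothetical type name; the fields mirror the schema listed above.
    name: str
    version: str
    crate_file: str
    checksum: str
    yanked: bool
    last_update: str


def make_page_line(name, version, checksum, yanked, last_update):
    # Hypothetical helper. The URL pattern is an assumption based on the
    # "url" values shown in the origin data example.
    crate_file = f"https://static.crates.io/crates/{name}/{name}-{version}.crate"
    return PageLine(
        name=name,
        version=version,
        crate_file=crate_file,
        checksum=checksum,
        yanked=yanked,
        # Serialize the (timezone-aware) commit date as an ISO 8601 string.
        last_update=last_update.isoformat(),
    )


# The date below is a placeholder, not a real commit date.
line = make_page_line(
    "rand",
    "0.1.1",
    "48a45b46c2a8c38348adb1205b13c3c5eb0174e0c0fec52cc88e9fb1de14c54d",
    False,
    datetime(2022, 8, 8, tzinfo=timezone.utc),
)
```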
Origins from page
-----------------
The lister yields one origin per page.
The origin URL corresponds to the HTTP API URL for a package, for example
"https://crates.io/api/v1/crates/{package}".
Additionally, we set some data in "extra_loader_arguments":
* **artifacts**: Represents data about the crates to download, following
- :ref:`original-artifacts-json specification <original-artifacts-json>`
+ :ref:`original-artifacts-json specification <extrinsic-metadata-original-artifacts-json>`
* **crates_metadata**: Stores all other interesting attributes that do not belong
to artifacts. For now it mainly indicates whether a version is `yanked`_.
Origin data example::
{
"url": "https://crates.io/api/v1/crates/rand",
"artifacts": [
{
"checksums": {
"sha256": "48a45b46c2a8c38348adb1205b13c3c5eb0174e0c0fec52cc88e9fb1de14c54d", # noqa: B950
},
"filename": "rand-0.1.1.crate",
"url": "https://static.crates.io/crates/rand/rand-0.1.1.crate",
"version": "0.1.1",
},
{
"checksums": {
"sha256": "6e229ed392842fa93c1d76018d197b7e1b74250532bafb37b0e1d121a92d4cf7", # noqa: B950
},
"filename": "rand-0.1.2.crate",
"url": "https://static.crates.io/crates/rand/rand-0.1.2.crate",
"version": "0.1.2",
},
],
"crates_metadata": [
{
"version": "0.1.1",
"yanked": False,
},
{
"version": "0.1.2",
"yanked": False,
},
],
}
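The mapping from one page to origin data shaped like the example above can be
sketched as follows. This is a minimal sketch, not the lister's actual code:
``page_to_origin`` is a hypothetical helper, and it assumes each page line is a
dictionary with the keys from the page schema (name, version, crate_file, checksum,
yanked).

```python
def page_to_origin(page):
    # Hypothetical helper: 'page' is a non-empty list of dicts, one per version
    # of a single package, using the keys of the page schema.
    name = page[0]["name"]
    artifacts = []
    crates_metadata = []
    for line in page:
        # The artifact filename is the last path segment of the download URL.
        filename = line["crate_file"].rsplit("/", 1)[-1]
        artifacts.append(
            {
                "checksums": {"sha256": line["checksum"]},
                "filename": filename,
                "url": line["crate_file"],
                "version": line["version"],
            }
        )
        crates_metadata.append(
            {"version": line["version"], "yanked": line["yanked"]}
        )
    return {
        "url": f"https://crates.io/api/v1/crates/{name}",
        "artifacts": artifacts,
        "crates_metadata": crates_metadata,
    }


# Sample input reusing values from the example above.
sample_page = [
    {
        "name": "rand",
        "version": "0.1.1",
        "crate_file": "https://static.crates.io/crates/rand/rand-0.1.1.crate",
        "checksum": "48a45b46c2a8c38348adb1205b13c3c5eb0174e0c0fec52cc88e9fb1de14c54d",
        "yanked": False,
    },
]
origin = page_to_origin(sample_page)
```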
Running tests
-------------
Activate the virtualenv and run from within the swh-lister directory:
pytest -s -vv --log-cli-level=DEBUG swh/lister/crates/tests
Testing with Docker
-------------------
Change directory to swh/docker then launch the docker environment:
docker-compose up -d
Then connect to the lister:
docker exec -it docker_swh-lister_1 bash
And run the lister (the output of this listing results in “oneshot” tasks in the scheduler):
swh lister run -l crates
.. _Crates.io: https://crates.io
.. _packages: https://doc.rust-lang.org/book/ch07-01-packages-and-crates.html
.. _Rust language: https://www.rust-lang.org/
.. _layout specifications: https://doc.rust-lang.org/cargo/guide/project-layout.html
.. _Cargo: https://doc.rust-lang.org/cargo/guide/why-cargo-exists.html#enter-cargo
.. _Cargo.toml: https://doc.rust-lang.org/cargo/reference/manifest.html
.. _different strategy: https://crates.io/data-access
.. _Dulwich: https://www.dulwich.io/
.. _yanked: https://doc.rust-lang.org/cargo/reference/publishing.html#cargo-yank
"""
def register():
from .lister import CratesLister
return {
"lister": CratesLister,
"task_modules": ["%s.tasks" % __name__],
}