docs/specs/spec-loading.rst
.. _swh-loading-specs: | |||||
Loading specification | Loading specification | ||||
===================== | ===================== | ||||
An important part of the deposit specifications is the loading procedure where | An important part of the deposit specifications is the loading procedure where | ||||
a deposit is ingested into the Software Heritage (archive), using | a deposit is ingested into the Software Heritage Archive (SWH) using | ||||
the tarball loader and the complete process of software artifacts creation | the deposit loader and the complete process of software artifacts creation | ||||
in the archive. | in the archive. | ||||
Tarball Loading | Deposit Loading | ||||
--------------- | --------------- | ||||
The ``swh-loader-tar`` module is already able to inject tarballs in swh | The ``swh.loader.package.deposit`` module is able to inject the content of a | ||||
with very limited metadata (mainly the origin). | zipfile or tarball into SWH along with its metadata. | ||||
The loading of the deposit will use the deposit's associated data: | The loading of the deposit will use the deposit's associated data: | ||||
* the metadata | * the metadata | ||||
* the archive(s) | * the archive file(s) | ||||
Artifacts creation | Artifacts creation | ||||
------------------ | ------------------ | ||||
Deposit to artifacts mapping | Deposit to artifacts mapping | ||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
This is a global view of the deposit ingestion | This is a global view of the deposit ingestion | ||||
+------------------------------------+-----------------------------------------+ | +------------------------------------+-----------------------------------------+ | ||||
| swh artifact | representation in deposit | | | swh artifact | representation in deposit | | ||||
+====================================+=========================================+ | +====================================+=========================================+ | ||||
| origin | https://hal.inria.fr/hal-id | | | origin | https://hal.inria.fr/hal-id | | ||||
+------------------------------------+-----------------------------------------+ | +------------------------------------+-----------------------------------------+ | ||||
| origin_metadata | aggregated metadata | | | raw_extrinsic_metadata | aggregated metadata | | ||||
+------------------------------------+-----------------------------------------+ | +------------------------------------+-----------------------------------------+ | ||||
| snapshot | reception of all occurrences (branches) | | | snapshot | reception of all occurrences (branches) | | ||||
+------------------------------------+-----------------------------------------+ | +------------------------------------+-----------------------------------------+ | ||||
| branches | master & | | | branches | master & tags for releases | | ||||
| | branch (optional): tag to release | | | | (not yet implemented) | | ||||
+------------------------------------+-----------------------------------------+ | +------------------------------------+-----------------------------------------+ | ||||
| release | (optional) synthetic release created | | | release | (optional) synthetic release created | | ||||
| | from metadata | | | | from metadata (not yet implemented) | | ||||
+------------------------------------+-----------------------------------------+ | +------------------------------------+-----------------------------------------+ | ||||
| revision | synthetic revision pointing to | | | revision | synthetic revision pointing to | | ||||
| | the expanded submitted tarball | | | | the directory (see below) | | ||||
+------------------------------------+-----------------------------------------+ | +------------------------------------+-----------------------------------------+ | ||||
| directory | root directory of the expanded submitted| | | directory | root directory of the expanded submitted| | ||||
| | tarball | | | | tarball | | ||||
+------------------------------------+-----------------------------------------+ | +------------------------------------+-----------------------------------------+ | ||||
Origin artifact | Origin artifact | ||||
~~~~~~~~~~~~~~~ | ~~~~~~~~~~~~~~~ | ||||
We create an origin URL by concatenating the client URI and the value of the | We create an origin URL by concatenating the client's `provider_url` and the | ||||
Slug header of the initial POST request of the deposit. | value of the Slug header of the initial POST request of the deposit. | ||||
For example: | |||||
.. code-block:: bash | |||||
$ http -pb https://archive.softwareheritage.org/api/1/origin/https://hal.archives-ouvertes.fr/hal-02560320/get/ | |||||
would result in: | |||||
.. code-block:: json | .. code-block:: json | ||||
vlorentz: this one too, for consistency | |||||
{ | { | ||||
"origin": { | "origin_visits_url": "https://archive.softwareheritage.org/api/1/origin/https://hal.archives-ouvertes.fr/hal-02560320/visits/", | ||||
"id": 89283768, | "url": "https://hal.archives-ouvertes.fr/hal-02560320" | ||||
"origin_visits_url": "/api/1/origin/89283768/visits/", | |||||
"type": "deposit", | |||||
"url": "https://hal.archives-ouvertes.fr/hal-02140606" | |||||
} | |||||
} | } | ||||
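The origin URL construction described above can be sketched in Python. This is only an illustration: the function name ``build_origin_url`` is hypothetical, and normalising to a single ``/`` between the two parts is an assumption (the spec only says the two values are concatenated).

```python
def build_origin_url(provider_url: str, slug: str) -> str:
    """Join the client's provider_url and the Slug header value.

    Hypothetical helper, not part of the deposit code base.
    """
    # Assumption: keep exactly one "/" between the two parts.
    return provider_url.rstrip("/") + "/" + slug.lstrip("/")


print(build_origin_url("https://hal.archives-ouvertes.fr/", "hal-02560320"))
```

With the HAL values used in the example above, this yields the ``url`` field of the origin.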
Visits | Visits | ||||
~~~~~~ | ~~~~~~ | ||||
We identify with a visit each deposit push of the same external_id. | We identify with a visit each deposit push of the same `external_id`. | ||||
Here in the example below, two snapshots are identified by two different visits. | Here in the example below, two snapshots are identified by two different visits. | ||||
For example: | |||||
.. code-block:: bash | |||||
$ http -pb https://archive.softwareheritage.org/api/1/origin/https://hal.archives-ouvertes.fr/hal-02560320/visits/ | |||||
would result in: | |||||
.. code-block:: json | .. code-block:: json | ||||
vlorentz: could we keep this block as json, and move the shell command out of the block?
douardda: why not, let me do that
[ | |||||
{ | { | ||||
"visits": [ | "date": "2020-05-14T11:59:55.942964+00:00", | ||||
{ | "metadata": {}, | ||||
"date": "2019-06-03T09:28:10.223007+00:00", | "origin": "https://hal.archives-ouvertes.fr/hal-02560320", | ||||
"origin": 89283768, | "origin_visit_url": "https://archive.softwareheritage.org/api/1/origin/https://hal.archives-ouvertes.fr/hal-02560320/visit/2/", | ||||
"origin_visit_url": "/api/1/origin/89283768/visit/2/", | "snapshot": "e5e82d064a9c3df7464223042e0c55d72ccff7f0", | ||||
"snapshot": "a3773941561cc557853898773a19c07cfe2efc5a", | "snapshot_url": "https://archive.softwareheritage.org/api/1/snapshot/e5e82d064a9c3df7464223042e0c55d72ccff7f0/", | ||||
"snapshot_url": "/api/1/snapshot/a3773941561cc557853898773a19c07cfe2efc5a/", | |||||
"status": "full", | "status": "full", | ||||
"type": "deposit", | "type": "deposit", | ||||
"visit": 2 | "visit": 2 | ||||
}, | }, | ||||
{ | { | ||||
"date": "2019-05-27T12:23:31.037273+00:00", | "date": "2020-05-14T11:59:41.094260+00:00", | ||||
"origin": 89283768, | "metadata": {}, | ||||
"origin_visit_url": "/api/1/origin/89283768/visit/1/", | "origin": "https://hal.archives-ouvertes.fr/hal-02560320", | ||||
"snapshot": "43fdb8291f1bf6962211c370e394f6abb1cbe01d", | "origin_visit_url": "https://archive.softwareheritage.org/api/1/origin/https://hal.archives-ouvertes.fr/hal-02560320/visit/1/", | ||||
"snapshot_url": "/api/1/snapshot/43fdb8291f1bf6962211c370e394f6abb1cbe01d/", | "snapshot": "3e95ef6e04c381a34cc2f314576bc5644f2c797f", | ||||
"snapshot_url": "https://archive.softwareheritage.org/api/1/snapshot/3e95ef6e04c381a34cc2f314576bc5644f2c797f/", | |||||
"status": "full", | "status": "full", | ||||
"type": "deposit", | "type": "deposit", | ||||
"visit": 1 | "visit": 1 | ||||
} | } | ||||
] | ] | ||||
} | |||||
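To make the "one visit per deposit push" relation concrete, here is a small sketch that maps each visit number to the snapshot it produced. The data is a trimmed copy of the listing above (only the fields used here); the helper name ``snapshots_by_visit`` is hypothetical.

```python
import json

# Trimmed copy of the visits listing shown above.
VISITS_JSON = """
[
  {"visit": 2, "status": "full", "type": "deposit",
   "snapshot": "e5e82d064a9c3df7464223042e0c55d72ccff7f0"},
  {"visit": 1, "status": "full", "type": "deposit",
   "snapshot": "3e95ef6e04c381a34cc2f314576bc5644f2c797f"}
]
"""


def snapshots_by_visit(visits):
    """Map each visit number to its snapshot id, skipping visits without one."""
    return {v["visit"]: v["snapshot"] for v in visits if v.get("snapshot")}


visits = json.loads(VISITS_JSON)
mapping = snapshots_by_visit(visits)
for number in sorted(mapping):
    print(number, mapping[number])
```

Each deposit push of the same ``external_id`` shows up as a separate key with its own snapshot.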
Snapshot artifact | Snapshot artifact | ||||
~~~~~~~~~~~~~~~~~ | ~~~~~~~~~~~~~~~~~ | ||||
The snapshot represents one deposit push. The ``HEAD`` branch points to a | The snapshot represents one deposit push. The ``HEAD`` branch points to a | ||||
synthetic revision. | synthetic revision. | ||||
For example: | |||||
.. code-block:: bash | |||||
$ http -pb https://archive.softwareheritage.org/api/1/snapshot/3e95ef6e04c381a34cc2f314576bc5644f2c797f/ | |||||
would result in: | |||||
.. code-block:: json | .. code-block:: json | ||||
vlorentz: same
{ | { | ||||
"snapshot": { | |||||
"branches": { | "branches": { | ||||
"HEAD": { | "HEAD": { | ||||
"target": "396b1ff29f7c75a0a3cc36f30e24ff7bae70bb52", | "target": "2122424b547a8eca9282ba3131ec61ff1d8df7d4", | ||||
"target_type": "revision", | "target_type": "revision", | ||||
"target_url": "/api/1/revision/396b1ff29f7c75a0a3cc36f30e24ff7bae70bb52/" | "target_url": "https://archive.softwareheritage.org/api/1/revision/2122424b547a8eca9282ba3131ec61ff1d8df7d4/" | ||||
} | } | ||||
}, | }, | ||||
"id": "a3773941561cc557853898773a19c07cfe2efc5a", | "id": "3e95ef6e04c381a34cc2f314576bc5644f2c797f", | ||||
"next_branch": null | "next_branch": null | ||||
} | } | ||||
} | |||||
Note that previous versions of the deposit-loader named the branch ``master`` | Note that previous versions of the deposit-loader named the branch ``master`` | ||||
instead, and created release branches under certain conditions. | instead, and created release branches under certain conditions. | ||||
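A reader of such snapshots may want to resolve the synthetic revision regardless of which loader version created it. The sketch below looks up ``HEAD`` first and falls back to ``master``; the helper name ``head_revision`` and the fallback order are assumptions made for illustration, and the data is a trimmed copy of the response above.

```python
from typing import Optional

# Trimmed copy of the snapshot response shown above.
SNAPSHOT = {
    "branches": {
        "HEAD": {
            "target": "2122424b547a8eca9282ba3131ec61ff1d8df7d4",
            "target_type": "revision",
        }
    },
    "id": "3e95ef6e04c381a34cc2f314576bc5644f2c797f",
    "next_branch": None,
}


def head_revision(snapshot: dict) -> Optional[str]:
    """Return the revision id the main branch points to, if any."""
    branches = snapshot.get("branches", {})
    # Older deposit-loader versions named the branch "master".
    for name in ("HEAD", "master"):
        branch = branches.get(name)
        if branch and branch.get("target_type") == "revision":
            return branch["target"]
    return None


print(head_revision(SNAPSHOT))
```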
Release artifact | Release artifact | ||||
~~~~~~~~~~~~~~~~ | ~~~~~~~~~~~~~~~~ | ||||
.. warning:: | .. warning:: | ||||
[...] | [...] | ||||
~~~~~~~~~~~~~~~~ | ~~~~~~~~~~~~~~~~ | ||||
- the metadata received with the deposit are kept in a dedicated table | - the metadata received with the deposit are kept in a dedicated table | ||||
``raw_extrinsic_metadata``, distinct from the ``revision`` and ``origin`` | ``raw_extrinsic_metadata``, distinct from the ``revision`` and ``origin`` | ||||
tables. | tables. | ||||
- ``authority`` is computed from the deposit client information, and ``fetcher`` | - ``authority`` is computed from the deposit client information, and ``fetcher`` | ||||
is the deposit loader. | is the deposit loader. | ||||