docs/specs/spec-loading.rst

Loading specification
=====================

An important part of the deposit specification is the loading procedure, in
which a deposit is ingested into the Software Heritage (SWH) archive using
the tarball loader. This document also describes the complete process of
software artifact creation in the archive.


Tarball Loading
---------------

The ``swh-loader-tar`` module is already able to inject tarballs into swh
with very limited metadata (mainly the origin).

The loading of the deposit will use the deposit's associated data:

* the metadata
* the archive(s)


Artifacts creation
------------------

Deposit to artifacts mapping
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is a global view of the deposit ingestion:

+------------------------------------+------------------------------------------+
| swh artifact                       | representation in deposit                |
+====================================+==========================================+
| ...                                | ...                                      |
|                                    | the expanded submitted tarball           |
+------------------------------------+------------------------------------------+
| directory                          | root directory of the expanded submitted |
|                                    | tarball                                  |
+------------------------------------+------------------------------------------+


Origin artifact
~~~~~~~~~~~~~~~

We create an origin using the URL in the deposited metadata.
The current deposit and any future deposits with the same URL or
``external_id`` are associated with this origin.

.. code-block:: json

    {
        "origin": {
            "id": 89283768,
            "origin_visits_url": "/api/1/origin/89283768/visits/",
            "type": "deposit",
            "url": "https://hal.archives-ouvertes.fr/hal-02140606"
        }
    }

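
The association rule for origins can be sketched as a lookup keyed on the
origin URL. This is only an illustration, not the loader's implementation:
``OriginRegistry`` and ``associate`` are hypothetical names, an in-memory
dictionary stands in for the archive storage, and matching on ``external_id``
is left out for brevity.

.. code-block:: python

    from typing import Dict, List


    class OriginRegistry:
        """Hypothetical in-memory stand-in for origin storage."""

        def __init__(self) -> None:
            # Maps an origin URL to the deposits associated with it.
            self._deposits_by_url: Dict[str, List[str]] = {}

        def associate(self, url: str, deposit_id: str) -> dict:
            """Attach a deposit to the origin for ``url``, creating it if needed."""
            deposits = self._deposits_by_url.setdefault(url, [])
            deposits.append(deposit_id)
            return {"type": "deposit", "url": url, "deposits": list(deposits)}


    registry = OriginRegistry()
    url = "https://hal.archives-ouvertes.fr/hal-02140606"
    registry.associate(url, "deposit-1")
    # A later deposit with the same URL lands on the same origin.
    origin = registry.associate(url, "deposit-2")

After the second call, ``origin["deposits"]`` lists both deposits, which
mirrors the rule that deposits sharing a URL share an origin.
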

Visits
~~~~~~

Each deposit push with the same ``external_id`` is identified by a visit.
In the example below, two snapshots are identified by two different visits.

.. code-block:: json

    {
        "visits": [
            {
                ...
                "type": "deposit",
                "visit": 1
            }
        ]
    }


Snapshot artifact
~~~~~~~~~~~~~~~~~

The snapshot represents one deposit push. The master branch points to a
synthetic revision. A second branch pointing to a release artifact is created
if the metadata indicate that the deposit is a release with ``releaseNotes``.

.. code-block:: json

    {
        "snapshot": {
            ...
            "target_type": "revision",
            "target_url": "/api/1/revision/396b1ff29f7c75a0a3cc36f30e24ff7bae70bb52/"
        }
    }

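
The branch layout of a snapshot can be sketched as follows. The function name,
the ``release`` branch name and the flat ``releaseNotes`` metadata key are
assumptions made for this example; the actual loader may name and structure
them differently.

.. code-block:: python

    from typing import Dict, Optional


    def build_snapshot_branches(
        revision_id: str, metadata: dict, release_id: Optional[str] = None
    ) -> Dict[str, dict]:
        """Return the branches of the snapshot for one deposit push."""
        # The master branch always points to the synthetic revision.
        branches = {"master": {"target_type": "revision", "target": revision_id}}
        # A second branch is added only when the metadata carries release notes.
        if metadata.get("releaseNotes") and release_id is not None:
            # "release" is a placeholder branch name chosen for this sketch.
            branches["release"] = {"target_type": "release", "target": release_id}
        return branches


    revision_id = "396b1ff29f7c75a0a3cc36f30e24ff7bae70bb52"
    plain = build_snapshot_branches(revision_id, {})
    released = build_snapshot_branches(
        revision_id, {"releaseNotes": "first release"}, release_id="hypothetical-id"
    )

Here ``plain`` contains only the ``master`` branch, while ``released`` also
contains a branch targeting the release.
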

Revision artifact
~~~~~~~~~~~~~~~~~

The metadata sent with the deposit is included in the revision, which affects
the hash computation and thus results in a unique identifier.
As a consequence, depositing the same content with different metadata results
in two different revisions in the SWH archive.

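
The effect of metadata on the identifier can be illustrated with a toy hash.
This is not the identifier computation used by the SWH archive:
``toy_revision_id`` is a hypothetical helper that simply hashes a JSON
serialization, which is enough to show why changing the metadata changes the
resulting identifier.

.. code-block:: python

    import hashlib
    import json


    def toy_revision_id(directory_id: str, metadata: dict) -> str:
        """Hash the directory id together with the metadata (illustration only)."""
        payload = json.dumps(
            {"directory": directory_id, "metadata": metadata}, sort_keys=True
        ).encode("utf-8")
        return hashlib.sha1(payload).hexdigest()


    directory_id = "fb13b51abbcfd13de85d9ba8d070a23679576cd7"
    first = toy_revision_id(directory_id, {"dateCreated": "2019-01-01"})
    second = toy_revision_id(directory_id, {"dateCreated": "2019-06-01"})
    again = toy_revision_id(directory_id, {"dateCreated": "2019-01-01"})

The same content with different metadata gives different identifiers
(``first != second``), while identical inputs give the same one
(``first == again``).
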

The date mapping
^^^^^^^^^^^^^^^^

A deposit may contain four different dates concerning the software artifacts.

The deposit's revision will reflect the most accurate point in time available.
Here are all the dates that can be available in a deposit:

+----------------+---------------------------------+------------------------------------------------------+
| Date           | Location                        | Description                                          |
+================+=================================+======================================================+
| reception_date | On SWORD reception (automatic)  | the deposit was received at this timestamp           |
+----------------+---------------------------------+------------------------------------------------------+
| complete_date  | On SWH ingestion (automatic)    | the ingestion was completed by SWH at this timestamp |
+----------------+---------------------------------+------------------------------------------------------+
| dateCreated    | metadata in codeMeta (optional) | the software artifact was created at this timestamp  |
+----------------+---------------------------------+------------------------------------------------------+
| datePublished  | metadata in codeMeta (optional) | the software was published (contributed in HAL)      |
+----------------+---------------------------------+------------------------------------------------------+


A visit targeting a snapshot contains one date:

+-------------------+----------------------------------------------+----------------+
| SWH visit field   | Description                                  | Value          |
+===================+==============================================+================+
| date              | the origin pushed the deposit at this date   | reception_date |
+-------------------+----------------------------------------------+----------------+


A revision contains two dates:

+--------------------+-----------------------------------------+----------------+----------------+
| SWH revision field | Description                             | CodeMeta term  | Fallback value |
+====================+=========================================+================+================+
| date               | date of software artifact modification  | dateCreated    | reception_date |
+--------------------+-----------------------------------------+----------------+----------------+
| committer_date     | date of the commit in VCS               | datePublished  | reception_date |
+--------------------+-----------------------------------------+----------------+----------------+


A release contains one date:

+-------------------+-----------------------------------+----------------+----------------+
| SWH release field | Description                       | CodeMeta term  | Fallback value |
+===================+===================================+================+================+
| date              | release date = publication date   | datePublished  | reception_date |
+-------------------+-----------------------------------+----------------+----------------+

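
The fallback rules in the visit, revision and release tables can be summarised
in a short function. This is a minimal sketch under the assumption that dates
are passed as ISO 8601 strings; ``map_deposit_dates`` and its parameter names
are hypothetical and not part of the deposit code base.

.. code-block:: python

    from typing import Dict, Optional


    def map_deposit_dates(
        reception_date: str,
        date_created: Optional[str] = None,
        date_published: Optional[str] = None,
    ) -> Dict[str, Dict[str, str]]:
        """Map the dates of a deposit onto visit, revision and release fields."""
        return {
            # The visit date is always the reception date.
            "visit": {"date": reception_date},
            "revision": {
                # dateCreated, falling back to the reception date.
                "date": date_created or reception_date,
                # datePublished, falling back to the reception date.
                "committer_date": date_published or reception_date,
            },
            # The release date is the publication date, with the same fallback.
            "release": {"date": date_published or reception_date},
        }


    dates = map_deposit_dates(
        reception_date="2019-05-28T10:00:00+00:00",
        date_created="2019-05-01T00:00:00+00:00",
    )

In this call ``datePublished`` is absent, so both the revision's
``committer_date`` and the release ``date`` fall back to the reception date,
while the revision ``date`` uses ``dateCreated``.
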

.. code-block:: json

    {
        "revision": {
            "author": {
                "email": "robot@softwareheritage.org",
                ...
            "synthetic": true,
            "type": "tar",
            "url": "/api/1/revision/396b1ff29f7c75a0a3cc36f30e24ff7bae70bb52/"
        }
    }


Directory artifact
~~~~~~~~~~~~~~~~~~

The directory artifact is the raw content of the deposited archive(s).

.. code-block:: json

    {
        "directory": [
            {
                "dir_id": "fb13b51abbcfd13de85d9ba8d070a23679576cd7",
                ...