diff --git a/docs/getting-started.rst b/docs/getting-started.rst index 399658c0..59d7f858 100644 --- a/docs/getting-started.rst +++ b/docs/getting-started.rst @@ -1,309 +1,309 @@ Getting Started =============== This is a guide on how to prepare and push a software deposit with the swh-deposit commands. The API is rooted at https://deposit.softwareheritage.org/1. For more details, see the `main documentation <./index.html>`__. Requirements ------------ You need to be referenced on SWH's client list to have: * credentials (needed for the basic authentication step) - in this document we reference ```` as the client's name and ```` as its associated authentication password. * an associated collection `Contact us for more information. `__ Prepare a deposit ----------------- * compress the files in a supported archive format: - zip: common zip archive (no multi-disk zip files). - tar: tar archive without compression or optionally with any of the following compression algorithms: gzip (.tar.gz, .tgz), bzip2 (.tar.bz2), or lzma (.tar.lzma) * prepare a metadata file (`more details <./metadata.html>`__): - specify metadata schema/vocabulary (CodeMeta is recommended) - specify *MUST* metadata (url, authors, software name and the external\_identifier) - add all available information under the compatible metadata term An example of an atom entry file with CodeMeta terms: .. code:: xml Je suis GPL swh je-suis-gpl https://forge.softwareheritage.org/source/jesuisgpl/ 2018-01-05 Je suis GPL is a modified version of GNU Hello whose sole purpose is to showcase the usage of Software Heritage for license compliance purposes. 0.1 GNU/Linux stable C GNU General Public License v3.0 or later https://spdx.org/licenses/GPL-3.0-or-later.html Stefano Zacchiroli Maintainer Push deposit ------------ You can push a deposit with: * a single deposit (archive + metadata): The user posts in one query a software source code archive and associated metadata.
The deposit is directly marked with status ``deposited``. * a multisteps deposit: 1. Create an incomplete deposit (marked with status ``partial``) 2. Add data to a deposit (in multiple requests if needed) 3. Finalize deposit (the status becomes ``deposited``) Single deposit ^^^^^^^^^^^^^^ Once the files are ready for deposit, we want to do the actual deposit in one shot, sending exactly one POST query: * 1 archive (content-type ``application/zip`` or ``application/x-tar``) * 1 metadata file in atom xml format (``content-type: application/atom+xml;type=entry``) For this, we need to provide the: * arguments: ``--username 'name' --password 'pass'`` as credentials * archive's path (example: ``--archive path/to/archive-name.tgz``) * (optionally) metadata file's path ``--metadata path/to/file.metadata.xml``. If not provided, the archive's filename will be used to determine the metadata file, e.g.: ``path/to/archive-name.tgz.metadata.xml`` * (optionally) ``--slug 'your-id'`` argument, a reference to a unique identifier the client uses for the software object. You can do this with the following command: minimal deposit .. code:: shell $ swh-deposit --username name --password secret \ --archive je-suis-gpl.tgz with client's external identifier (``slug``) .. code:: shell $ swh-deposit --username name --password secret \ --archive je-suis-gpl.tgz \ --slug je-suis-gpl to a specific client's collection .. code:: shell $ swh-deposit --username name --password secret \ --archive je-suis-gpl.tgz \ --collection 'second-collection' You just posted a deposit to your collection on Software Heritage. If everything went well, the successful response will contain the elements below: .. code:: shell { 'deposit_status': 'deposited', 'deposit_id': '7', 'deposit_date': 'Jan. 29, 2018, 12:29 p.m.' } Note: As the deposit is in ``deposited`` status, you can no longer update the deposit after this query. Any update attempt will be answered with a 403 Forbidden response.
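The two inputs of a single deposit described above (the archive's content-type and the default metadata path) can be sketched in Python. This is only an illustration: the helper names are hypothetical, and the mapping from file extension to content-type is an assumption inferred from the supported formats listed in this guide, not the client's actual logic.

```python
import os

# Assumption: extension-to-content-type mapping inferred from the archive
# formats listed in this guide; the real client may decide differently.
_TAR_SUFFIXES = (".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tar.lzma")


def archive_content_type(archive_path: str) -> str:
    """Return the content-type to send for a supported archive (hypothetical helper)."""
    name = os.path.basename(archive_path).lower()
    if name.endswith(".zip"):
        return "application/zip"
    if name.endswith(_TAR_SUFFIXES):
        return "application/x-tar"
    raise ValueError(f"unsupported archive format: {archive_path}")


def default_metadata_path(archive_path: str) -> str:
    """Metadata file looked up when --metadata is not given (hypothetical helper)."""
    return archive_path + ".metadata.xml"
```

For ``path/to/archive-name.tgz`` the second helper yields ``path/to/archive-name.tgz.metadata.xml``, matching the example given for the ``--metadata`` argument.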
If something went wrong, an equivalent response will be given with the ``error`` and ``detail`` keys explaining the issue, e.g.: .. code:: shell { 'error': 'Unknown collection name xyz', 'detail': None, 'deposit_status': None, 'deposit_status_detail': None, 'deposit_swh_id': None, 'status': 404 } multisteps deposit ^^^^^^^^^^^^^^^^^^^^^^^^^ The steps to create a multisteps deposit: 1. Create an incomplete deposit -~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ First use the ``--partial`` argument to declare there is more to come: .. code:: shell $ swh-deposit --username name --password secret \ --archive foo.tar.gz \ --partial 2. Add content or metadata to the deposit -~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Continue the deposit by passing the deposit id returned in the response to the first step through the ``--deposit-id`` argument. You can continue adding content or metadata as long as you use the ``--partial`` argument. .. code:: shell $ swh-deposit --username name --password secret \ --archive add-foo.tar.gz \ --deposit-id 42 \ --partial In case you want to add only one new archive without metadata: .. code:: shell $ swh-deposit --username name --password secret \ --archive add-foo.tar.gz \ --archive-deposit \ --deposit-id 42 \ --partial If you want to add only metadata, use: .. code:: shell $ swh-deposit --username name --password secret \ --metadata add-foo.tar.gz.metadata.xml \ --metadata-deposit \ --deposit-id 42 \ --partial 3. Finalize deposit ~~~~~~~~~~~~~~~~~~~ On your last addition, by not declaring it as ``--partial``, the deposit will be considered complete and its status will be changed to ``deposited``.
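The three steps above can be sketched as a small Python helper that assembles the argument list for each ``swh-deposit`` call. It only uses the flags shown in this guide. The helper name ``build_deposit_args`` and its ``only`` parameter are hypothetical, and nothing is executed here.

```python
from typing import List, Optional


def build_deposit_args(
    username: str,
    password: str,
    *,
    archive: Optional[str] = None,
    metadata: Optional[str] = None,
    deposit_id: Optional[int] = None,
    partial: bool = False,
    only: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: assemble the argument list for one swh-deposit call."""
    if archive is None and metadata is None:
        raise ValueError("provide an archive, a metadata file, or both")
    if only not in (None, "archive", "metadata"):
        raise ValueError("only must be None, 'archive' or 'metadata'")
    args = ["swh-deposit", "--username", username, "--password", password]
    if archive is not None:
        args += ["--archive", archive]
    if metadata is not None:
        args += ["--metadata", metadata]
    if only is not None:
        # --archive-deposit / --metadata-deposit restrict the request to one part.
        args.append(f"--{only}-deposit")
    if deposit_id is not None:
        args += ["--deposit-id", str(deposit_id)]
    if partial:
        # Leaving --partial out on the last request finalizes the deposit.
        args.append("--partial")
    return args


# Step 1: create an incomplete deposit.
create = build_deposit_args("name", "secret", archive="foo.tar.gz", partial=True)
# Step 2: add one more archive, still partial (42 is the id from the examples above).
add = build_deposit_args("name", "secret", archive="add-foo.tar.gz",
                         only="archive", deposit_id=42, partial=True)
# Step 3: last addition without --partial, so the deposit becomes ``deposited``.
finalize = build_deposit_args("name", "secret",
                              metadata="add-foo.tar.gz.metadata.xml",
                              only="metadata", deposit_id=42)
```

On a machine where the client is installed, each list could be handed to ``subprocess.run(args, check=True)``. Keeping the construction separate from the execution makes the sequence easy to inspect first.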
Update deposit ---------------- * replace deposit: - only possible if the deposit status is ``partial`` and ``--deposit-id `` is provided - by using the ``--replace`` flag - ``--metadata-deposit`` replaces associated existing metadata - ``--archive-deposit`` replaces associated archive(s) - by default, with no flag or both, you'll replace associated metadata and archive(s) .. code:: shell $ swh-deposit --username name --password secret \ --deposit-id 11 \ --archive updated-je-suis-gpl.tgz \ --replace * update a loaded deposit with a new version: - by using the external-id with the ``--slug`` argument, you will link the new deposit with its parent deposit .. code:: shell $ swh-deposit --username name --password secret \ --archive je-suis-gpl-v2.tgz \ --slug 'je-suis-gpl' Check the deposit's status -------------------------- You can check the status of the deposit by combining the ``--deposit-id`` argument with the ``--status`` flag: .. code:: shell -$ swh-deposit --username name --password secret --deposit-id '11' --status + $ swh-deposit --username name --password secret --deposit-id '11' --status .. code:: json { 'deposit_id': '11', 'deposit_status': 'deposited', 'deposit_swh_id': None, 'deposit_status_detail': 'Deposit is ready for additional checks \ (tarball ok, metadata, etc...)' } The different statuses: - **partial**: multipart deposit is still ongoing - **deposited**: deposit completed - **rejected**: deposit failed the checks - **verified**: content and metadata verified - **loading**: loading in progress - **done**: loading completed successfully - **failed**: the deposit loading has failed When the deposit has been loaded into the archive, the status will be marked ``done``. In the response, will also be available the , , , . For example: ..
code:: json { 'deposit_id': '11', 'deposit_status': 'done', 'deposit_swh_id': 'swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9', 'deposit_swh_id_context': 'swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;origin=https://forge.softwareheritage.org/source/jesuisgpl/', 'deposit_swh_anchor_id': 'swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb', 'deposit_swh_anchor_id_context': 'swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb;origin=https://forge.softwareheritage.org/source/jesuisgpl/', 'deposit_status_detail': 'The deposit has been successfully \ loaded into the Software Heritage archive' } diff --git a/docs/index.rst b/docs/index.rst index 23e304b5..e8ffe3ef 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,21 +1,22 @@ .. _swh-deposit: Software Heritage Deposit ========================= .. toctree:: :maxdepth: 1 :caption: Contents: getting-started.rst spec-api.rst metadata.rst dev-info.rst sys-info.rst + specs/specs.rst Indices and tables ================== * :ref:`genindex` * :ref:`modindex` * :ref:`search` diff --git a/docs/blueprint.rst b/docs/specs/blueprint.rst similarity index 84% rename from docs/blueprint.rst rename to docs/specs/blueprint.rst index 1fa91cd9..e0b93e8f 100644 --- a/docs/blueprint.rst +++ b/docs/specs/blueprint.rst @@ -1,114 +1,114 @@ Use cases --------- Deposit creation ~~~~~~~~~~~~~~~~ From client's deposit repository server to SWH's repository server: 1. The client requests for the server's abilities and its associated collection - (GET query to the *SD/service document uri*) + (GET query to the *SD/service document uri*) 2. The server answers the client with the service document which gives the - *collection uri* (also known as *COL/collection IRI*). + *collection uri* (also known as *COL/collection IRI*). 3. The client sends a deposit (optionally a zip archive, some metadata or both) - through the *collection uri*. + through the *collection uri*. This can be done in: * one POST request (metadata + archive). 
* one POST request (metadata or archive) + other PUT or POST request to the *update uris* (*edit-media iri* or *edit iri*) - 1. Server validates the client's input or returns detailed error if any + a. Server validates the client's input or returns detailed error if any - 2. Server stores information received (metadata or software archive source + b. Server stores information received (metadata or software archive source code or both) 4. The server notifies the client it acknowledged the client's request. An - ``http 201 Created`` response with a deposit receipt in the body response is - sent back. That deposit receipt will hold the necessary information to - eventually complete the deposit later on if it was incomplete (also known as - status ``partial``). + ``http 201 Created`` response with a deposit receipt in the body response is + sent back. That deposit receipt will hold the necessary information to + eventually complete the deposit later on if it was incomplete (also known as + status ``partial``). Schema representation ^^^^^^^^^^^^^^^^^^^^^ .. raw:: html .. figure:: /images/deposit-create-chart.png :alt: Updating an existing deposit ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 5. Client updates existing deposit through the *update uris* (one or more POST or PUT requests to either the *edit-media iri* or *edit iri*). 1. Server validates the client's input or returns detailed error if any 2. Server stores information received (metadata or software archive source code or both) This would be the case for example if the client initially posted a ``partial`` deposit (e.g. only metadata with no archive, or an archive without metadata, or a split archive because the initial one exceeded the size limit imposed by swh repository deposit) Schema representation ^^^^^^^^^^^^^^^^^^^^^ .. raw:: html .. figure:: /images/deposit-update-chart.png :alt: Deleting deposit (or associated archive, or associated metadata) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 6.
Deposit deletion is possible as long as the deposit is still in ``partial`` state. 1. Server validates the client's input or returns a detailed error, if any 2. Server deletes the information according to the request Schema representation ^^^^^^^^^^^^^^^^^^^^^ .. raw:: html .. figure:: /images/deposit-delete-chart.png :alt: Client asks for operation status ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 7. Operation status can be read through a GET query to the *state iri*. Server: Triggering deposit checks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Once the status ``deposited`` is reached for a deposit, checks for the associated archive(s) and metadata will be triggered. If those checks fail, the status is changed to ``rejected`` and nothing more happens there. Otherwise, the status is changed to ``verified``. Server: Triggering deposit load ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Once the status ``verified`` is reached for a deposit, loading the deposit with its associated metadata will be triggered. The loading results in a status update, either ``done`` or ``failed`` (depending on the loading's status). This is described in the `loading document <./spec-loading.html>`__.
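The status changes described in these use cases can be summarised as a small transition table. The Python sketch below is an illustration only, not the server's implementation. It assumes that ``loading`` (listed among the statuses in the getting-started guide) sits between ``verified`` and the final ``done`` or ``failed``, and that ``done`` and ``failed`` are terminal. The function names are hypothetical.

```python
from typing import Dict, FrozenSet

# Transitions as described in this document. Treating "done" and "failed"
# as terminal, and "loading" as the step after "verified", are assumptions.
TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "partial": frozenset({"partial", "deposited"}),  # more additions, or finalize
    "deposited": frozenset({"verified", "rejected"}),  # outcome of the checks
    "verified": frozenset({"loading"}),
    "loading": frozenset({"done", "failed"}),
    "rejected": frozenset(),  # "nothing more happens there"
    "done": frozenset(),
    "failed": frozenset(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if the document describes a move from current to new."""
    return new in TRANSITIONS[current]


def is_updatable(status: str) -> bool:
    """Updates and deletions are only possible while the deposit is partial."""
    return status == "partial"
```

A client could use such a table to decide whether polling the *state iri* is still worthwhile: once a status with no outgoing transition is reached, the sketch expects no further change.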
diff --git a/docs/specs/metadata_example.xml b/docs/specs/metadata_example.xml
new file mode 100644
index 00000000..59c5ed82
--- /dev/null
+++ b/docs/specs/metadata_example.xml
@@ -0,0 +1,38 @@
+
+ "{http://www.w3.org/2005/Atom}author": {
+ "{http://www.w3.org/2005/Atom}email": "hal@ccsd.cnrs.fr",
+ "{http://www.w3.org/2005/Atom}name": "HAL"
+ },
+
+ HAL
+ hal@ccsd.cnrs.fr
+
+ hal
+ hal-01243573
+ The assignment problem
+ https://hal.archives-ouvertes.fr/hal-01243573
+ other identifier, DOI, ARK
+ Domain
+ description
+
+ author1
+ Inria
+ UPMC
+
+
+ author2
+ Inria
+ UPMC
+
+
+
+
+ ./path/to/file.txt
+ aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+
+
+
diff --git a/docs/spec-loading.rst b/docs/specs/spec-loading.rst
similarity index 100%
rename from docs/spec-loading.rst
rename to docs/specs/spec-loading.rst
diff --git a/docs/specs/spec-meta-deposit.rst b/docs/specs/spec-meta-deposit.rst
new file mode 100644
index 00000000..2d682449
--- /dev/null
+++ b/docs/specs/spec-meta-deposit.rst
@@ -0,0 +1,31 @@
+The meta-deposit
+================
+
+Goal
+----
+A client wishes to deposit only metadata about an object in the Software
+Heritage archive.
+
+The meta-deposit is a special deposit where no content is
+deposited and the data transferred to Software Heritage is only
+the metadata about an object or several objects in the archive.
+
+The scope of the meta-deposit is larger than that of the sparse-deposit,
+because with a meta-deposit all types of objects in the archive can be
+described with the deposited metadata:
+
+- origin
+- snapshot
+- revision
+- release
+- directory
+- content
+
+
+Loading procedure
+------------------
+
+In this case, the meta-deposit will be injected as a metadata entry at the
+appropriate level (origin_metadata, revision_metadata, etc.) and, unlike the
+complete deposit and the sparse-deposit, won't result in the creation of a
+new object.
diff --git a/docs/specs/spec-sparse-deposit.rst b/docs/specs/spec-sparse-deposit.rst
new file mode 100644
index 00000000..534957a8
--- /dev/null
+++ b/docs/specs/spec-sparse-deposit.rst
@@ -0,0 +1,109 @@
+The sparse-deposit
+==================
+
+Goal
+----
+A client wishes to transfer a tarball for which part of the content is
+already in the SWH archive.
+
+Requirements
+------------
+To do so, the paths to the missing directories/content must be provided as
+empty paths in the tarball and the list linking each path to the object in the
+archive will be provided as part of the metadata. The list will be referred to
+as the manifest list.
+
++----------------------+-------------------------------------+
+| path                 | swh-id                              |
++======================+=====================================+
+| ./path/to/file.txt   | swh:1:cnt:aaaaaaaaaaaaaaaaaaaaa...  |
++----------------------+-------------------------------------+
+| ./path/to/dir/       | swh:1:dir:aaaaaaaaaaaaaaaaaaaaa...  |
++----------------------+-------------------------------------+
+
+Note: the *name* of the file or the directory is given by the path and is not
+part of the identified object.
+
+A concrete example
+------------------
+The manifest list is included in the metadata xml atomEntry under the
+swh namespace:
+
+..
code:: xml
+
+
+
+
+ HAL
+ hal@ccsd.cnrs.fr
+
+ hal
+ hal-01243573
+ The assignment problem
+ https://hal.archives-ouvertes.fr/hal-01243573
+ other identifier, DOI, ARK
+ Domain
+ description
+
+ author1
+ Inria
+ UPMC
+
+
+ author2
+ Inria
+ UPMC
+
+
+
+
+ ./path/to/file.txt
+ swh:1:cnt:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+
+ ./path/to/second_file.txt
+ swh:1:cnt:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
+
+
+ ./path/to/dir/
+ swh:1:dir:ddddddddddddddddddddddddddddddddd
+
+
+
+
+
+The tarball sent with the deposit will contain the following empty paths:
+- path/to/file.txt
+- path/to/second_file.txt
+- path/to/dir/
+
+Deposit verification
+--------------------
+
+After checking the integrity of the deposit content and
+metadata, the following checks should be added:
+
+1. validate the manifest list structure with a swh-id for each path
+2. verify that the paths in the manifest list are explicit and empty in the tarball
+3. verify that the path name corresponds to the object type
+4. locate the identifiers in the SWH archive
+
+Each of these verifications should return a distinct error for the deposit
+and result in a ``rejected`` deposit.
+
+Loading procedure
+------------------
+The injection procedure should include:
+
+- load the tarball data
+- create new objects using the path name and create links from the path to the
+  SWH object using the identifier
+- calculate the identifier of the new objects at each level
+- return the final swh-id of the new revision
+
+Invariant: the same content should yield the same swh-id. That is why a complete
+deposit with all the content and a sparse-deposit with the correct links will
+result in the same root directory swh-id and, if the metadata are identical,
+in the same revision swh-id.
diff --git a/docs/specs/specs.rst b/docs/specs/specs.rst
new file mode 100644
index 00000000..608183c4
--- /dev/null
+++ b/docs/specs/specs.rst
@@ -0,0 +1,13 @@
+..
_swh-deposit-specs:
+
+Software Heritage Deposit Specifications
+========================================
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Contents:
+
+   blueprint.rst
+   spec-loading.rst
+   spec-sparse-deposit.rst
+   spec-meta-deposit.rst
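As an illustration of the sparse-deposit verification listed in spec-sparse-deposit.rst, checks 1 and 3 (manifest structure, and path name matching the object type) can be sketched in Python. This is a sketch under stated assumptions, not the deposit server's code. It assumes a core swh-id has the form ``swh:1:<type>:<40 hex digits>``, as in the ``deposit_swh_id`` examples of the getting-started guide, and that a trailing slash in the path marks a directory. The function name ``check_manifest`` is hypothetical.

```python
import re
from typing import Dict, List

# Assumption: core swh-id layout "swh:1:<type>:<40 hex digits>".
SWHID_RE = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}$")


def check_manifest(manifest: Dict[str, str]) -> List[str]:
    """Return one error message per invalid manifest entry (empty list if valid)."""
    errors = []
    for path, swhid in manifest.items():
        match = SWHID_RE.match(swhid)
        if match is None:
            # Check 1: every path needs a well-formed swh-id.
            errors.append(f"{path}: malformed swh-id {swhid!r}")
            continue
        # Check 3 (assumption): a trailing slash means directory, otherwise content.
        expected = "dir" if path.endswith("/") else "cnt"
        if match.group(1) != expected:
            errors.append(
                f"{path}: expected a {expected} object, got {match.group(1)}"
            )
    return errors
```

Checks 2 and 4 (empty paths in the tarball, identifiers present in the archive) need access to the tarball and to the SWH archive, so they are left out of this sketch.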