diff --git a/docs/getting-started.rst b/docs/getting-started.rst
--- a/docs/getting-started.rst
+++ b/docs/getting-started.rst
@@ -175,7 +175,7 @@
 The steps to create a multisteps deposit:
 
 1. Create an incomplete deposit
-~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 First use the ``--partial`` argument to declare there is more to come
 
 .. code:: shell
@@ -186,7 +186,7 @@
 
 
 2. Add content or metadata to the deposit
-~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Continue the deposit by using the ``--deposit-id`` argument given as a
 response for the first step. You can continue adding content or metadata
 while you use the ``--partial`` argument.
@@ -235,7 +235,7 @@
   ``--deposit-id `` is provided
 
 - by using the ``--replace`` flag
-  
+
   - ``--metadata-deposit`` replaces associated existing metadata
   - ``--archive-deposit`` replaces associated archive(s)
   - by default, with no flag or both, you'll replace associated
@@ -268,7 +268,7 @@
 
 .. code:: shell
 
-$ swh-deposit --username name --password secret --deposit-id '11' --status
+  $ swh-deposit --username name --password secret --deposit-id '11' --status
 
 .. code:: json
 
diff --git a/docs/index.rst b/docs/index.rst
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -12,6 +12,7 @@
    metadata.rst
    dev-info.rst
    sys-info.rst
+   specs/specs.rst
 
 Indices and tables
 ==================
diff --git a/docs/blueprint.rst b/docs/specs/blueprint.rst
rename from docs/blueprint.rst
rename to docs/specs/blueprint.rst
--- a/docs/blueprint.rst
+++ b/docs/specs/blueprint.rst
@@ -8,13 +8,13 @@
 From client's deposit repository server to SWH's repository server:
 
 1. The client requests for the server's abilities and its associated collection
-  (GET query to the *SD/service document uri*)
+   (GET query to the *SD/service document uri*)
 
 2. The server answers the client with the service document which gives the
-  *collection uri* (also known as *COL/collection IRI*).
+   *collection uri* (also known as *COL/collection IRI*).
 
 3. The client sends a deposit (optionally a zip archive, some metadata or both)
-  through the *collection uri*.
+   through the *collection uri*.
 
    This can be done in:
 
@@ -22,16 +22,16 @@
    * one POST request (metadata or archive) + other PUT or POST request to the
      *update uris* (*edit-media iri* or *edit iri*)
 
-   1. Server validates the client's input or returns detailed error if any
+   a. Server validates the client's input or returns detailed error if any
 
-   2. Server stores information received (metadata or software archive source
+   b. Server stores information received (metadata or software archive source
       code or both)
 
 4. The server notifies the client it acknowledged the client's request. An
-  ``http 201 Created`` response with a deposit receipt in the body response is
-  sent back. That deposit receipt will hold the necessary information to
-  eventually complete the deposit later on if it was incomplete (also known as
-  status ``partial``).
+   ``http 201 Created`` response with a deposit receipt in the body response is
+   sent back. That deposit receipt will hold the necessary information to
+   eventually complete the deposit later on if it was incomplete (also known as
+   status ``partial``).
 
 Schema representation
 ^^^^^^^^^^^^^^^^^^^^^
diff --git a/docs/specs/metadata_example.xml b/docs/specs/metadata_example.xml
new file mode 100644
--- /dev/null
+++ b/docs/specs/metadata_example.xml
@@ -0,0 +1,35 @@
+
+
+  "{http://www.w3.org/2005/Atom}author": {
+    "{http://www.w3.org/2005/Atom}email": "hal@ccsd.cnrs.fr",
+    "{http://www.w3.org/2005/Atom}name": "HAL"
+  },
+
+    HAL
+    hal@ccsd.cnrs.fr
+
+  hal
+  hal-01243573
+  The assignment problem
+  https://hal.archives-ouvertes.fr/hal-01243573
+  other identifier, DOI, ARK
+  Domain
+  description
+
+    author1
+    Inria
+    UPMC
+
+
+    author2
+    Inria
+    UPMC
+
+
+
+
+
+
+
diff --git a/docs/spec-loading.rst b/docs/specs/spec-loading.rst
rename from docs/spec-loading.rst
rename to docs/specs/spec-loading.rst
diff --git a/docs/specs/spec-meta-deposit.rst b/docs/specs/spec-meta-deposit.rst
new file mode 100644
--- /dev/null
+++ b/docs/specs/spec-meta-deposit.rst
@@ -0,0 +1,100 @@
+The metadata-deposit
+====================
+
+Goal
+----
+A client wishes to deposit only metadata about an object in the Software
+Heritage archive.
+
+The meta-deposit is a special deposit where no content is
+deposited and the data transferred to Software Heritage is only
+the metadata about an object or several objects in the archive.
+
+The scope of the meta-deposit differs from that of the sparse-deposit:
+while a sparse-deposit creates a revision with referenced directories and
+content files, the meta-deposit references one of the following:
+
+- origin
+- snapshot
+- revision
+- release
+
+
+A complete metadata example
+---------------------------
+The reference element is included in the metadata XML Atom entry under the
+swh namespace:
+
+.. code:: xml
+
+
+
+
+        HAL
+        hal@ccsd.cnrs.fr
+
+      hal
+      hal-01243573
+      The assignment problem
+      https://hal.archives-ouvertes.fr/hal-01243573
+      other identifier, DOI, ARK
+      Domain
+      description
+
+        author1
+        Inria
+        UPMC
+
+
+        author2
+        Inria
+        UPMC
+
+
+
+
+
+
+
+
+Examples by target type
+^^^^^^^^^^^^^^^^^^^^^^^
+snapshot
+*********
+.. code:: xml
+
+
+
+
+
+
+
+revision
+********
+.. code:: xml
+
+
+
+
+
+
+
+release
+*******
+.. code:: xml
+
+
+
+
+
+
+
+Loading procedure
+------------------
+
+In this case, the meta-deposit will be injected as a metadata entry at the
+appropriate level (origin_metadata, revision_metadata, etc.) and, unlike the
+complete deposit and the sparse-deposit, won't result in the creation of a
+new object.
diff --git a/docs/specs/spec-sparse-deposit.rst b/docs/specs/spec-sparse-deposit.rst
new file mode 100644
--- /dev/null
+++ b/docs/specs/spec-sparse-deposit.rst
@@ -0,0 +1,104 @@
+The sparse-deposit
+==================
+
+Goal
+----
+A client wishes to transfer a tarball for which part of the content is
+already in the SWH archive.
+
+Requirements
+------------
+To do so, the paths to the missing directories/content must be provided as
+empty paths in the tarball and the list linking each path to the object in the
+archive will be provided as part of the metadata. The list will be referred to
+as the manifest list.
+
++----------------------+-------------------------------------+
+| path                 | swh-id                              |
++======================+=====================================+
+| path/to/file.txt     | swh:1:cnt:aaaaaaaaaaaaaaaaaaaaa...  |
++----------------------+-------------------------------------+
+| path/to/dir/         | swh:1:dir:aaaaaaaaaaaaaaaaaaaaa...  |
++----------------------+-------------------------------------+
+
+Note: the *name* of the file or the directory is given by the path and is not
+part of the identified object.
+
+A concrete example
+------------------
+The manifest list is included in the metadata XML Atom entry under the
+swh namespace:
+
+.. code:: xml
+
+
+
+
+        HAL
+        hal@ccsd.cnrs.fr
+
+      hal
+      hal-01243573
+      The assignment problem
+      https://hal.archives-ouvertes.fr/hal-01243573
+      other identifier, DOI, ARK
+      Domain
+      description
+
+        author1
+        Inria
+        UPMC
+
+
+        author2
+        Inria
+        UPMC
+
+
+
+
+
+
+
+
+
+
+
+The tarball sent with the deposit will contain the following empty paths:
+
+- path/to/file.txt
+- path/to/second_file.txt
+- path/to/dir/
+
+Deposit verification
+--------------------
+
+After checking the integrity of the deposit content and
+metadata, the following checks should be added:
+
+1. validate the manifest list structure with a swh-id for each path
+2. verify that the paths in the manifest list are explicit and empty in the tarball
+3. verify that the path name corresponds to the object type
+4. locate the identifiers in the SWH archive
+
+Each verification that fails should return its own distinct error and result
+in a 'rejected' deposit.
+
+Loading procedure
+------------------
+The injection procedure should include:
+
+- load the tarball data
+- create new objects using the path name and create links from the path to the
+  SWH object using the identifier
+- compute the identifiers of the new objects at each level
+- return the final swh-id of the new revision
+
+Invariant: the same content should yield the same swh-id. Hence a complete
+deposit with all the content and a sparse-deposit with the correct links will
+result in the same root directory swh-id and, if the metadata are identical,
+in the same revision swh-id.
diff --git a/docs/specs/specs.rst b/docs/specs/specs.rst
new file mode 100644
--- /dev/null
+++ b/docs/specs/specs.rst
@@ -0,0 +1,13 @@
+.. _swh-deposit-specs:
+
+Software Heritage Deposit Specifications
+========================================
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Contents:
+
+   blueprint.rst
+   spec-loading.rst
+   spec-sparse-deposit.rst
+   spec-meta-deposit.rst
diff --git a/docs/specs/swh.xsd b/docs/specs/swh.xsd
new file mode 100644
--- /dev/null
+++ b/docs/specs/swh.xsd
@@ -0,0 +1,41 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+