diff --git a/docs/getting-started.rst b/docs/getting-started.rst
--- a/docs/getting-started.rst
+++ b/docs/getting-started.rst
@@ -175,7 +175,7 @@
 The steps to create a multisteps deposit:

 1. Create an incomplete deposit
-~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 First use the ``--partial`` argument to declare there is more to come

 .. code:: shell
@@ -186,7 +186,7 @@


 2. Add content or metadata to the deposit
-~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Continue the deposit by using the ``--deposit-id`` argument given as a response
 for the first step. You can continue adding content or metadata while you use
 the ``--partial`` argument.
@@ -235,7 +235,7 @@
   ``--deposit-id `` is provided
 - by using the ``--replace`` flag
-  
+
   - ``--metadata-deposit`` replaces associated existing metadata
   - ``--archive-deposit`` replaces associated archive(s)
   - by default, with no flag or both, you'll replace associated
@@ -268,7 +268,7 @@

 .. code:: shell

-$ swh-deposit --username name --password secret --deposit-id '11' --status
+    $ swh-deposit --username name --password secret --deposit-id '11' --status

 .. code:: json

diff --git a/docs/index.rst b/docs/index.rst
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -12,6 +12,7 @@
    metadata.rst
    dev-info.rst
    sys-info.rst
+   specs/specs.rst

 Indices and tables
 ==================
diff --git a/docs/blueprint.rst b/docs/specs/blueprint.rst
rename from docs/blueprint.rst
rename to docs/specs/blueprint.rst
--- a/docs/blueprint.rst
+++ b/docs/specs/blueprint.rst
@@ -8,13 +8,13 @@
 From client's deposit repository server to SWH's repository server:

 1. The client requests for the server's abilities and its associated collection
-  (GET query to the *SD/service document uri*)
+   (GET query to the *SD/service document uri*)

 2. The server answers the client with the service document which gives the
-  *collection uri* (also known as *COL/collection IRI*).
+   *collection uri* (also known as *COL/collection IRI*).

 3. The client sends a deposit (optionally a zip archive, some metadata or both)
-  through the *collection uri*.
+   through the *collection uri*.

    This can be done in:

@@ -22,16 +22,16 @@
    * one POST request (metadata or archive) + other PUT or POST request to the
      *update uris* (*edit-media iri* or *edit iri*)

-  1. Server validates the client's input or returns detailed error if any
+   a. Server validates the client's input or returns detailed error if any

-  2. Server stores information received (metadata or software archive source
+   b. Server stores information received (metadata or software archive source
       code or both)

 4. The server notifies the client it acknowledged the client's request. An
-  ``http 201 Created`` response with a deposit receipt in the body response is
-  sent back. That deposit receipt will hold the necessary information to
-  eventually complete the deposit later on if it was incomplete (also known as
-  status ``partial``).
+   ``http 201 Created`` response with a deposit receipt in the body response is
+   sent back. That deposit receipt will hold the necessary information to
+   eventually complete the deposit later on if it was incomplete (also known as
+   status ``partial``).

 Schema representation
 ^^^^^^^^^^^^^^^^^^^^^
diff --git a/docs/specs/metadata_example.xml b/docs/specs/metadata_example.xml
new file mode 100644
--- /dev/null
+++ b/docs/specs/metadata_example.xml
@@ -0,0 +1,35 @@
+
+
+  "{http://www.w3.org/2005/Atom}author": {
+    "{http://www.w3.org/2005/Atom}email": "hal@ccsd.cnrs.fr",
+    "{http://www.w3.org/2005/Atom}name": "HAL"
+  },
+
+    HAL
+    hal@ccsd.cnrs.fr
+
+    hal
+    hal-01243573
+    The assignment problem
+    https://hal.archives-ouvertes.fr/hal-01243573
+    other identifier, DOI, ARK
+    Domain
+    description
+
+    author1
+    Inria
+    UPMC
+
+
+    author2
+    Inria
+    UPMC
+
+
+
+
+
+
+
diff --git a/docs/spec-loading.rst b/docs/specs/spec-loading.rst
rename from docs/spec-loading.rst
rename to docs/specs/spec-loading.rst
diff --git a/docs/specs/spec-meta-deposit.rst b/docs/specs/spec-meta-deposit.rst
new file mode 100644
--- /dev/null
+++ b/docs/specs/spec-meta-deposit.rst
@@ -0,0 +1,84 @@
+The metadata-deposit
+====================
+
+Goal
+----
+A client wishes to deposit only metadata about an object in the Software
+Heritage archive.
+
+The meta-deposit is a special deposit where no content is
+provided and the data transferred to Software Heritage is only
+the metadata about an object or several objects in the archive.
+
+Requirements
+------------
+The scope of the meta-deposit is different than the
+sparse-deposit. While a sparse-deposit creates a revision with referenced
+directories and content files, the meta-deposit references one of the following:
+
+- origin
+- snapshot
+- revision
+- release
+
+
+A complete metadata example
+---------------------------
+The reference element is included in the metadata xml atomEntry under the
+swh namespace:
+
+.. code:: xml
+
+
+
+
+    HAL
+    hal@ccsd.cnrs.fr
+
+    hal
+    hal-01243573
+    The assignment problem
+    https://hal.archives-ouvertes.fr/hal-01243573
+    other identifier, DOI, ARK
+    Domain
+    description
+
+    author1
+    Inria
+    UPMC
+
+
+    author2
+    Inria
+    UPMC
+
+
+
+
+
+
+
+
+Examples by target type
+^^^^^^^^^^^^^^^^^^^^^^^
+
+With ${type} in {snp (snapshot), rev (revision), rel (release) }:
+
+.. code:: xml
+
+
+
+
+
+
+
+
+
+Loading procedure
+------------------
+
+In this case, the meta-deposit will be injected as a metadata entry at the
+appropriate level (origin_metadata, revision_metadata, etc.). Contrary to the
+complete and sparse deposit, there will be no object creation.
diff --git a/docs/specs/spec-sparse-deposit.rst b/docs/specs/spec-sparse-deposit.rst
new file mode 100644
--- /dev/null
+++ b/docs/specs/spec-sparse-deposit.rst
@@ -0,0 +1,101 @@
+The sparse-deposit
+==================
+
+Goal
+----
+A client wishes to transfer a tarball for which part of the content is
+already in the SWH archive.
+
+Requirements
+------------
+To do so, a list of paths with targets must be provided in the metadata and
+the paths to the missing directories/content should not be included
+in the tarball. The list will be referred to
+as the manifest list using the entry name 'bindings' in the metadata.
+
++----------------------+-------------------------------------+
+| path                 | swh-id                              |
++======================+=====================================+
+| path/to/file.txt     | swh:1:cnt:aaaaaaaaaaaaaaaaaaaaa...  |
++----------------------+-------------------------------------+
+| path/to/dir/         | swh:1:dir:aaaaaaaaaaaaaaaaaaaaa...  |
++----------------------+-------------------------------------+
+
+Note: the *name* of the file or the directory is given by the path and is not
+part of the identified object.
+
+A concrete example
+------------------
+The manifest list is included in the metadata xml atomEntry under the
+swh namespace:
+
+.. code:: xml
+
+
+
+
+    HAL
+    hal@ccsd.cnrs.fr
+
+    hal
+    hal-01243573
+    The assignment problem
+    https://hal.archives-ouvertes.fr/hal-01243573
+    other identifier, DOI, ARK
+    Domain
+    description
+
+    author1
+    Inria
+    UPMC
+
+
+    author2
+    Inria
+    UPMC
+
+
+
+
+
+
+
+
+
+
+
+
+Deposit verification
+--------------------
+
+After checking the integrity of the deposit content and
+metadata, the following checks should be added:
+
+1. validate the manifest list structure with a correct swh-id for each path (syntax check on the swh-id format)
+2. verify that the path name corresponds to the object type
+3. locate the identifiers in the SWH archive
+
+Each failing check should return a different error with the deposit
+and result in a 'rejected' deposit.
+
+Loading procedure
+------------------
+The injection procedure should include:
+
+- load the tarball new data
+- create new objects using the path name and create links from the path to the
+  SWH object using the identifier
+- calculate identifier of the new objects at each level
+- return final swh-id of the new revision
+
+Invariant: the same content should yield the same swh-id,
+that's why a complete deposit with all the content and
+a sparse-deposit with the correct links will result
+in the same root directory swh-id.
+The same is expected with the revision swh-id if the metadata provided is
+identical.
diff --git a/docs/specs/specs.rst b/docs/specs/specs.rst
new file mode 100644
--- /dev/null
+++ b/docs/specs/specs.rst
@@ -0,0 +1,13 @@
+.. _swh-deposit-specs:
+
+Blueprint Specifications
+=========================
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Contents:
+
+   blueprint.rst
+   spec-loading.rst
+   spec-sparse-deposit.rst
+   spec-meta-deposit.rst
diff --git a/docs/specs/swh.xsd b/docs/specs/swh.xsd
new file mode 100644
--- /dev/null
+++ b/docs/specs/swh.xsd
@@ -0,0 +1,41 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+