diff --git a/docs/specs/index.rst b/docs/specs/index.rst index 8abdb491..9fa17d63 100644 --- a/docs/specs/index.rst +++ b/docs/specs/index.rst @@ -1,14 +1,13 @@ .. _swh-deposit-specs: Blueprint Specifications ========================= .. toctree:: :maxdepth: 1 :caption: Contents: blueprint.rst spec-loading.rst spec-technical.rst - spec-sparse-deposit.rst spec-meta-deposit.rst diff --git a/docs/specs/metadata_example.xml b/docs/specs/metadata_example.xml index 5e9ae9bc..a184c53f 100644 --- a/docs/specs/metadata_example.xml +++ b/docs/specs/metadata_example.xml @@ -1,33 +1,30 @@ HAL hal@ccsd.cnrs.fr hal The assignment problem https://hal.archives-ouvertes.fr/hal-01243573 other identifier, DOI, ARK Domain description author1 Inria UPMC author2 Inria UPMC - - - diff --git a/docs/specs/spec-meta-deposit.rst b/docs/specs/spec-meta-deposit.rst index 8f1cbe4c..342da790 100644 --- a/docs/specs/spec-meta-deposit.rst +++ b/docs/specs/spec-meta-deposit.rst @@ -1,134 +1,133 @@ The metadata-deposit ^^^^^^^^^^^^^^^^^^^^ Goal ==== A client wishes to deposit only metadata about an origin or object already present in the Software Heritage archive. The metadata-deposit is a special deposit where no content is provided and the data transferred to Software Heritage is only the metadata about an object in the archive. Requirements ============ 1. Create a metadata-only deposit through a :ref:`POST request` 2. It is composed of ONLY one xml metadata file 3. It MUST comply with :ref:`the metadata requirements` 4. It MUST reference an **object** or an **origin** in a deposit tag 5. The reference SHOULD exist in the SWH archive 6. The **object** reference MUST be a SWHID on one of the following artifact types: - origin - snapshot - release - revision - directory - content 7. The SWHID MAY be a `core identifier`_ with or without `qualifiers`_ 8. The SWHID MUST NOT reference a fragment of code with the classifier `lines` .. 
_core identifier: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#core-identifiers .. _qualifiers: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#qualifiers A complete metadata example =========================== The reference element is included in the metadata xml atomEntry under the swh namespace: .. code:: xml HAL hal@ccsd.cnrs.fr The assignment problem https://hal.archives-ouvertes.fr/hal-01243573 other identifier, DOI, ARK Domain description author1 Inria UPMC author2 Inria UPMC References ========== The metadata reference can be either on: - an origin - a graph object (core SWHID with or without qualifiers) Origins ------- The metadata may be on an origin, identified by the origin's URL: .. code:: xml Graph objects ------------- It may also reference an object in the `SWH graph `: contents, directories, revisions, releases, and snapshots: .. code:: xml .. code:: xml The value of the ``swhid`` attribute must be a `SWHID `, with any context qualifiers in this list: * ``origin`` * ``visit`` * ``anchor`` * ``path`` and they should be provided whenever relevant, especially ``origin``. Other qualifiers are not allowed (for example, ``line`` isn't because SWH cannot store metadata at a finer level than entire contents). Loading procedure ================= In this case, the metadata-deposit will be injected as a metadata entry of the relevant object, with the information about the contributor of the deposit. -Contrary to the complete and sparse deposit, there will be no object creation. diff --git a/docs/specs/spec-sparse-deposit.rst b/docs/specs/spec-sparse-deposit.rst deleted file mode 100644 index c8fbcc8a..00000000 --- a/docs/specs/spec-sparse-deposit.rst +++ /dev/null @@ -1,100 +0,0 @@ -The sparse-deposit -================== - -Goal ----- -A client wishes to transfer a tarball for which part of the content is -already in the SWH archive. 
- -Requirements ------------- -To do so, a list of paths with targets must be provided in the metadata and -the paths to the missing directories/content should not be included -in the tarball. The list will be referred to -as the manifest list using the entry name 'bindings' in the metadata. - -+----------------------+-------------------------------------+ -| path | swh-id | -+======================+=====================================+ -| path/to/file.txt | swh:1:cnt:aaaaaaaaaaaaaaaaaaaaa... | -+----------------------+-------------------------------------+ -| path/to/dir/ | swh:1:dir:aaaaaaaaaaaaaaaaaaaaa... | -+----------------------+-------------------------------------+ - -Note: the *name* of the file or the directory is given by the path and is not -part of the identified object. - -TODO: see if a trailing "/" is mandatory for implementation. - -A concrete example ------------------- -The manifest list is included in the metadata xml atomEntry under the -swh namespace: - -TODO: publish schema at https://www.softwareheritage.org/schema/2018/deposit - -.. code:: xml - - - - - HALit mandatory to have a trailing "/", - hal@ccsd.cnrs.fr - - The assignment problem - https://hal.archives-ouvertes.fr/hal-01243573 - other identifier, DOI, ARK - Domain - description - - author1 - Inria - UPMC - - - author2 - Inria - UPMC - - - - - - - - - - - - -Deposit verification --------------------- - -After checking the integrity of the deposit content and -metadata, the following checks should be added: - -1. validate the manifest list structure with a correct swh-id for each path (syntax check on the swh-id format) -2. verify that the path name corresponds to the object type -3. locate the identifiers in the SWH archive - -Each failing check should return a different error with the deposit -and result in a 'rejected' deposit. 
- -Loading procedure ------------------- -The injection procedure should include: - -- load the tarball new data -- create new objects using the path name and create links from the path to the - SWH object using the identifier -- calculate identifier of the new objects at each level -- return final swh-id of the new revision - -Invariant: the same content should yield the same swh-id, -that's why a complete deposit with all the content and -a sparse-deposit with the correct links will result -with the same root directory swh-id. -The same is expected with the revision swh-id if the metadata provided is -identical. diff --git a/docs/specs/swh.xsd b/docs/specs/swh.xsd index 119d83d1..2a7d4292 100644 --- a/docs/specs/swh.xsd +++ b/docs/specs/swh.xsd @@ -1,65 +1,51 @@ - - - - - - - - - - - - - -
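Reviewer note on docs/specs/spec-meta-deposit.rst: the SWHID rules it states (requirements 6 to 8 and the allowed qualifier list ``origin``, ``visit``, ``anchor``, ``path``) can be sketched as a small syntax check. This is a minimal illustration, not the deposit server's actual validation code. The function name ``check_swhid_reference``, the regular expression, and the type codes ``cnt``, ``dir``, ``rev``, ``rel``, ``snp`` are assumptions introduced here. It only covers the graph-object case and does not look anything up in the archive.

```python
import re
from typing import List

# Assumed shape of a core SWHID: swh:1:<type>:<40 lowercase hex digits>.
# The type codes map to the graph objects listed in the spec:
# contents, directories, revisions, releases, snapshots.
CORE_SWHID_RE = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}")

# Context qualifiers the spec allows on a metadata-deposit reference.
ALLOWED_QUALIFIERS = {"origin", "visit", "anchor", "path"}


def check_swhid_reference(swhid: str) -> List[str]:
    """Return a list of problems found; an empty list means the syntax passes.

    Hypothetical helper: it checks syntax only and never queries the archive.
    """
    core, *qualifiers = swhid.split(";")
    problems = []
    if not CORE_SWHID_RE.fullmatch(core):
        problems.append(f"invalid core identifier: {core!r}")
    for qualifier in qualifiers:
        key, sep, value = qualifier.partition("=")
        if not sep or not value:
            problems.append(f"malformed qualifier: {qualifier!r}")
        elif key not in ALLOWED_QUALIFIERS:
            # For example a line-range qualifier, which the spec rejects
            # because metadata cannot be attached below whole contents.
            problems.append(f"qualifier not allowed: {key!r}")
    return problems
```

A reference such as ``"swh:1:dir:" + "a" * 40 + ";origin=https://example.org/repo"`` yields an empty list, while the same core identifier followed by ``;lines=1-4`` yields one problem naming ``lines``. Returning a list of problems instead of raising on the first one would let a caller report every failing rule in a single rejection message.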