diff --git a/docs/specs/metadata_example.xml b/docs/specs/metadata_example.xml
index 59c5ed82..c681e559 100644
--- a/docs/specs/metadata_example.xml
+++ b/docs/specs/metadata_example.xml
@@ -1,38 +1,35 @@
 "{http://www.w3.org/2005/Atom}author": {
     "{http://www.w3.org/2005/Atom}email": "hal@ccsd.cnrs.fr",
     "{http://www.w3.org/2005/Atom}name": "HAL"
 },
     HAL hal@ccsd.cnrs.fr hal hal-01243573 The assignment problem https://hal.archives-ouvertes.fr/hal-01243573 other identifier, DOI, ARK Domain description author1 Inria UPMC author2 Inria UPMC
-
-
-    ./path/to/file.txt
-    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-
-
+
+
+
diff --git a/docs/specs/spec-meta-deposit.rst b/docs/specs/spec-meta-deposit.rst
index 9ba0c8ef..a4f700c8 100644
--- a/docs/specs/spec-meta-deposit.rst
+++ b/docs/specs/spec-meta-deposit.rst
@@ -1,104 +1,100 @@
-The meta-deposit
-================
+The metadata-deposit
+====================

 Goal
 ----
 A client wishes to deposit only metadata about an object in the Software
 Heritage archive.

 The meta-deposit is a special deposit where no content is deposited and the
 data transfered to Software Heritage is only the metadata about an object or
 several objects in the archive.

 The scope of the meta-deposit is different than the sparse-deposit, while a
 sparse-deposit creates a revision with referenced directories and content
 files, the meta-deposit references one of the following:

 - origin
 - snapshot
 - revision
 - release

 A complete metadata example
 ---------------------------
 The reference element is included in the metadata xml atomEntry under the
 swh namespace:

 .. code:: xml

     HAL hal@ccsd.cnrs.fr hal hal-01243573 The assignment problem https://hal.archives-ouvertes.fr/hal-01243573 other identifier, DOI, ARK Domain description author1 Inria UPMC author2 Inria UPMC
-    origin
-    https://github.com/user/repo
+

 examples by target type
 ^^^^^^^^^^^^^^^^^^^^^^^
 snapshot
 *********
 .. code:: xml

-    snapshot
-    swh:1:snp:aaaaaaaaaaaaaa...
+

 revision
 ********
 .. code:: xml

-    revision
-    swh:1:rev:aaaaa............
+

 release
 *******
 .. code:: xml

-    release
-    swh:1:rel:aaaaaaaaaaaaaa....
+

 Loading procedure
 ------------------
 In this case, the meta-deposit will be injected as a metadata entry at the
 appropriate level (origin_metadata, revision_metadata, etc.) and won't result
 in the creation of a new object like with the complete deposit and the
 sparse-deposit.
diff --git a/docs/specs/spec-sparse-deposit.rst b/docs/specs/spec-sparse-deposit.rst
index 534957a8..6fd12f5c 100644
--- a/docs/specs/spec-sparse-deposit.rst
+++ b/docs/specs/spec-sparse-deposit.rst
@@ -1,109 +1,104 @@
 The sparse-deposit
 ==================

 Goal
 ----
 A client wishes to transfer a tarball for which part of the content is
 already in the SWH archive.

 Requirements
 ------------
 To do so, the paths to the missing directories/content must be provided as
 empty paths in the tarball and the list linking each path to the object in
 the archive will be provided as part of the metadata. The list will be
 refered to as the manifest list.

 +----------------------+-------------------------------------+
 | path                 | swh-id                              |
 +======================+=====================================+
-| ./path/to/file.txt   | swh:1:cnt:aaaaaaaaaaaaaaaaaaaaa...  |
+| path/to/file.txt     | swh:1:cnt:aaaaaaaaaaaaaaaaaaaaa...  |
 +----------------------+-------------------------------------+
-| ./path/to/dir/       | swh:1:dir:aaaaaaaaaaaaaaaaaaaaa...  |
+| path/to/dir/         | swh:1:dir:aaaaaaaaaaaaaaaaaaaaa...  |
 +----------------------+-------------------------------------+

 Note: the *name* of the file or the directory is given by the path and is not
 part of the identified object.

 A concrete example
 ------------------
 The manifest list is included in the metadata xml atomEntry under the
 swh namespace:

 .. code:: xml

     HAL hal@ccsd.cnrs.fr hal hal-01243573 The assignment problem https://hal.archives-ouvertes.fr/hal-01243573 other identifier, DOI, ARK Domain description author1 Inria UPMC author2 Inria UPMC
-
-
-    ./path/to/file.txt
-    swh:1:cnt:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-
-
-    ./path/to/second_file.txt
-    swh:1:cnt:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
-
-
-    ./path/to/dir/
-    swh:1:dir:ddddddddddddddddddddddddddddddddd
-
-
+
+
+
+
+
+

 The tarball sent with the deposit will contain the following empty paths:

 - path/to/file.txt
 - path/to/second_file.txt
 - path/to/dir/

 Deposit verification
 --------------------
 After checking the integrity of the deposit content and metadata, the
 following checks should be added:

 1. validate the manifest list structure with a swh-id for each path
 2. verify that the paths in the manifest list are explicit and empty in the
    tarball
 3. verify that the path name corresponds to the object type
 4. locate the identifiers in the SWH archive

 Each one of the verifications should return a different error with the
 deposit and result in a 'rejected' deposit.

 Loading procedure
 ------------------
 The injection procedure should include:

 - load the tarball data
 - create new objects using the path name and create links from the path to
   the SWH object using the identifier
 - calculate identifier of the new objects at each level
 - return final swh-id of the new revision

 Invariant: the same content should yield the same swhid, that's why a
 complete deposit with all the content and a sparse-deposit with the correct
 links will result with the same root directory swh-id and if the metadata are
 identical also with the same revision swh-id.
diff --git a/docs/specs/swh.xsd b/docs/specs/swh.xsd
index 37ac2cca..4dbf0ac6 100644
--- a/docs/specs/swh.xsd
+++ b/docs/specs/swh.xsd
@@ -1,23 +1,41 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+