diff --git a/docs/specs/metadata_example.xml b/docs/specs/metadata_example.xml
index 7ae6c8e4..e30cf4b3 100644
--- a/docs/specs/metadata_example.xml
+++ b/docs/specs/metadata_example.xml
@@ -1,31 +1,31 @@
+ xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit">
 HAL hal@ccsd.cnrs.fr hal hal-01243573 The assignment problem https://hal.archives-ouvertes.fr/hal-01243573 other identifier, DOI, ARK Domain description author1 Inria UPMC author2 Inria UPMC
diff --git a/docs/specs/spec-meta-deposit.rst b/docs/specs/spec-meta-deposit.rst
index 887baef2..f7f87707 100644
--- a/docs/specs/spec-meta-deposit.rst
+++ b/docs/specs/spec-meta-deposit.rst
@@ -1,98 +1,99 @@
 The metadata-deposit
 ====================

 Goal
 ----
 A client wishes to deposit only metadata about an object in the Software
 Heritage archive.

 The metadata-deposit is a special deposit where no content is
 provided and the data transferred to Software Heritage is only
 the metadata about an object or several objects in the archive.

 Requirements
 ------------
 The scope of the metadata-deposit differs from that of the
 sparse-deposit. While a sparse-deposit creates a revision with referenced
 directories and content files, the metadata-deposit references one of
 the following:

 - origin
 - snapshot
 - revision
 - release

 A complete metadata example
 ---------------------------
 The reference element is included in the metadata xml atomEntry under the
-swh namespace (a link for the published schema will be provided during
-the implementation of the metadata deposit):
+swh namespace:
+
+TODO: publish schema at https://www.softwareheritage.org/schema/2018/deposit

 .. code:: xml

+    xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit">
     HAL hal@ccsd.cnrs.fr hal hal-01243573 The assignment problem https://hal.archives-ouvertes.fr/hal-01243573 other identifier, DOI, ARK Domain description author1 Inria UPMC author2 Inria UPMC

 Examples by target type
 ^^^^^^^^^^^^^^^^^^^^^^^
 Reference an origin:

 .. code:: xml

-
+

 Reference a snapshot, revision or release:

 .. code:: xml

 With ${type} in {snp (snapshot), rev (revision), rel (release)}:

 Loading procedure
 ------------------
 In this case, the metadata-deposit will be injected as a metadata entry at
 the appropriate level (origin_metadata, revision_metadata, etc.) with the
 information about the contributor of the deposit. Unlike the complete and
 sparse deposits, no object is created.
diff --git a/docs/specs/spec-sparse-deposit.rst b/docs/specs/spec-sparse-deposit.rst
index ffa001ae..461694fa 100644
--- a/docs/specs/spec-sparse-deposit.rst
+++ b/docs/specs/spec-sparse-deposit.rst
@@ -1,102 +1,102 @@
 The sparse-deposit
 ==================

 Goal
 ----
 A client wishes to transfer a tarball for which part of the content is
 already in the SWH archive.

 Requirements
 ------------
 To do so, a list of paths with targets must be provided in the metadata,
 and the missing directories/content at those paths should not be included
 in the tarball. The list will be referred to as the manifest list, using
 the entry name 'bindings' in the metadata.

 +----------------------+-------------------------------------+
 | path                 | swh-id                              |
 +======================+=====================================+
 | path/to/file.txt     | swh:1:cnt:aaaaaaaaaaaaaaaaaaaaa...  |
 +----------------------+-------------------------------------+
 | path/to/dir/         | swh:1:dir:aaaaaaaaaaaaaaaaaaaaa...  |
 +----------------------+-------------------------------------+

 Note: the *name* of the file or the directory is given by the path and is
 not part of the identified object.

+TODO: see if a trailing "/" is mandatory for implementation.
+
 A concrete example
 ------------------
 The manifest list is included in the metadata xml atomEntry under the
-swh namespace (a link for the published schema will be provided during
-the implementation of the sparse deposit):
+swh namespace:
+
+TODO: publish schema at https://www.softwareheritage.org/schema/2018/deposit

 .. code:: xml

+    xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit">
-    HAL
+    HAL
     hal@ccsd.cnrs.fr hal hal-01243573 The assignment problem https://hal.archives-ouvertes.fr/hal-01243573 other identifier, DOI, ARK Domain description author1 Inria UPMC author2 Inria UPMC
-
-
-
+
+
+

 Deposit verification
 --------------------
 After checking the integrity of the deposit content and metadata, the
 following checks should be added:

 1. validate the manifest list structure with a correct swh-id for each
    path (syntax check on the swh-id format)
 2. verify that the path name corresponds to the object type
 3. locate the identifiers in the SWH archive

 Each failing check should return a distinct error with the deposit and
 result in a 'rejected' deposit.

 Loading procedure
 ------------------
 The injection procedure should include:

 - load the new data from the tarball
 - create new objects using the path name and create links from the path
   to the SWH object using the identifier
 - calculate the identifier of the new objects at each level
 - return the final swh-id of the new revision

 Invariant: the same content should yield the same swh-id, which is why a
 complete deposit with all the content and a sparse-deposit with the
 correct links will result in the same root directory swh-id. The same is
 expected of the revision swh-id if the metadata provided is identical.
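
The first two checks listed under "Deposit verification" (swh-id syntax and path/type correspondence) could look roughly like the sketch below. It is only an illustration under stated assumptions, not the deposit server's implementation: the function names and error labels are hypothetical, the identifier is assumed to be ``swh:1:<type>:`` followed by 40 lowercase hexadecimal digits, only the ``cnt`` and ``dir`` types shown in the table are accepted, and a trailing "/" is assumed to mark a directory (the TODO above leaves that point open). The third check, locating the identifiers in the archive, needs archive access and is not covered.

```python
import re

# Assumed identifier syntax: "swh:1:<type>:<40 lowercase hex digits>",
# restricted here to the two types that appear in the manifest table.
SWHID_RE = re.compile(r"swh:1:(cnt|dir):[0-9a-f]{40}")


def check_binding(path, swhid):
    """Return None if the (path, swh-id) pair passes checks 1 and 2,
    otherwise a short (hypothetical) error label naming the failed check."""
    match = SWHID_RE.fullmatch(swhid)
    if match is None:
        return "invalid-swhid-syntax"
    object_type = match.group(1)
    # Assumption: a trailing "/" on the path marks a directory.
    expected_type = "dir" if path.endswith("/") else "cnt"
    if object_type != expected_type:
        return "path-type-mismatch"
    return None


def check_manifest(bindings):
    """Map each failing path to its error label; an empty dict means
    every binding passed the syntax and type checks."""
    errors = {}
    for path, swhid in bindings:
        error = check_binding(path, swhid)
        if error is not None:
            errors[path] = error
    return errors
```

Returning a separate label per failed check mirrors the requirement that each failing check report a different error; a deposit with a non-empty result would then be marked 'rejected'.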