diff --git a/docs/getting-started.rst b/docs/getting-started.rst
index 399658c0..59d7f858 100644
--- a/docs/getting-started.rst
+++ b/docs/getting-started.rst
@@ -1,309 +1,309 @@
Getting Started
===============
This is a guide for how to prepare and push a software deposit with
the swh-deposit commands.
The API is rooted at https://deposit.softwareheritage.org/1.
For more details, see the `main documentation <./index.html>`__.
Requirements
------------
You need to be referenced on SWH's client list to have:
* credentials (needed for the basic authentication step)
- in this document we reference ``<name>`` as the client's name and
``<pass>`` as its associated authentication password.
* an associated collection
`Contact us for more
information. <https://www.softwareheritage.org/contact/>`__
Prepare a deposit
-----------------
* compress the files in a supported archive format:
- zip: common zip archive (no multi-disk zip files).
- tar: tar archive, either uncompressed or compressed with one of the
following algorithms: gzip (.tar.gz, .tgz), bzip2
(.tar.bz2), or lzma (.tar.lzma)
* prepare a metadata file (`more details <./metadata.html>`__):
- specify metadata schema/vocabulary (CodeMeta is recommended)
- specify *MUST* metadata (url, authors, software name and
the external\_identifier)
- add all available information under the compatible metadata term
An example of an atom entry file with CodeMeta terms:
.. code:: xml
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom"
xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0">
<title>Je suis GPL</title>
<client>swh</client>
<external_identifier>je-suis-gpl</external_identifier>
<codemeta:url>https://forge.softwareheritage.org/source/jesuisgpl/</codemeta:url>
<codemeta:dateCreated>2018-01-05</codemeta:dateCreated>
<codemeta:description>Je suis GPL is a modified version of GNU Hello whose
sole purpose is to showcase the usage of
Software Heritage for license compliance purposes.</codemeta:description>
<codemeta:version>0.1</codemeta:version>
<codemeta:runtimePlatform>GNU/Linux</codemeta:runtimePlatform>
<codemeta:developmentStatus>stable</codemeta:developmentStatus>
<codemeta:programmingLanguage>C</codemeta:programmingLanguage>
<codemeta:license>
<codemeta:name>GNU General Public License v3.0 or later</codemeta:name>
<codemeta:url>https://spdx.org/licenses/GPL-3.0-or-later.html</codemeta:url>
</codemeta:license>
<codemeta:author>
<codemeta:name>Stefano Zacchiroli</codemeta:name>
<codemeta:jobTitle>Maintainer</codemeta:jobTitle>
</codemeta:author>
</entry>
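An entry like the one above can also be generated programmatically. The following is a minimal sketch using only the Python standard library; ``build_entry`` is a hypothetical helper name, and it covers only a few of the terms shown above (title, client, external identifier, url and one author), not the full CodeMeta vocabulary.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
CODEMETA_NS = "https://doi.org/10.5063/SCHEMA/CODEMETA-2.0"

# Serialize Atom as the default namespace and CodeMeta under its usual prefix.
ET.register_namespace("", ATOM_NS)
ET.register_namespace("codemeta", CODEMETA_NS)


def build_entry(title, client, external_identifier, url, author_name):
    """Return an Atom entry (as a string) holding a few CodeMeta terms.

    Hypothetical helper for illustration; not part of swh-deposit.
    """
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    for tag, text in (
        ("title", title),
        ("client", client),
        ("external_identifier", external_identifier),
    ):
        ET.SubElement(entry, f"{{{ATOM_NS}}}{tag}").text = text
    ET.SubElement(entry, f"{{{CODEMETA_NS}}}url").text = url
    author = ET.SubElement(entry, f"{{{CODEMETA_NS}}}author")
    ET.SubElement(author, f"{{{CODEMETA_NS}}}name").text = author_name
    return ET.tostring(entry, encoding="unicode")
```

The returned string can be written to the metadata file that accompanies the archive.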
Push deposit
------------
You can push a deposit with:
* a single deposit (archive + metadata):
The user posts in one query a software
source code archive and associated metadata.
The deposit is directly marked with status ``deposited``.
* a multisteps deposit:
1. Create an incomplete deposit (marked with status ``partial``)
2. Add data to a deposit (in multiple requests if needed)
3. Finalize deposit (the status becomes ``deposited``)
Single deposit
^^^^^^^^^^^^^^
Once the files are ready for deposit, we want to do the actual deposit
in one shot, sending exactly one POST query:
* 1 archive (content-type ``application/zip`` or ``application/x-tar``)
* 1 metadata file in atom xml format (``content-type: application/atom+xml;type=entry``)
For this, we need to provide the:
* arguments: ``--username 'name' --password 'pass'`` as credentials
* archive's path (example: ``--archive path/to/archive-name.tgz``)
* (optionally) metadata file's path ``--metadata
path/to/file.metadata.xml``. If not provided, the archive's filename
will be used to determine the metadata file, e.g.
``path/to/archive-name.tgz.metadata.xml``
* (optionally) ``--slug 'your-id'`` argument, a reference to a
unique identifier the client uses for the software object.
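The archive and the default metadata file name described above can be prepared with a short script. This is a sketch under assumptions: ``make_archive`` and ``default_metadata_path`` are hypothetical helper names, and the naming rule simply follows the ``archive-name.tgz.metadata.xml`` example given in this section.

```python
import tarfile
import tempfile
from pathlib import Path


def make_archive(source_dir: Path, archive_path: Path) -> Path:
    """Pack source_dir into a gzip-compressed tar archive (.tgz)."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the directory name,
        # not the absolute location on disk.
        tar.add(source_dir, arcname=source_dir.name)
    return archive_path


def default_metadata_path(archive_path: Path) -> Path:
    """Metadata file looked up when --metadata is not provided:
    the archive's file name with '.metadata.xml' appended."""
    return archive_path.with_name(archive_path.name + ".metadata.xml")
```

Writing the metadata file at the path returned by ``default_metadata_path`` lets the ``--metadata`` argument be omitted.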
You can do this with the following command:
minimal deposit
.. code:: shell
$ swh-deposit --username name --password secret \
--archive je-suis-gpl.tgz
with client's external identifier (``slug``)
.. code:: shell
$ swh-deposit --username name --password secret \
--archive je-suis-gpl.tgz \
--slug je-suis-gpl
to a specific client's collection
.. code:: shell
$ swh-deposit --username name --password secret \
--archive je-suis-gpl.tgz \
--collection 'second-collection'
You just posted a deposit to your collection on Software Heritage.
If everything went well, the successful response will contain the
elements below:
.. code:: shell
{
'deposit_status': 'deposited',
'deposit_id': '7',
'deposit_date': 'Jan. 29, 2018, 12:29 p.m.'
}
Note: As the deposit is in ``deposited`` status, you can no longer
update the deposit after this query. Any such attempt will be answered
with a ``403 Forbidden`` response.
If something went wrong, an equivalent response will be given with the
``error`` and ``detail`` keys explaining the issue, e.g.:
.. code:: shell
{
'error': 'Unknown collection name xyz',
'detail': None,
'deposit_status': None,
'deposit_status_detail': None,
'deposit_swh_id': None,
'status': 404
}
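A script wrapping the command may want to tell the two kinds of response apart. The sketch below assumes the response has already been read into a Python dictionary with the keys shown above; ``summarize_response`` is a hypothetical helper, not part of the swh-deposit tooling.

```python
def summarize_response(response: dict) -> str:
    """Return a one-line summary of a deposit response dictionary."""
    if response.get("error"):
        message = (
            f"deposit failed (status {response.get('status')}): "
            f"{response['error']}"
        )
        # 'detail' may be None, as in the example above.
        if response.get("detail"):
            message += f" ({response['detail']})"
        return message
    return f"deposit {response['deposit_id']} is {response['deposit_status']}"
```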
Multisteps deposit
^^^^^^^^^^^^^^^^^^^^^^^^^
The steps to create a multisteps deposit:
1. Create an incomplete deposit
-~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
First use the ``--partial`` argument to declare there is more to come
.. code:: shell
$ swh-deposit --username name --password secret \
--archive foo.tar.gz \
--partial
2. Add content or metadata to the deposit
-~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Continue the deposit by using the ``--deposit-id`` argument given as a response
for the first step. You can continue adding content or metadata while you use
the ``--partial`` argument.
.. code:: shell
$ swh-deposit --username name --password secret \
--archive add-foo.tar.gz \
--deposit-id 42 \
--partial
In case you want to add only one new archive without metadata:
.. code:: shell
$ swh-deposit --username name --password secret \
--archive add-foo.tar.gz \
--archive-deposit \
--deposit-id 42 \
--partial
If you want to add only metadata, use:
.. code:: shell
$ swh-deposit --username name --password secret \
--metadata add-foo.tar.gz.metadata.xml \
--metadata-deposit \
--deposit-id 42 \
--partial
3. Finalize deposit
~~~~~~~~~~~~~~~~~~~
On your last addition, by not declaring it as ``--partial``, the
deposit will be considered as completed and its status will be changed
to ``deposited``.
Update deposit
----------------
* replace deposit:
- only possible if the deposit status is ``partial`` and
``--deposit-id <id>`` is provided
- by using the ``--replace`` flag
- ``--metadata-deposit`` replaces associated existing metadata
- ``--archive-deposit`` replaces associated archive(s)
- by default, with no flag or both, you'll replace associated
metadata and archive(s)
.. code:: shell
$ swh-deposit --username name --password secret \
--deposit-id 11 \
--archive updated-je-suis-gpl.tgz \
--replace
* update a loaded deposit with a new version:
- by using the external-id with the ``--slug`` argument, you will
link the new deposit with its parent deposit
.. code:: shell
$ swh-deposit --username name --password secret \
--archive je-suis-gpl-v2.tgz \
--slug 'je-suis-gpl'
Check the deposit's status
--------------------------
You can check the status of the deposit by using the ``--deposit-id`` argument:
.. code:: shell
-$ swh-deposit --username name --password secret --deposit-id '11' --status
+ $ swh-deposit --username name --password secret --deposit-id '11' --status
.. code:: json
{
'deposit_id': '11',
'deposit_status': 'deposited',
'deposit_swh_id': None,
'deposit_status_detail': 'Deposit is ready for additional checks \
(tarball ok, metadata, etc...)'
}
The different statuses:
- **partial**: multipart deposit is still ongoing
- **deposited**: deposit completed
- **rejected**: deposit failed the checks
- **verified**: content and metadata verified
- **loading**: loading in-progress
- **done**: loading completed successfully
- **failed**: the deposit loading has failed
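The statuses listed above can be read as a small state machine. The sketch below is one interpretation of this documentation (the transition table is an assumption drawn from the text, not taken from the server code), with hypothetical helper names.

```python
from enum import Enum


class DepositStatus(Enum):
    PARTIAL = "partial"
    DEPOSITED = "deposited"
    REJECTED = "rejected"
    VERIFIED = "verified"
    LOADING = "loading"
    DONE = "done"
    FAILED = "failed"


# Assumed transitions, as read from the documentation.
TRANSITIONS = {
    DepositStatus.PARTIAL: {DepositStatus.PARTIAL, DepositStatus.DEPOSITED},
    DepositStatus.DEPOSITED: {DepositStatus.VERIFIED, DepositStatus.REJECTED},
    DepositStatus.VERIFIED: {DepositStatus.LOADING},
    DepositStatus.LOADING: {DepositStatus.DONE, DepositStatus.FAILED},
    DepositStatus.REJECTED: set(),
    DepositStatus.DONE: set(),
    DepositStatus.FAILED: set(),
}


def can_update(status: DepositStatus) -> bool:
    """A deposit can only be modified while it is still partial."""
    return status is DepositStatus.PARTIAL


def is_final(status: DepositStatus) -> bool:
    """True when no further status change is expected."""
    return not TRANSITIONS[status]
```

A client polling with ``--status`` could stop once ``is_final`` returns true for the reported ``deposit_status``.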
When the deposit has been loaded into the archive, the status will be
marked ``done``. The response will then also contain the
``deposit_swh_id``, ``deposit_swh_id_context``, ``deposit_swh_anchor_id`` and
``deposit_swh_anchor_id_context`` keys. For example:
.. code:: json
{
'deposit_id': '11',
'deposit_status': 'done',
'deposit_swh_id': 'swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9',
'deposit_swh_id_context': 'swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;origin=https://forge.softwareheritage.org/source/jesuisgpl/',
'deposit_swh_anchor_id': 'swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb',
'deposit_swh_anchor_id_context': 'swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb;origin=https://forge.softwareheritage.org/source/jesuisgpl/',
'deposit_status_detail': 'The deposit has been successfully \
loaded into the Software Heritage archive'
}
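The identifiers returned in a ``done`` response follow the ``swh:<version>:<type>:<hash>`` pattern, optionally followed by ``;origin=<url>`` in the ``*_context`` variants. The following sketch splits such a value into its parts; ``SwhId`` and ``parse_swh_id`` are hypothetical names, and the parser only handles the shapes shown in the example above.

```python
from typing import NamedTuple, Optional


class SwhId(NamedTuple):
    version: int
    object_type: str
    object_id: str
    origin: Optional[str]


def parse_swh_id(value: str) -> SwhId:
    """Split 'swh:1:dir:<hash>[;origin=<url>]' into its components."""
    # Separate the context first: the origin URL itself contains ':'.
    core, _, context = value.partition(";")
    parts = core.split(":")
    if len(parts) != 4 or parts[0] != "swh":
        raise ValueError(f"not a swh identifier: {value!r}")
    _, version, object_type, object_id = parts
    origin = None
    if context:
        key, _, url = context.partition("=")
        if key == "origin":
            origin = url
    return SwhId(int(version), object_type, object_id, origin)
```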
diff --git a/docs/index.rst b/docs/index.rst
index 23e304b5..e8ffe3ef 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -1,21 +1,22 @@
.. _swh-deposit:
Software Heritage Deposit
=========================
.. toctree::
:maxdepth: 1
:caption: Contents:
getting-started.rst
spec-api.rst
metadata.rst
dev-info.rst
sys-info.rst
+ specs/specs.rst
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
diff --git a/docs/blueprint.rst b/docs/specs/blueprint.rst
similarity index 84%
rename from docs/blueprint.rst
rename to docs/specs/blueprint.rst
index 1fa91cd9..e0b93e8f 100644
--- a/docs/blueprint.rst
+++ b/docs/specs/blueprint.rst
@@ -1,114 +1,114 @@
Use cases
---------
Deposit creation
~~~~~~~~~~~~~~~~
From client's deposit repository server to SWH's repository server:
1. The client requests the server's abilities and its associated collection
- (GET query to the *SD/service document uri*)
+ (GET query to the *SD/service document uri*)
2. The server answers the client with the service document which gives the
- *collection uri* (also known as *COL/collection IRI*).
+ *collection uri* (also known as *COL/collection IRI*).
3. The client sends a deposit (optionally a zip archive, some metadata or both)
- through the *collection uri*.
+ through the *collection uri*.
This can be done in:
* one POST request (metadata + archive).
* one POST request (metadata or archive) + other PUT or POST request to the
*update uris* (*edit-media iri* or *edit iri*)
- 1. Server validates the client's input or returns detailed error if any
+ a. Server validates the client's input or returns detailed error if any
- 2. Server stores information received (metadata or software archive source
+ b. Server stores information received (metadata or software archive source
code or both)
4. The server notifies the client it acknowledged the client's request. An
- ``http 201 Created`` response with a deposit receipt in the body response is
- sent back. That deposit receipt will hold the necessary information to
- eventually complete the deposit later on if it was incomplete (also known as
- status ``partial``).
+ ``http 201 Created`` response with a deposit receipt in the body response is
+ sent back. That deposit receipt will hold the necessary information to
+ eventually complete the deposit later on if it was incomplete (also known as
+ status ``partial``).
Schema representation
^^^^^^^^^^^^^^^^^^^^^
.. raw:: html
<!-- {F2884278} -->
.. figure:: /images/deposit-create-chart.png
:alt:
Updating an existing deposit
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5. Client updates existing deposit through the *update uris* (one or more POST
or PUT requests to either the *edit-media iri* or *edit iri*).
1. Server validates the client's input or returns detailed error if any
2. Server stores information received (metadata or software archive source
code or both)
This would be the case for example if the client initially posted a
``partial`` deposit (e.g. only metadata with no archive, or an archive
without metadata, or a split archive because the initial one exceeded
the limit size imposed by swh repository deposit)
Schema representation
^^^^^^^^^^^^^^^^^^^^^
.. raw:: html
<!-- {F2884302} -->
.. figure:: /images/deposit-update-chart.png
:alt:
Deleting deposit (or associated archive, or associated metadata)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
6. Deposit deletion is possible as long as the deposit is still in ``partial``
state.
1. Server validates the client's input or returns detailed error if any
2. Server actually deletes information according to request
Schema representation
^^^^^^^^^^^^^^^^^^^^^
.. raw:: html
<!-- {F2884311} -->
.. figure:: /images/deposit-delete-chart.png
:alt:
Client asks for operation status
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
7. Operation status can be read through a GET query to the *state iri*.
Server: Triggering deposit checks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Once the status ``deposited`` is reached for a deposit, checks for the
associated archive(s) and metadata will be triggered. If those checks
fail, the status is changed to ``rejected`` and nothing more happens
there. Otherwise, the status is changed to ``verified``.
Server: Triggering deposit load
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Once the status ``verified`` is reached for a deposit, loading the
deposit with its associated metadata will be triggered.
The loading will result in a status update, either ``done`` or ``failed``
(depending on the loading's status).
This is described in the `loading document <./spec-loading.html>`__.
diff --git a/docs/specs/metadata_example.xml b/docs/specs/metadata_example.xml
new file mode 100644
index 00000000..59c5ed82
--- /dev/null
+++ b/docs/specs/metadata_example.xml
@@ -0,0 +1,34 @@
+<?xml version="1.0"?>
+ <entry xmlns="http://www.w3.org/2005/Atom"
+ xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0"
+ xmlns:swh="swh.xsd">
+ <author>
+ <name>HAL</name>
+ <email>hal@ccsd.cnrs.fr</email>
+ </author>
+ <client>hal</client>
+ <external_identifier>hal-01243573</external_identifier>
+ <codemeta:name>The assignment problem</codemeta:name>
+ <codemeta:url>https://hal.archives-ouvertes.fr/hal-01243573</codemeta:url>
+ <codemeta:identifier>other identifier, DOI, ARK</codemeta:identifier>
+ <codemeta:applicationCategory>Domain</codemeta:applicationCategory>
+ <codemeta:description>description</codemeta:description>
+ <codemeta:author>
+ <codemeta:name> author1 </codemeta:name>
+ <codemeta:affiliation> Inria </codemeta:affiliation>
+ <codemeta:affiliation> UPMC </codemeta:affiliation>
+ </codemeta:author>
+ <codemeta:author>
+ <codemeta:name> author2 </codemeta:name>
+ <codemeta:affiliation> Inria </codemeta:affiliation>
+ <codemeta:affiliation> UPMC </codemeta:affiliation>
+ </codemeta:author>
+ <swh:deposit>
+ <swh:manifest>
+ <swh:object>
+ <swh:path>./path/to/file.txt</swh:path>
+ <swh:swhid>aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa</swh:swhid>
+ </swh:object>
+ </swh:manifest>
+ </swh:deposit>
+ </entry>
diff --git a/docs/spec-loading.rst b/docs/specs/spec-loading.rst
similarity index 100%
rename from docs/spec-loading.rst
rename to docs/specs/spec-loading.rst
diff --git a/docs/specs/spec-meta-deposit.rst b/docs/specs/spec-meta-deposit.rst
new file mode 100644
index 00000000..2d682449
--- /dev/null
+++ b/docs/specs/spec-meta-deposit.rst
@@ -0,0 +1,31 @@
+The meta-deposit
+================
+
+Goal
+----
+A client wishes to deposit only metadata about an object in the Software
+Heritage archive.
+
+The meta-deposit is a special deposit where no content is
+deposited and the data transferred to Software Heritage is only
+the metadata about an object or several objects in the archive.
+
+The scope of the meta-deposit is larger than the sparse-deposit, because
+with a meta-deposit all types of objects in the archive can be described
+with the deposited metadata:
+
+- origin
+- snapshot
+- revision
+- release
+- directory
+- content
+
+
+Loading procedure
+------------------
+
+In this case, the meta-deposit will be injected as a metadata entry at the
+appropriate level (origin_metadata, revision_metadata, etc.) and won't result
+in the creation of a new object like with the complete deposit and the
+sparse-deposit.
diff --git a/docs/specs/spec-sparse-deposit.rst b/docs/specs/spec-sparse-deposit.rst
new file mode 100644
index 00000000..534957a8
--- /dev/null
+++ b/docs/specs/spec-sparse-deposit.rst
@@ -0,0 +1,109 @@
+The sparse-deposit
+==================
+
+Goal
+----
+A client wishes to transfer a tarball for which part of the content is
+already in the SWH archive.
+
+Requirements
+------------
+To do so, the paths to the missing directories/content must be provided as
+empty paths in the tarball and the list linking each path to the object in the
+archive will be provided as part of the metadata. The list will be referred to
+as the manifest list.
+
++----------------------+-------------------------------------+
+| path | swh-id |
++======================+=====================================+
+| ./path/to/file.txt | swh:1:cnt:aaaaaaaaaaaaaaaaaaaaa... |
++----------------------+-------------------------------------+
+| ./path/to/dir/ | swh:1:dir:aaaaaaaaaaaaaaaaaaaaa... |
++----------------------+-------------------------------------+
+
+Note: the *name* of the file or the directory is given by the path and is not
+part of the identified object.
+
+A concrete example
+------------------
+The manifest list is included in the metadata xml atomEntry under the
+swh namespace:
+
+.. code:: xml
+
+ <?xml version="1.0"?>
+ <entry xmlns="http://www.w3.org/2005/Atom"
+ xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0"
+ xmlns:swh="swh.xsd">
+ <author>
+ <name>HAL</name>
+ <email>hal@ccsd.cnrs.fr</email>
+ </author>
+ <client>hal</client>
+ <external_identifier>hal-01243573</external_identifier>
+ <codemeta:name>The assignment problem</codemeta:name>
+ <codemeta:url>https://hal.archives-ouvertes.fr/hal-01243573</codemeta:url>
+ <codemeta:identifier>other identifier, DOI, ARK</codemeta:identifier>
+ <codemeta:applicationCategory>Domain</codemeta:applicationCategory>
+ <codemeta:description>description</codemeta:description>
+ <codemeta:author>
+ <codemeta:name> author1 </codemeta:name>
+ <codemeta:affiliation> Inria </codemeta:affiliation>
+ <codemeta:affiliation> UPMC </codemeta:affiliation>
+ </codemeta:author>
+ <codemeta:author>
+ <codemeta:name> author2 </codemeta:name>
+ <codemeta:affiliation> Inria </codemeta:affiliation>
+ <codemeta:affiliation> UPMC </codemeta:affiliation>
+ </codemeta:author>
+ <swh:deposit>
+ <swh:manifest>
+ <swh:object>
+ <swh:path>./path/to/file.txt</swh:path>
+ <swh:swhid>swh:1:cnt:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa</swh:swhid>
+ </swh:object>
+ <swh:object>
+ <swh:path>./path/to/second_file.txt</swh:path>
+ <swh:swhid>swh:1:cnt:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb</swh:swhid>
+ </swh:object>
+ <swh:object>
+ <swh:path>./path/to/dir/</swh:path>
+ <swh:swhid>swh:1:dir:ddddddddddddddddddddddddddddddddd</swh:swhid>
+ </swh:object>
+ </swh:manifest>
+ </swh:deposit>
+ </entry>
+
+The tarball sent with the deposit will contain the following empty paths:
+- path/to/file.txt
+- path/to/second_file.txt
+- path/to/dir/
+
+Deposit verification
+--------------------
+
+After checking the integrity of the deposit content and
+metadata, the following checks should be added:
+
+1. validate the manifest list structure with a swh-id for each path
+2. verify that the paths in the manifest list are explicit and empty in the tarball
+3. verify that the path name corresponds to the object type
+4. locate the identifiers in the SWH archive
+
+Each verification that fails should return its own distinct error
+and result in a ``rejected`` deposit.
+
+Loading procedure
+------------------
+The injection procedure should include:
+
+- load the tarball data
+- create new objects using the path name and create links from the path to the
+ SWH object using the identifier
+- calculate identifier of the new objects at each level
+- return final swh-id of the new revision
+
+Invariant: the same content should yield the same swh-id. A complete
+deposit with all the content and a sparse-deposit with the correct links will
+therefore result in the same root directory swh-id and, if the metadata are
+identical, in the same revision swh-id.
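The first and third checks listed under "Deposit verification" (manifest structure, and path name matching the object type) could be sketched as follows. This is an illustration under assumptions: ``check_manifest`` is a hypothetical name, the manifest is taken as a list of ``(path, swh-id)`` pairs, identifiers are assumed to carry a 40-character hexadecimal hash, and a directory path is assumed to end with ``/`` as in the table above. The placeholder identifiers used in the example entry would not pass this check.

```python
import re

# Assumption: only content (cnt) and directory (dir) objects appear in a manifest.
SWHID_RE = re.compile(r"^swh:1:(cnt|dir):[0-9a-f]{40}$")


def check_manifest(entries):
    """Return a list of error messages for a manifest list.

    entries is an iterable of (path, swhid) pairs; an empty result
    means the two checks passed.
    """
    errors = []
    for path, swhid in entries:
        match = SWHID_RE.match(swhid)
        if match is None:
            errors.append(f"{path}: malformed identifier {swhid!r}")
            continue
        object_type = match.group(1)
        # A path ending with '/' must point to a directory object.
        if path.endswith("/") != (object_type == "dir"):
            errors.append(f"{path}: path does not match object type {object_type!r}")
    return errors
```

Checking that the paths are empty in the tarball and that the identifiers exist in the archive requires access to the tarball and to the archive, and is left out of this sketch.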
diff --git a/docs/specs/specs.rst b/docs/specs/specs.rst
new file mode 100644
index 00000000..608183c4
--- /dev/null
+++ b/docs/specs/specs.rst
@@ -0,0 +1,13 @@
+.. _swh-deposit-specs:
+
+Software Heritage Deposit Specifications
+========================================
+
+.. toctree::
+ :maxdepth: 1
+ :caption: Contents:
+
+ blueprint.rst
+ spec-loading.rst
+ spec-sparse-deposit.rst
+ spec-meta-deposit.rst