diff --git a/docs/api/index.rst b/docs/api/index.rst
index 2a715043..7d8220b3 100644
--- a/docs/api/index.rst
+++ b/docs/api/index.rst
@@ -1,13 +1,14 @@
.. _swh-deposit-api:
Deposit API
===========
.. toctree::
:maxdepth: 2
:caption: Contents:
user-manual
api-documentation
metadata
use-cases
+ register-account
diff --git a/docs/api/register-account.rst b/docs/api/register-account.rst
new file mode 100644
index 00000000..5d7f2258
--- /dev/null
+++ b/docs/api/register-account.rst
@@ -0,0 +1,45 @@
+.. _swh-deposit-register-account:
+
+.. admonition:: Intended audience
+ :class: important
+
+ - deposit clients
+ - sysadm staff members
+
+Register account
+================
+
+.. _swh-deposit-register-account-as-deposit-client:
+
+As a deposit client
+-------------------
+
+As a deposit client, you first need to register an account on the swh keycloak `production
+`_
+or `staging
+`_
+instance.
+
+.. _swh-deposit-register-account-as-sysadm:
+
+As a sysadm
+-----------
+
+
+1. Retrieve the deposit client login (through email exchange or any other medium).
+
+2. Request a :ref:`provider url ` from the deposit
+ client (through email exchange or any other medium).
+
+3. Within the keycloak `production instance `_ or `staging
+ instance `_, add the `swh.deposit.api` role to the deposit
+ client login.
+
+4. Create an :ref:`associated deposit collection
+ ` in the deposit instance.
+
+5. Create :ref:`a deposit client ` with the
+ provider url in the deposit instance.
+
+6. To check that everything works, ask the deposit client to verify that they can access
+ at least the service document IRI (with authentication).
diff --git a/docs/api/user-manual.rst b/docs/api/user-manual.rst
index 88e968cb..991a2f1c 100644
--- a/docs/api/user-manual.rst
+++ b/docs/api/user-manual.rst
@@ -1,487 +1,487 @@
.. _deposit-user-manual:
User Manual
===========
This is a guide for how to prepare and push a software deposit with
the ``swh deposit`` commands.
Requirements
------------
-You need to have an account on the Software Heritage deposit application to be
-able to use the service.
+You need to :ref:`have an account on the Software Heritage deposit application
+` to be able to use the service.
Please `contact the Software Heritage team `_ for
more information on how to get access to this service.
For testing purposes, a test instance `is available
`_ [#f1]_ and will be used in the examples below.
Once you have an account, you should get a set of access credentials as a
``login`` and a ``password`` (identified as ```` and ```` in the
remainder of this document). A deposit account also comes with a "provider URL"
which is used by SWH to build the :term:`Origin URL` of deposits
created using this account.
Installation
------------
To install the ``swh.deposit`` command line tools, you need a working Python 3.7+
environment. It is strongly recommended you use a `virtualenv
`_ for this.
.. code:: console
$ python3 -m virtualenv deposit
[...]
$ source deposit/bin/activate
(deposit)$ pip install swh.deposit
[...]
(deposit)$ swh deposit --help
Usage: swh deposit [OPTIONS] COMMAND [ARGS]...
Deposit main command
Options:
-h, --help Show this message and exit.
Commands:
admin Server administration tasks (manipulate user or...
status Deposit's status
upload Software Heritage Public Deposit Client Create/Update...
(deposit)$
Note: in the examples below, we use the `jq`_ tool to make json outputs nicer.
If you do not have it already, you may install it using your distribution's
packaging system. For example, on a Debian system:
.. _jq: https://stedolan.github.io/jq/
.. code:: console
$ sudo apt install jq
.. _prepare-deposit:
Prepare a deposit
-----------------
* compress the files in a supported archive format:
- zip: common zip archive (no multi-disk zip files).
- tar: tar archive, either uncompressed or compressed with one of the
following algorithms: gzip (``.tar.gz``, ``.tgz``), bzip2
(``.tar.bz2``), or lzma (``.tar.lzma``)
* (Optional) prepare a metadata file (more details :ref:`deposit-metadata`):
Example:
Assuming you want to deposit the source code of `belenios
`_ version 1.12
.. code:: console
(deposit)$ wget https://gitlab.inria.fr/belenios/belenios/-/archive/1.12/belenios-1.12.zip
[...]
2020-10-28 11:40:37 (4,56 MB/s) - ‘belenios-1.12.zip’ saved [449880/449880]
(deposit)$
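If the source code lives in a local directory rather than in a ready-made archive, the archive can also be produced from Python. The following is a minimal sketch using the standard library's ``shutil.make_archive``; the function name ``make_deposit_archive`` and its arguments are illustrative only and are not part of ``swh.deposit``:

```python
import shutil

# Archive formats from the list above that shutil can produce directly.
SUPPORTED_FORMATS = ("zip", "tar", "gztar", "bztar")


def make_deposit_archive(source_dir: str, base_name: str, fmt: str = "zip") -> str:
    """Pack the content of `source_dir` and return the path of the archive.

    `base_name` is the archive path without its extension; shutil appends
    the extension matching `fmt` (for example ".zip" or ".tar.gz").
    """
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported archive format: {fmt}")
    return shutil.make_archive(base_name, fmt, root_dir=source_dir)
```

For example, ``make_deposit_archive("belenios-1.12", "belenios-1.12")`` would write ``belenios-1.12.zip`` in the current directory, assuming a ``belenios-1.12`` directory exists there.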
Then you need to prepare a metadata file allowing you to give detailed
information on your deposited source code. A rather minimal Atom file using
CodeMeta terms could be:
.. code:: console
(deposit)$ cat metadata.xml
Verifiable online voting system
belenios-01243065
https://gitlab.inria.fr/belenios/belenios
test
Online voting
Verifiable online voting system
1.12
opam
stable
ocaml
GNU Affero General Public License
Belenios
belenios@example.com
Belenios Test User
(deposit)$
Please read the :ref:`deposit-metadata` page for a more detailed view on the
metadata file formats and semantics; and :ref:`deposit-create_origin` for
a description of the ```` tag.
Push a deposit
--------------
You can push a deposit with:
* a single deposit (archive + metadata):
The user posts in one query a software
source code archive and associated metadata.
The deposit is directly marked with status ``deposited``.
* a multistep deposit:
1. Create an incomplete deposit (marked with status ``partial``)
2. Add data to a deposit (in multiple requests if needed)
3. Finalize deposit (the status becomes ``deposited``)
* a metadata-only deposit:
The user posts in one query an associated metadata file on a :ref:`SWHID
` object. The deposit is directly marked with status
``done``.
Overall, a deposit goes through a series of states, as follows:
.. figure:: ../images/status.svg
:alt:
The important thing to note for now is that a deposit can be in one of the following states:
partial:
the deposit is partially received
expired:
deposit has been there too long and is now deemed
ready to be garbage collected
deposited:
deposit is complete and is ready to be checked to ensure data consistency
verified:
deposit is fully received, checked, and ready for loading
loading:
loading is ongoing on swh's side
done:
loading is successful
failed:
loading is a failure
When you push a deposit, it is either in the ``deposited`` state or in the
``partial`` state if you asked for a partial upload.
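When scripting against the deposit, it can help to name these states explicitly. The sketch below only mirrors the list above; the names ``DepositStatus``, ``FINAL_STATES`` and ``is_final`` are hypothetical helpers, and treating ``expired`` as final is an assumption made for the example:

```python
from enum import Enum


class DepositStatus(Enum):
    """Deposit states, with values equal to the status strings listed above."""

    PARTIAL = "partial"
    EXPIRED = "expired"
    DEPOSITED = "deposited"
    VERIFIED = "verified"
    LOADING = "loading"
    DONE = "done"
    FAILED = "failed"


# Assumption: once a deposit reaches one of these states, its status is not
# expected to change again without an explicit action on the deposit.
FINAL_STATES = {DepositStatus.DONE, DepositStatus.FAILED, DepositStatus.EXPIRED}


def is_final(status: str) -> bool:
    """Return True when a client polling the status can stop waiting."""
    return DepositStatus(status) in FINAL_STATES
```

A client waiting for a deposit to be loaded could, for instance, keep checking the status until ``is_final`` returns ``True``.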
Single deposit
^^^^^^^^^^^^^^
Once the files are ready for deposit, we want to do the actual deposit in one
shot, i.e. sending both the archive (zip) file and the metadata file.
* 1 archive (content-type ``application/zip`` or ``application/x-tar``)
* 1 metadata file in atom xml format (``content-type: application/atom+xml;type=entry``)
For this, we need to provide the:
* arguments: ``--username 'name' --password 'pass'`` as credentials
* archive's path (example: ``--archive path/to/archive-name.tgz``)
* metadata file path (example: ``--metadata path/to/metadata.xml``)
to the ``swh deposit upload`` command.
Example:
To push the Belenios 1.12 archive we prepared previously to the testing instance of the
deposit:
.. code:: console
(deposit)$ ls
belenios-1.12.zip metadata.xml deposit
(deposit)$ swh deposit upload --username --password \
--url https://deposit.staging.swh.network/1 \
--create-origin http://has.archives-ouvertes.fr/test-01243065 \
--archive belenios-1.12.zip \
--metadata metadata.xml \
--format json | jq
{
'deposit_status': 'deposited',
'deposit_id': '1',
'deposit_date': 'Oct. 28, 2020, 1:52 p.m.',
'deposit_status_detail': None
}
(deposit)$
You just posted a deposit to your main collection on Software Heritage (staging
area)!
The returned value is a JSON dict, in which you will notably find the deposit
id (needed to check for its status later on) and the current status, which
should be ``deposited`` if no error has occurred.
Note: As the deposit is in ``deposited`` status, you can no longer
update the deposit after this query. Any such attempt will receive a 403
(Forbidden) response.
If something went wrong, an equivalent response will be given with the
``error`` and ``detail`` keys explaining the issue, e.g.:
.. code:: console
{
'error': 'Unknown collection name xyz',
'detail': None,
'deposit_status': None,
'deposit_status_detail': None,
'deposit_swh_id': None,
'status': 404
}
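A script consuming this output can tell the two kinds of responses apart by looking at the ``error`` key. The following is a minimal sketch, assuming the command output is valid JSON (as obtained with ``--format json``); ``summarize_response`` is a hypothetical helper name:

```python
import json


def summarize_response(raw: str) -> str:
    """Turn the JSON response of a deposit command into a one-line summary."""
    data = json.loads(raw)
    if data.get("error"):
        # Error responses carry the "error", "detail" and "status" keys.
        return f"error {data.get('status')}: {data['error']}"
    # Successful responses carry the deposit id and its current status.
    return f"deposit {data['deposit_id']} is {data['deposit_status']}"
```

With the error response shown above, this would report the 404 status and the "Unknown collection name xyz" message.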
Once the deposit has been done, you can check its status using the ``swh deposit
status`` command:
.. code:: console
(deposit)$ swh deposit status --username --password \
--url https://deposit.staging.swh.network/1 \
--deposit-id 1 -f json | jq
{
"deposit_id": "1",
"deposit_status": "done",
"deposit_status_detail": "The deposit has been successfully loaded into the Software Heritage archive",
"deposit_swh_id": "swh:1:dir:63a6fc0ed8f69bf66ccbf99fc0472e30ef0a895a",
"deposit_swh_id_context": "swh:1:dir:63a6fc0ed8f69bf66ccbf99fc0472e30ef0a895a;origin=https://softwareheritage.org/belenios-01234065;visit=swh:1:snp:0ae536667689da7047bfb7aa9f37f5958e9f4647;anchor=swh:1:rev:17ad98c940104d45b6b6bd6fba9aa832eeb95638;path=/",
"deposit_external_id": "belenios-01234065"
}
Metadata-only deposit
^^^^^^^^^^^^^^^^^^^^^
This allows depositing only metadata about a :ref:`SWHID reference
`. Prepare a metadata file as described in the
:ref:`prepare deposit section `.
Ensure this metadata file also declares a :ref:`SWHID reference
`:
.. code:: xml
For this, we then need to provide the following information:
* arguments: ``--username 'name' --password 'pass'`` as credentials
* metadata file path (example: ``--metadata path/to/metadata.xml``)
to the ``swh deposit metadata-only`` command.
Example:
.. code:: console
(deposit)$ swh deposit metadata-only --username --password \
--url https://deposit.staging.swh.network/1 \
--metadata ../deposit-swh.metadata-only.xml \
--format json | jq .
{
"deposit_id": "29",
"deposit_status": "done",
"deposit_date": "Dec. 15, 2020, 11:37 a.m."
}
For details on the metadata-only deposit, see the
:ref:`metadata-only deposit protocol reference `
Multistep deposit
^^^^^^^^^^^^^^^^^
In this case, the deposit is created by several requests, uploading objects
piece by piece. The steps to create a multistep deposit are:
1. Create a partial deposit
"""""""""""""""""""""""""""
First use the ``--partial`` argument to declare that there is more to come:
.. code:: console
$ swh deposit upload --username name --password secret \
--archive foo.tar.gz \
--partial
2. Add content or metadata to the deposit
"""""""""""""""""""""""""""""""""""""""""
Continue the deposit by using the ``--deposit-id`` argument given as a response
for the first step. You can continue adding content or metadata while you use
the ``--partial`` argument.
To only add one new archive to the deposit:
.. code:: console
$ swh deposit upload --username name --password secret \
--archive add-foo.tar.gz \
--deposit-id 42 \
--partial
To only add metadata to the deposit:
.. code:: console
$ swh deposit upload --username name --password secret \
--metadata add-foo.tar.gz.metadata.xml \
--deposit-id 42 \
--partial
3. Finalize deposit
"""""""""""""""""""
On your last addition, use the same command as before but omit
``--partial``: the deposit is then considered complete and its status
changes to ``deposited``:
.. code:: console
$ swh deposit upload --username name --password secret \
--metadata add-foo.tar.gz.metadata.xml \
--deposit-id 42
Update deposit
--------------
* Update deposit metadata:
- only possible if the deposit status is ``done``, ``--deposit-id `` and
``--swhid `` are provided
- by using the ``--metadata`` flag, a path to an xml file
.. code:: console
$ swh deposit upload \
--username name --password secret \
--deposit-id 11 \
--swhid swh:1:dir:2ddb1f0122c57c8479c28ba2fc973d18508e6420 \
--metadata ../deposit-swh.update-metadata.xml
* Replace deposit:
- only possible if the deposit status is ``partial`` and
``--deposit-id `` is provided
- by using the ``--replace`` flag
- ``--metadata-deposit`` replaces associated existing metadata
- ``--archive-deposit`` replaces associated archive(s)
- by default, with no flag or both, you'll replace associated
metadata and archive(s):
.. code:: console
$ swh deposit upload --username name --password secret \
--deposit-id 11 \
--archive updated-je-suis-gpl.tgz \
--replace
* Update a loaded deposit with a new version (this creates a new deposit):
- by using ``--add-to-origin`` with an origin URL previously created with
``--create-origin``, you will link the new deposit with its parent deposit:
.. code:: console
$ swh deposit upload --username name --password secret \
--archive je-suis-gpl-v2.tgz \
--add-to-origin 'http://example.org/je-suis-gpl'
Check the deposit's status
--------------------------
You can check the status of the deposit by using the ``--deposit-id`` argument:
.. code:: console
$ swh deposit status --username name --password secret \
--deposit-id 11
.. code:: json
{
"deposit_id": 11,
"deposit_status": "deposited",
"deposit_swh_id": null,
"deposit_status_detail": "Deposit is ready for additional checks \
(tarball ok, metadata, etc...)"
}
When the deposit has been loaded into the archive, the status will be
marked ``done``. The response then also includes the ``deposit_swh_id`` and
``deposit_swh_id_context`` fields. For example:
.. code:: json
{
"deposit_id": 11,
"deposit_status": "done",
"deposit_swh_id": "swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9",
"deposit_swh_id_context": "swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;\
origin=https://forge.softwareheritage.org/source/jesuisgpl/;\
visit=swh:1:snp:68c0d26104d47e278dd6be07ed61fafb561d0d20;\
anchor=swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb;path=/",
"deposit_status_detail": "The deposit has been successfully \
loaded into the Software Heritage archive"
}
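The ``deposit_swh_id_context`` value is the SWHID followed by ``;``-separated ``key=value`` qualifiers. A minimal sketch for splitting it apart, assuming that qualifier values never contain a raw ``;`` (``parse_swhid_context`` is a hypothetical helper, not a ``swh.deposit`` function):

```python
from typing import Dict, Tuple


def parse_swhid_context(context: str) -> Tuple[str, Dict[str, str]]:
    """Split a qualified SWHID into its core identifier and its qualifiers."""
    core, *qualifiers = context.split(";")
    parsed = {}
    for qualifier in qualifiers:
        # Split on the first "=" only, since values such as URLs may contain "=".
        key, _, value = qualifier.partition("=")
        parsed[key] = value
    return core, parsed
```

Applied to the response above, the first returned element would be the ``swh:1:dir:...`` identifier, and the dictionary would hold the ``origin``, ``visit``, ``anchor`` and ``path`` qualifiers.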
.. rubric:: Footnotes
.. [#f1] the test instance of the deposit is not yet available to external users,
but it should be available soon.
diff --git a/docs/internals/prod-environment.rst b/docs/internals/prod-environment.rst
index 8f4010d7..e7953370 100644
--- a/docs/internals/prod-environment.rst
+++ b/docs/internals/prod-environment.rst
@@ -1,115 +1,117 @@
.. _swh-deposit-prod-env:
Production deployment
=====================
The deposit is built around 3 parts:
- server: a django application exposing an xml api, backed by a postgresql
database (and optionally a keycloak instance)
- worker(s): one worker service dedicated to checking that the deposit archive and metadata are
correct (the checker), and another worker service dedicated to actually ingesting the
deposit into the swh archive.
- client: the ``swh deposit`` command line interface, a python script.
All those are packaged in 3 separate debian packages, created and uploaded to the swh
debian repository. The deposit server and workers configuration are managed by puppet
(cf. puppet-environment/swh-site, puppet-environment/swh-role,
puppet-environment/swh-profile)
In the following document, we will focus on the server actions that may be needed once
the server is installed or upgraded.
Prepare the database setup (existence, connection, etc...).
-----------------------------------------------------------
This is defined through the packaged module ``swh.deposit.settings.production`` and the
expected **/etc/softwareheritage/deposit/server.yml** configuration file.
Environment (production/staging)
--------------------------------
``SWH_CONFIG_FILENAME`` must be defined and target the deposit server configuration file.
So either 1. prefix the following commands or 2. export the environment variable in your
shell session. For the remaining part of the documentation, we assume 2. has been
configured.
.. code:: shell
export SWH_CONFIG_FILENAME=/etc/softwareheritage/deposit/server.yml
Migrate the db schema
---------------------
The debian package may integrate some new schema modifications. To run them:
.. code:: shell
sudo django-admin migrate --settings=swh.deposit.settings.production
+.. _swh-deposit-add-client-and-collection:
+
Add client and collection
-------------------------
The deposit can be configured to use either the 1. django basic authentication framework
or the 2. swh keycloak instance. If the server uses 2., the password is managed by
keycloak so the option ``--password`` is ignored.
* basic
.. code:: shell
swh deposit admin \
--config-file $SWH_CONFIG_FILENAME \
--platform production \
user create \
--collection \
--username \
--password
This adds a user ```` which can access the collection
````. The password will be used for checking the authentication access
to the deposit api (if 1. is used).
Note:
- If the collection does not exist, it is created along with the user
- The password, if required, is passed as plain text but stored encrypted
Reschedule a deposit
---------------------
If the loading failed for some reason, you can reschedule the impacted deposit once the
fixed deposit loader has been deployed:
.. code:: shell
swh deposit admin \
--config-file $SWH_CONFIG_FILENAME \
--platform production \
deposit reschedule \
--deposit-id
This will:
- check that the deposit's status is one that can be rescheduled (failed or done). That means that
the checks have passed but something went wrong during the loading (failed: loading
failed; done: loading succeeded but, because of a bug for instance, we need to reschedule it)
- reset the deposit's status to 'verified' (prior to any loading but after the checks,
which passed) and remove the different archive identifiers (swh-id, ...)
- trigger the loading task again through the scheduler
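The first two actions above amount to a guarded state reset. The sketch below illustrates that logic on a plain dictionary; it is not the actual implementation, the field names ``status``, ``swhid`` and ``swhid_context`` are assumptions made for the example, and triggering the scheduler is left out:

```python
# Statuses from which a deposit can be rescheduled, as described above.
RESCHEDULABLE = ("failed", "done")


def reset_for_reschedule(deposit: dict) -> dict:
    """Return a copy of `deposit` ready to be handed back to the loader."""
    if deposit["status"] not in RESCHEDULABLE:
        raise ValueError(
            f"cannot reschedule a deposit in status {deposit['status']!r}"
        )
    # Back to 'verified': checks are kept, archive identifiers are dropped.
    return dict(deposit, status="verified", swhid=None, swhid_context=None)
```

A deposit still in ``deposited`` or ``partial`` status would be refused, since its checks have not passed yet.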
Integration checks
------------------
Icinga checks run periodically on the `staging`_ and `production`_
instances. If any problem arises, expect those to notify the #swh-sysadm irc channel.
.. _staging: https://icinga.softwareheritage.org/search?q=deposit#!/monitoring/service/show?host=pergamon.softwareheritage.org&service=staging%20Check%20deposit%20end-to-end
.. _production: https://icinga.softwareheritage.org/search?q=deposit#!/monitoring/service/show?host=pergamon.softwareheritage.org&service=production%20Check%20deposit%20end-to-end
diff --git a/docs/specs/protocol-reference.rst b/docs/specs/protocol-reference.rst
index f38d0021..f02ea0fa 100644
--- a/docs/specs/protocol-reference.rst
+++ b/docs/specs/protocol-reference.rst
@@ -1,360 +1,362 @@
.. _deposit-protocol:
Protocol reference
==================
The swh-deposit protocol is an extension of the SWORDv2_ protocol, and the
swh-deposit client and server should work with any other SWORDv2-compliant
implementation which provides some :ref:`mandatory attributes `.
However, we define some extensions by the means of extra tags in the Atom
entries, that should be used when interacting with the server to use it optimally.
This means the swh-deposit server should work with a generic SWORDv2 client, but
works much better with these extensions.
All these tags are in the ``https://www.softwareheritage.org/schema/2018/deposit``
XML namespace, denoted using the ``swhdeposit`` prefix in this section.
.. _deposit-create_origin:
Origin creation with the ```` tag
-----------------------------------------------------------
Motivation
^^^^^^^^^^
This is the main extension we define.
This tag is used after a deposit is completed, to load it in the Software Heritage
archive.
The SWH archive references source code repositories by a URI, called the
:term:`origin` URL.
This URI is clearly defined when SWH pulls source code from such a repository;
but not for the push approach used by SWORD, as SWORD clients do not intrinsically
have a URL.
Usage
^^^^^
Instead, clients are expected to provide the origin URL themselves, by adding
a tag in the Atom entry they submit to the server, like this:
.. code:: xml
This will create an origin in the Software Heritage archive, that will point to
the source code artifacts of this deposit.
Semantics of origin URLs
^^^^^^^^^^^^^^^^^^^^^^^^
Origin URLs must be unique to an origin, i.e. to a software project.
The exact definition of a "software project" is left to the clients of the deposit.
They should be designed so that future releases of the same software will have
the same origin URL.
As a guideline, consider that every GitHub/GitLab project is an origin,
and every package in Debian/NPM/PyPI is also an origin.
While origin URLs are not required to resolve to a source code artifact,
we recommend they point to a public resource describing the software project,
including a link to download its source code.
This is not a technical requirement, but it improves discoverability.
+.. _swh-deposit-provider-url-definition:
+
Clients may not submit arbitrary URLs; the server will check that the URLs they submit
belong to a "namespace" they own, known as the ``provider_url`` of the client. For
example, if a client has their ``provider_url`` set to ``https://example.org/foo/`` they
will only be able to submit deposits to origins whose URL starts with
``https://example.org/foo/``.
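In other words, the check is a prefix match on the origin URL. A minimal sketch of that rule, assuming a plain string comparison (``origin_allowed`` is a hypothetical name, not the server's actual function):

```python
def origin_allowed(origin_url: str, provider_url: str) -> bool:
    """Return True if `origin_url` lies in the namespace of `provider_url`."""
    return origin_url.startswith(provider_url)
```

With a ``provider_url`` of ``https://example.org/foo/``, the origin ``https://example.org/foo/my-project`` would be accepted while ``https://example.org/bar/my-project`` would be refused.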
Fallbacks
^^^^^^^^^
If the ```` tag is not provided (either because the client is a generic
SWORDv2 implementation or an old implementation of a swh-deposit client), the server
falls back to creating one based on the ``provider_url`` and the ``Slug`` header
(as defined in the AtomPub_ specification) by concatenating them.
If the ``Slug`` header is missing, the server generates one randomly.
This fallback is provided for compliance with SWORDv2_ clients, but we do not
recommend relying on it, as it usually creates origin URLs that are not meaningful.
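The fallback described above can be sketched as follows. This is only an illustration: the way the server actually generates a random slug is not specified here, so the use of ``uuid.uuid4`` is an assumption, as is the function name ``fallback_origin_url``:

```python
import uuid
from typing import Optional


def fallback_origin_url(provider_url: str, slug: Optional[str] = None) -> str:
    """Build an origin URL from the provider URL and the ``Slug`` header."""
    if slug is None:
        # Assumption: any random identifier stands in for the missing Slug.
        slug = uuid.uuid4().hex
    # The origin URL is the plain concatenation of both parts.
    return provider_url + slug
```

With a random slug, two deposits of the same software end up under unrelated origin URLs, which is why relying on the fallback is discouraged.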
.. _deposit-add_to_origin:
Adding releases to an origin, with the ```` tag
-------------------------------------------------------------------------
When depositing a source code artifact for an origin (i.e. software project) that
was already deposited before, clients should not use ````,
as the origin was already created by the original deposit; and
```` should be used instead.
It is used very similarly to ````:
.. code:: xml
This will create a new :term:`revision` object in the Software Heritage archive,
with the last deposit on this origin as its parent revision,
and reference it from the origin.
If the origin does not exist, the server returns an error.
Metadata
--------
Format
^^^^^^
While the SWORDv2 specification recommends the use of DublinCore_,
we prefer the CodeMeta_ vocabulary, as we already use it in other components
of Software Heritage.
While CodeMeta is designed for use in JSON-LD, it is easy to reuse its vocabulary
and embed it in an XML document, in three steps:
1. use the JSON-LD compact representation of the CodeMeta document
2. replace ``@context`` declarations with XML namespaces
3. unfold JSON lists to sibling XML subtrees
For example, this CodeMeta document:
.. code:: json
{
"@context": "https://doi.org/10.5063/SCHEMA/CODEMETA-2.0",
"name": "My Software",
"author": [
{
"name": "Author 1",
"email": "foo@example.org"
},
{
"name": "Author 2"
}
]
}
becomes this XML document:
.. code:: xml
My Software
Author 1
foo@example.org
Author 2
Or, equivalently:
.. code:: xml
My Software
Author 1
foo@example.org
Author 2
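The three steps above can also be automated. The following Python sketch uses the standard library's ``xml.etree.ElementTree``; it assumes the input is already a compact CodeMeta document with a single ``@context`` URL, and that the terms are embedded in an Atom ``entry`` element. The function names and the ``codemeta`` prefix are choices made for this example:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"  # XML namespace of Atom entries


def _append(parent, ns, key, value):
    """Append `value` under `parent` as element(s) named `key` in namespace `ns`."""
    if isinstance(value, list):
        # Step 3: a JSON list is unfolded into sibling elements with the same tag.
        for item in value:
            _append(parent, ns, key, item)
    elif isinstance(value, dict):
        child = ET.SubElement(parent, f"{{{ns}}}{key}")
        for sub_key, sub_value in value.items():
            _append(child, ns, sub_key, sub_value)
    else:
        ET.SubElement(parent, f"{{{ns}}}{key}").text = str(value)


def codemeta_to_atom(doc: dict) -> ET.Element:
    """Embed a compact CodeMeta document (step 1) in an Atom entry element."""
    # Step 2: the JSON-LD @context becomes the XML namespace of the terms.
    ns = doc["@context"]
    ET.register_namespace("codemeta", ns)
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    for key, value in doc.items():
        if key != "@context":
            _append(entry, ns, key, value)
    return entry
```

Calling ``ET.tostring(codemeta_to_atom(doc), encoding="unicode")`` on the JSON document above serializes the result, with one ``author`` element per entry of the JSON list.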
.. _mandatory-attributes:
Mandatory attributes
^^^^^^^^^^^^^^^^^^^^
All deposits must include:
* an ```` tag with an ```` and ````, and
* either ```` or ````
We also highly recommend their CodeMeta equivalent, and any other relevant
metadata, but this is not enforced.
.. _metadata-only-deposit:
Metadata-only deposit
---------------------
The swh-deposit server can also be used without a source code artifact, only
to provide metadata that describes an arbitrary origin or object in
Software Heritage; this is known as extrinsic metadata.
Unlike regular deposits, there are no restrictions on URL prefixes,
so any client can provide metadata on any origin; and no restrictions on which
objects can be described.
This is done by simply omitting the binary file deposit request of
a regular SWORDv2 deposit, and including information on which object the metadata
describes, by adding a ```` tag in the Atom document.
To describe an origin:
.. code:: xml
And to describe an object:
.. code:: xml
For details on the semantics, see the
:ref:`metadata deposit specification `
.. _deposit-metadata-provenance:
Metadata provenance
-------------------
To indicate where the metadata is coming from, deposit clients can use a
```` element in ```` whose content is
the object the metadata is coming from,
preferably using the ``http://schema.org/`` namespace.
For example, when the metadata is coming from Wikidata, then the
```` should be the page of a Q-entity, such as
``https://www.wikidata.org/wiki/Q16988498`` (not the Q-entity
``http://www.wikidata.org/entity/Q16988498`` itself, as the Q-entity **is** the
object described in the metadata).
Or when the metadata is coming from a curated repository like HAL, then
```` should be the HAL project.
In particular, Software Heritage expects the ```` object
to have a ``http://schema.org/url`` property, so that it can appropriately link
to the original page.
For example, to deposit metadata on GNU Hello:
.. code:: xml
https://www.wikidata.org/wiki/Q16988498
Here is a more complete example of a metadata-only deposit on version 2.9 of GNU Hello,
to show the interaction with other fields:
.. code:: xml
https://www.wikidata.org/wiki/Q16988498
GNU Hello
http://www.wikidata.org/entity/Q16988498
https://www.gnu.org/software/hello/
http://www.wikidata.org/entity/Q7598
Schema
------
Here is an XML schema to summarize the syntax described in this document:
https://forge.softwareheritage.org/source/swh-deposit/browse/master/swh/deposit/xsd/swh.xsd
.. _SWORDv2: http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html
.. _AtomPub: https://tools.ietf.org/html/rfc5023
.. _DublinCore: https://www.dublincore.org/
.. _CodeMeta: https://codemeta.github.io/