diff --git a/docs/getting-started.rst b/docs/getting-started.rst index 8a1e6658..4d381077 100644 --- a/docs/getting-started.rst +++ b/docs/getting-started.rst @@ -1,342 +1,308 @@ Getting Started =============== -This is a getting started to demonstrate the deposit api use case with a -shell client. +This is a short guide on how to prepare and push a software deposit with +the swh-deposit commands. The api is rooted at https://deposit.softwareheritage.org. For more details, see the `main documentation <./index.html>`__. Requirements ------------ You need to be referenced on SWH's client list to have: * a credential (needed for the basic authentication step) -* an associated collection +- in this document we reference ```` as the client's name and ```` + as its associated authentication password. +* an associated collection (by default the client's name is the collection +name) + `Contact us for more information. `__ -Demonstration -------------- - -For the rest of the document, we will: - -* reference ```` as the client and ```` as its associated - authentication password. -* use curl as example on how to request the api. -* present the main deposit use cases. - -The use cases are: - -* one single deposit step: The user posts in one query (one deposit) a software - source code archive and associated metadata (deposit is finalized with status - ``deposited``). - -This will demonstrate the multipart query. - -* another 3-steps deposit (which can be extended as more than 2 steps): - - 1. Create an incomplete deposit (status ``partial``) - 2. Update a deposit (and finalize it, so the status becomes ``deposited``) - 3. Check the deposit's state - -This will demonstrate the stateful nature of the sword protocol. +Prepare a deposit +----------------- +* compress the files in one of the supported archive formats: + - zip: common zip archive (no multi-disk zip files).
+ - tar: tar archive without compression or optionally any of the + following compression algorithms: gzip (.tar.gz, .tgz), bzip2 + (.tar.bz2), or lzma (.tar.lzma) +* prepare a metadata file with an atom xml entry (more details on +`metadata documentation <./metadata.html>`__): + - specify metadata schema/vocabulary (CodeMeta is recommended) + - specify *MUST* metadata (url, authors, software name and + the external\_identifier) + - add all available information under the compatible metadata term + Example of minimal atom entry file: + + .. code:: xml + + + + Je suis GPL + ext-id + forge.softwareheritage.org/source/jesuisgpl/ + Yes, this is another implementation of + "Hello, world!" when you run it. + + GPL + https://www.gnu.org/licenses/gpl.html + + + Reuben Thomas and Sami Kerola + Maintainers + + -Those use cases share a common part, they must start by requesting the -``service document iri`` (internationalized resource identifier) for -information about the collection's location. -Common part - Start with the service document -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Check authentication with a service document request +---------------------------------------------------- -First, to determine the *collection iri* onto which deposit data, the -client needs to ask the server where is its *collection* located. That -is the role of the *service document iri*. +Start with a simple request to check authentication credentials and the +*collection iri* onto which the deposit will be pushed. For example: .. code:: shell - curl -i --user : https://deposit.softwareheritage.org/1/servicedocument/ + curl -i --user : https://deposit.softwareheritage.org/1/servicedocument/ If everything went well, you should have received a response similar to this: .. 
code:: shell HTTP/1.0 200 OK Server: WSGIServer/0.2 CPython/3.5.3 Content-Type: application/xml 2.0 209715200 The Software Heritage (SWH) Archive Software Collection application/zip application/x-tar Collection Policy Software Heritage Archive Collect, Preserve, Share false http://purl.org/net/sword/package/SimpleZip https://deposit.softwareheritage.org/1// * ``HTTP/1.0 200 OK``: the query is successful and returns a body response * ``Content-Type: application/xml``: The body response is in xml format * body: it is a service document describing that the client ```` has a collection named ````. That collection is available at the *collection iri* ``/1//`` (through POST query). At this level, if something went wrong, this should be authentication related. So the response would have been a 401 Unauthorized access. Something like: .. code:: shell curl -i https://deposit.softwareheritage.org/1// HTTP/1.0 401 Unauthorized Server: WSGIServer/0.2 CPython/3.5.3 Content-Type: application/xml WWW-Authenticate: Basic realm="" X-Frame-Options: SAMEORIGIN Access to this api needs authentication processing failed + +Push deposit +------------ + +* one single deposit (archive + metadata): The user posts in one query (one deposit) a software + source code archive and associated metadata (deposit is finalized with status + ``deposited``). +* multi-part deposit + 1. Create an incomplete deposit (status ``partial``) + 2. Add data to a deposit (and finalize it, so the status becomes ``deposited``) + 3. Finalize deposit (can be done during second step) + Single deposit ~~~~~~~~~~~~~~ -A single deposit translates to a multipart deposit request. 
-This means, in swh's deposit's terms, sending exactly one POST query -with: +Once the files are ready for deposit, we want to do the actual deposit +in one shot, sending exactly one POST query with the prepared archive and +metadata file: * 1 archive (content-type ``application/zip`` or ``application/x-tar``) * 1 atom xml content (``content-type: application/atom+xml;type=entry``) -The supported archive, for now are limited to zip files. Those archives -are expected to contain some form of software source code. The atom -entry content is some xml defining metadata about that software. - -Example of minimal atom entry file: +For this, we need to provide: -.. code:: xml +* the arguments: --username 'name' --password 'pass' as credentials +* the name of the archive (example: 'path/to/archive-name.tgz') +* the metadata file, in the same location as the archive and named after it +with the following pattern: path/to/archive-name.metadata.xml +* optionally, the --slug 'your-id' argument, a reference to a unique identifier + the client uses for the software object. - - - Title - urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a - 2005-10-07T17:17:08Z - Contributor - The abstract - - - The abstract - Access Rights - Alternative Title - Date Available - Bibliographic Citation - Contributor - Description - Has Part - Has Version - Identifier - Is Part Of - Publisher - References - Rights Holder - Source - Title - Type - +You can do this with the following command: -Once the files are ready for deposit, we want to do the actual deposit -in one shot. +.. code:: shell -For this, we need to provide: +minimal deposit + $ swh-deposit --username 'name' --password 'pass' je-suis-gpl.tgz -* the contents and their associated correct content-types -* either the header ``In-Progress`` to false (meaning, it's finished after this - query) or nothing (the server will assume it's not in progress if not - present). 
-* Optionally, the ``Slug`` header, which is a reference to a unique identifier - the client knows about and wants to provide us. +with the client's identifier + $ swh-deposit --username 'name' --password 'pass' je-suis-gpl.tgz --slug '123456' -You can do this with the following command: +deposit to a specific client's collection + $ swh-deposit --username 'name' --password 'pass' je-suis-gpl.tgz --collection 'second-collection' -.. code:: shell - curl -i --user : \ - -F "file=@deposit.zip;type=application/zip;filename=payload" \ - -F "atom=@atom-entry.xml;type=application/atom+xml;charset=UTF-8" \ - -H 'In-Progress: false' \ - -H 'Slug: some-external-id' \ - -XPOST https://deposit.softwareheritage.org/1// -You just posted a deposit to the collection -https://deposit.softwareheritage.org/1//. +You just posted a deposit to your collection on Software Heritage +https://deposit.softwareheritage.org/1//. If everything went well, you should have received a response similar to this: .. code:: shell HTTP/1.0 201 Created Server: WSGIServer/0.2 CPython/3.5.3 Location: /1//10/metadata/ Content-Type: application/xml 9 Sept. 26, 2017, 10:11 a.m. payload deposited http://purl.org/net/sword/package/SimpleZip * ``HTTP/1.0 201 Created``: the deposit is successful * ``Location: /1//10/metadata/``: the EDIT-SE-IRI through which we can update a deposit * body: it is a deposit receipt detailing all endpoints available to manipulate the deposit (update, replace, delete, etc...) It also explains the deposit identifier to be 9 (which is useful for the remaining example). Note: As the deposit is in ``deposited`` status, you cannot actually -update anything after this query. Well, the client can try, but it will -be answered with a 403 forbidden answer. +update anything after this query. Any attempt will be answered with a 403 Forbidden response.
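The archive and metadata file pair expected by a single deposit can be prepared with a few shell commands. This is only a minimal sketch: the ``je-suis-gpl`` directory and its content are hypothetical, the metadata file is created empty as a placeholder for the atom entry, and only the ``.tgz`` naming case shown above is handled.

.. code:: shell

    #!/bin/sh
    set -eu

    # Hypothetical source tree, created here only for illustration.
    workdir=$(mktemp -d)
    mkdir -p "$workdir/je-suis-gpl"
    echo 'echo "Hello, world!"' > "$workdir/je-suis-gpl/hello.sh"

    # Compress the files as a gzip-compressed tar archive (.tgz).
    archive="$workdir/je-suis-gpl.tgz"
    tar -czf "$archive" -C "$workdir" je-suis-gpl

    # The metadata file sits next to the archive and follows the
    # archive-name.metadata.xml pattern.
    metadata="${archive%.tgz}.metadata.xml"
    : > "$metadata"   # placeholder: write the atom xml entry here

    ls "$workdir"

Both files then live in the same directory, which is the layout the naming pattern above relies on.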
-Multi-steps deposit +Multi-part deposit ~~~~~~~~~~~~~~~~~~~ +The steps to create a multi-part deposit: -Create a deposit -^^^^^^^^^^^^^^^^ +Create an incomplete deposit +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +First, use the --partial argument to declare there is more to come: -We will use the collection IRI again as the starting point. +.. code:: shell -We need to explicitely give to the server information about: + $ swh-deposit --username 'name' --password 'secret' --partial \ + foo.tar.gz -* the deposit's completeness (through header ``In-Progress`` to true, as we - want to do in multiple steps now). -* archive's md5 hash (through header ``Content-MD5``) -* upload's type (through the headers ``Content-Disposition`` and - ``Content-Type``) -The following command: +Add content or metadata to the deposit +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Continue the deposit by passing the --deposit-id argument with the identifier +returned in the response to the first step. You can continue adding content or metadata as long as you use +the --partial argument. .. code:: shell - curl -i --user : \ - --data-binary @swh/deposit.tar.gz \ - -H 'In-Progress: true' \ - -H 'Content-MD5: 0faa1ecbf9224b9bf48a7c691b8c2b6f' \ - -H 'Content-Disposition: attachment; filename=[deposit.tar.gz]' \ - -H 'Slug: some-external-id' \ - -H 'Packaging: http://purl.org/net/sword/package/SimpleZIP' \ - -H 'Content-type: application/zip' \ - -XPOST https://deposit.softwareheritage.org/1// + $ swh-deposit --username 'name' --password 'secret' --partial \ + --deposit-id 42 add-foo.tar.gz -The expected answer is the same as the previous sample. -Update deposit's metadata -^^^^^^^^^^^^^^^^^^^^^^^^^ +Finalize deposit +^^^^^^^^^^^^^^^^^ +On your last addition, by not declaring it as --partial, the deposit will be +considered as completed and its status will be changed to ``deposited``. -To update a deposit, we can either add some more archives, some more -metadata or replace existing ones. +.. 
code:: shell +$ swh-deposit --username 'name' --password 'secret' \ + --deposit-id 42 last-foo.tar.gz -As we don't have defined metadata yet (except for the ``slug`` header), -we can add some to the ``EDIT-SE-IRI`` endpoint (/1//10/metadata/). That -information is extracted from the deposit receipt sample. -Using here the same atom-entry.xml file presented in previous chapter. +Update deposit +----------------- +* replace deposit: + - only possible if the deposit status is ``partial`` + - by using the --replace argument +.. code:: shell + $ swh-deposit --username 'name' --password 'secret' --replace \ + --deposit-id 11 updated-je-suis-gpl.tar.gz -For example, here is the command to update deposit metadata: +* update a loaded deposit with a new version + - by using the external-id slug which will link the new deposit + with its parent deposit .. code:: shell - curl -i --user : --data-binary @atom-entry.xml \ - -H 'In-Progress: true' \ - -H 'Slug: some-external-id' \ - -H 'Content-Type: application/atom+xml;type=entry' \ - -XPOST https://deposit.softwareheritage.org/1//10/metadata/ - HTTP/1.0 201 Created - Server: WSGIServer/0.2 CPython/3.5.3 - Location: /1//10/metadata/ - Content-Type: application/xml - - - 10 - Sept. 26, 2017, 10:32 a.m. - None - partial + $ swh-deposit --username 'name' --password 'pass' je-suis-gpl-v2.tgz --slug '123456' - - - - - - - - - http://purl.org/net/sword/package/SimpleZip - Check the deposit's state ^^^^^^^^^^^^^^^^^^^^^^^^^ -You need to check the STATE-IRI endpoint (/1//10/status/). +You can check the status of the deposit with this request: .. code:: shell - curl -i --user : https://deposit.softwareheritage.org/1//10/status/ - HTTP/1.0 200 OK - Date: Wed, 27 Sep 2017 08:25:53 GMT - Content-Type: application/xml +$ swh-deposit --login 'name' --pass 'secret' --deposit-id '11' --status Response: .. 
code:: xml 9 deposited deposit is fully received and ready for loading +The different statuses: +- *partial*: multipart deposit is still ongoing +- *deposited*: deposit completed +- *rejected*: deposit failed the checks +- *verified*: content and metadata verified +- *loading*: loading in progress +- *done*: loading completed successfully +- *failed*: the deposit loading has failed
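
A script that polls the deposit status may need to decide when to stop asking. The following is a minimal sketch under an assumption not stated in this guide: that ``done``, ``failed`` and ``rejected`` are final, while the other statuses can still change. ``is_final_status`` is a hypothetical helper, not part of swh-deposit.

.. code:: shell

    # Return 0 when the status is assumed final, 1 when the deposit can
    # still progress, 2 when the status is not one of the listed values.
    is_final_status() {
        case "$1" in
            done|failed|rejected) return 0 ;;
            partial|deposited|verified|loading) return 1 ;;
            *) echo "unknown status: $1" >&2; return 2 ;;
        esac
    }

    # Usage: keep polling while the status is not final.
    is_final_status "loading" || echo "still in progress"

The status string itself would come from the response to the status request shown above.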