diff --git a/docs/getting-started.md b/docs/getting-started.md
index 319b8c1e..f7d3eab1 100644
--- a/docs/getting-started.md
+++ b/docs/getting-started.md
@@ -1,332 +1,332 @@
# Getting Started

This is a getting-started guide that demonstrates the deposit API use cases with a shell client.

The API is rooted at https://deposit.softwareheritage.org.

-For more details, see the [main README](./README.md).
+For more details, see the [main documentation](./index.html).

## Requirements

You need to be referenced on SWH's client list to have:
- a credential (needed for the basic authentication step).
- an associated collection

[Contact us for more information.](https://www.softwareheritage.org/contact/)

## Demonstration

For the rest of the document, we will:
- reference `<client-name>` as the client and `<pass>` as its associated authentication password.
- use curl as an example of how to query the API.
- present the main deposit use cases.

The use cases are:

- one single deposit step: The user posts in one query (one deposit) a software source code archive and associated metadata (the deposit is finalized with status `ready-for-checks`).

  This will demonstrate the multipart query.

- another 3-step deposit (which can be extended to more steps):

  1. Create an incomplete deposit (status `partial`)
  2. Update a deposit (and finalize it, so the status becomes `ready-for-checks`)
  3. Check the deposit's state

  This will demonstrate the stateful nature of the SWORD protocol.

Those use cases share a common part: they must start by requesting the `service document iri` (internationalized resource identifier) for information about the collection's location.

### Common part - Start with the service document

First, to determine the *collection iri* onto which to deposit data, the client needs to ask the server where its *collection* is located. That is the role of the *service document iri*.
For example:

``` Shell
curl -i --user <client-name>:<pass> https://deposit.softwareheritage.org/1/servicedocument/
```

If everything went well, you should have received a response similar to this:

``` Shell
HTTP/1.0 200 OK
Server: WSGIServer/0.2 CPython/3.5.3
Content-Type: application/xml

2.0 209715200 The Software Heritage (SWH) Archive Software Collection application/zip Collection Policy Software Heritage Archive Collect, Preserve, Share false http://purl.org/net/sword/package/SimpleZip https://deposit.softwareheritage.org/1/<collection-name>/
```

Explaining the response:
- `HTTP/1.0 200 OK`: the query is successful and returns a body response
- `Content-Type: application/xml`: the body response is in XML format
- `body response`: it is a service document describing that the client `<client-name>` has a collection named `<collection-name>`. That collection is available at the *collection iri* `/1/<collection-name>/` (through a POST query).

At this level, if something went wrong, it should be authentication related, and the response would have been a 401 Unauthorized. Something like:

``` Shell
curl -i https://deposit.softwareheritage.org/1/<collection-name>/
HTTP/1.0 401 Unauthorized
Server: WSGIServer/0.2 CPython/3.5.3
Content-Type: application/xml
WWW-Authenticate: Basic realm=""
X-Frame-Options: SAMEORIGIN

Access to this api needs authentication processing failed
```

### Single deposit

A single deposit translates to a multipart deposit request.

This means, in swh's deposit's terms, sending exactly one POST query with:
- 1 archive (`content-type: application/zip`)
- 1 atom xml content (`content-type: application/atom+xml;type=entry`)

Supported archives are, for now, limited to zip files. Those archives are expected to contain some form of software source code. The atom entry content is some xml defining metadata about that software.
Example of minimal atom entry file:

``` XML
Title urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a 2005-10-07T17:17:08Z Contributor The abstract The abstract Access Rights Alternative Title Date Available Bibliographic Citation Contributor Description Has Part Has Version Identifier Is Part Of Publisher References Rights Holder Source Title Type
```

Once the files are ready for deposit, we want to do the actual deposit in one shot.

For this, we need to provide:
- the contents and their associated correct content-types
- either the header `In-Progress` set to false (meaning the deposit is finished after this query) or nothing (the server will assume the deposit is not in progress if the header is absent).
- optionally, the `Slug` header, which is a reference to a unique identifier the client knows about and wants to provide us.

You can do this with the following command:

``` Shell
curl -i --user <client-name>:<pass> \
    -F "file=@deposit.zip;type=application/zip;filename=payload" \
    -F "atom=@atom-entry.xml;type=application/atom+xml;charset=UTF-8" \
    -H 'In-Progress: false' \
    -H 'Slug: some-external-id' \
    -XPOST https://deposit.softwareheritage.org/1/<collection-name>/
```

You just posted a deposit to the collection https://deposit.softwareheritage.org/1/<collection-name>/.

If everything went well, you should have received a response similar to this:

``` Shell
HTTP/1.0 201 Created
Server: WSGIServer/0.2 CPython/3.5.3
Location: /1/<collection-name>/10/metadata/
Content-Type: application/xml

9 Sept. 26, 2017, 10:11 a.m. payload ready-for-checks http://purl.org/net/sword/package/SimpleZip
```

Explaining this response:
- `HTTP/1.0 201 Created`: the deposit is successful
- `Location: /1/<collection-name>/10/metadata/`: the EDIT-SE-IRI through which we can update a deposit
- body response: it is a deposit receipt detailing all endpoints available to manipulate the deposit (update, replace, delete, etc.). It also gives the deposit identifier, 9 here (which is useful for the remaining examples).
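
In a script, the EDIT-SE-IRI can be read from the `Location` response header instead of being copied by hand. The following is a minimal sketch, not part of the deposit tooling: `extract_location` is a hypothetical helper name, the sample response is abbreviated, and `mycollection` is a made-up collection name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the Location header found in
# raw HTTP response headers read from stdin (as printed by `curl -i`).
extract_location() {
    # Header names are case-insensitive and header lines end with CRLF,
    # so compare in lowercase and strip the trailing carriage return.
    awk 'tolower($1) == "location:" { sub(/\r$/, "", $2); print $2; exit }'
}

# Abbreviated sample response; "mycollection" is a made-up collection name.
response=$(printf 'HTTP/1.0 201 Created\r\nServer: WSGIServer/0.2 CPython/3.5.3\r\nLocation: /1/mycollection/10/metadata/\r\nContent-Type: application/xml\r\n')

edit_se_iri=$(printf '%s\n' "$response" | extract_location)
echo "EDIT-SE-IRI: $edit_se_iri"
```

Against a live server, `response` would instead hold the output of the `curl -i` deposit command, and the extracted path would be appended to https://deposit.softwareheritage.org to form the update URL.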

Note: As the deposit is in `ready-for-checks` status, you cannot actually update anything after this query. The client can try, but the server will answer with a 403 Forbidden.

### Multi-steps deposit

-1. Create a deposit
+#### Create a deposit

We will use the collection IRI again as the starting point.

We need to explicitly give the server information about:
- the deposit's completeness (through the header `In-Progress` set to true, as we want to do it in multiple steps now).
- the archive's md5 hash (through the header `Content-MD5`)
- the upload's type (through the headers `Content-Disposition` and `Content-Type`)

This is done with the following command:

``` Shell
curl -i --user <client-name>:<pass> \
    --data-binary @swh/deposit.zip \
    -H 'In-Progress: true' \
    -H 'Content-MD5: 0faa1ecbf9224b9bf48a7c691b8c2b6f' \
    -H 'Content-Disposition: attachment; filename=[deposit.zip]' \
    -H 'Slug: some-external-id' \
    -H 'Packaging: http://purl.org/net/sword/package/SimpleZip' \
    -H 'Content-type: application/zip' \
    -XPOST https://deposit.softwareheritage.org/1/<collection-name>/
```

The expected answer is the same as the previous sample.

-2. Update deposit's metadata
+#### Update deposit's metadata

To update a deposit, we can either add some more archives, some more metadata or replace existing ones.

As we don't have defined metadata yet (except for the `slug` header), we can add some to the `EDIT-SE-IRI` endpoint (/1/<collection-name>/10/metadata/). That information is extracted from the deposit receipt sample.

Here we reuse the atom-entry.xml file presented in the previous section.

For example, here is the command to update deposit metadata:

``` Shell
curl -i --user <client-name>:<pass> --data-binary @atom-entry.xml \
    -H 'In-Progress: true' \
    -H 'Slug: some-external-id' \
    -H 'Content-Type: application/atom+xml;type=entry' \
    -XPOST https://deposit.softwareheritage.org/1/<collection-name>/10/metadata/
HTTP/1.0 201 Created
Server: WSGIServer/0.2 CPython/3.5.3
Location: /1/<collection-name>/10/metadata/
Content-Type: application/xml

10 Sept. 26, 2017, 10:32 a.m.
None partial http://purl.org/net/sword/package/SimpleZip
```

-3. Check the deposit's state
+#### Check the deposit's state

You need to check the STATE-IRI endpoint (/1/<collection-name>/10/status/).

``` Shell
curl -i --user <client-name>:<pass> https://deposit.softwareheritage.org/1/<collection-name>/10/status/
HTTP/1.0 200 OK
Date: Wed, 27 Sep 2017 08:25:53 GMT
Content-Type: application/xml
```

Response:

``` XML
9 ready-for-checks deposit is fully received and ready for loading
```
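
The `Content-MD5` header sent when creating a deposit carries the md5 hash of the archive, so a script has to compute it before posting. The following is a minimal sketch, assuming GNU coreutils' `md5sum` is available; `content_md5` is a hypothetical helper name, and a throwaway file stands in for a real archive.

```shell
#!/bin/sh
# Hypothetical helper: print the hex md5 digest of the file given as $1.
# Assumes GNU coreutils' md5sum (on macOS, `md5 -q` is the usual equivalent).
content_md5() {
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    md5sum "$1" | cut -d' ' -f1
}

# Demonstration on a throwaway file rather than a real archive.
tmpfile=$(mktemp)
printf 'hello' > "$tmpfile"
digest=$(content_md5 "$tmpfile")
rm -f "$tmpfile"

echo "Content-MD5: $digest"
```

For a real deposit, the helper would be called on the zip archive and the result passed to curl as `-H "Content-MD5: $digest"`.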