diff --git a/doc/specs.md b/doc/specs.md
index b84b9c49..4085519e 100644
--- a/doc/specs.md
+++ b/doc/specs.md
@@ -1,383 +1,383 @@
swh-sword (draft)
=================
SWORD (Simple Web-Service Offering Repository Deposit) is an
interoperability standard for digital file deposit.
This protocol will be used to interact between a client (a repository)
and a server (swh repository) to permit the deposit of software
tarballs.
In this document, we will refer to a client (e.g. HAL server) and a
server (SWH's).
To summarize, the SWORD protocol exchange, from repository to
repository scenario is:
- Discussion with the client and the server to establish the server's
abilities (GET). The client can ask the server's abilities through a
GET query to the service document uri. The server answers to the
client describing but not limited to, the sword version supported
(v2), the max upload size it expects, a URI list of supported
endpoints, the collection it can query, etc...
- Client deposits one document version (archive) through the deposit
creation uri (one or more POST request, in effect chunking the
artifact to deposit)
- Client updates existing document version (archive) through the
deposit update uri via (one or more PUT requests, in effect chunking
the artifact to deposit)
- Client deletes a document through the delete uri via a DELETE
request (cf. limitation paragraph about this one)
- Client can list collections' documents (let's not?).
Note:
IRI: Internationalized Resource identifier
# API
API access is over HTTPS.
All API endpoints are rooted at https://archive.softwareheritage.org/deposit/.
# Limitation
In the current state, there will be some voluntary shortcomings in the
implementation, notably:
- no removal
- no mediation (we do not know the other system's users)
- upload limitation of 100Mib
- only tarballs (.zip, .tar.gz) will be accepted
- no authentication or a simple one not dealt with at the application
layer
- SWORD defines a collection notion. As SWH is a software archive, we
will define only one collection or none if possible.
# Service Document
This is a step permitting the client to determine the server's abilities.
The server responds its abilities:
- protocol sword version v2
- accepted mime types: application/zip, application/gzip
- upload max size accepted, beyond that, it's expected the client
chunk the tarball into multiple ones
- the collections the client can act upon (the only collection
'software')
- mediation not supported
## API
GET /1/servicedocument/
Answer:
- 200, Content-Type: application/atomserv+xml: OK, with the body
described below
## Sample
``` shell
GET https://archive.softwareheritage.org/1/servicedocument HTTP/1.1
Host: archive.softwareheritage.org
```
Server answers:
``` xml
2.0
${max_upload_size}
The SWH archive
SWH Collection
application/gzip
application/gzip
Software Heritage Archive
false
http://purl.org/net/sword/package/SimpleZip
```
# Deposit Creation
The client posts (in possibly multiple requests):
- only an archive holding the software source code.
- only an envelop with metadata (to be defined) describing information
on an (already or not yet) uploaded archive
- both
After validation of the header and body request, the server:
- uploads such content in a temporary location (to be defined).
- answers the client an 'http 201 Created'. In the Location header of
the response lies a deposit receipt id permitting the client to
check back the operation status later on.
- Asynchronously, the server will inject the archive uploaded and the
associated metadata (swh-loader-tar). The operation status mentioned
earlier is a reference to that injection operation.
## Mono deposit
This describes the posting of an archive (in possibly multiple
requests).
### client
In one or multiple requests, the client can deposit a binary file,
supplying the following headers:
- Content-Type (text): accepted mimetype
- Content-Length (int):
- Content-MD5 (text): md5 checksum hex encoded of the tarball
- Content-Disposition (text): attachment; filename=[filename] ; the filename
parameter must be text (ascii)
- Packaging (IRI): http://purl.org/net/sword/package/SimpleZip
- In-Progress (bool): true to specify it's not the last request, false
to specify it's a final request and the server can go on with
processing the request's information
Example:
```
POST Col-IRI HTTP/1.1
Host: archive.softwareheritage.org
Content-Type: application/zip
Content-Length: [content length]
Content-MD5: [md5-digest]
Content-Disposition: attachment; filename=[filename]
Packaging: http://purl.org/net/sword/package/METSDSpaceSIP
In-Progress: true|false
[request entity]
```
POST /1/software/
### server
The server receives the request and:
- saves the archives in a temporary location
- executes a md5 checksum on that archive and check it against the
same header information
- adds a deposit entry and retrieves the associated id
The server answers either:
- OK: 201 created with one header 'Location' with the deposit receipt
id
- KO: with the error status code and associated message
(cf. [possible errors paragraph](#possible errors)).
## Multipart deposit
This describes the posting of an archive along with metadata about
that archive (in possibly multiple requests).
Client provides:
- Content-Disposition (text): header of type 'attachment' on the Entry
Part with a name parameter set to 'atom'
- Content-Disposition (text): header of type 'attachment' on the Media
Part with a name parameter set to payload and a filename parameter
[SWORD004] (the filename will be expressed in ASCII).
- Content-MD5 (text): md5 checksum hex encoded of the tarball
- Packaging (text): http://purl.org/net/sword/package/SimpleZip
(packaging format used on the Media Part)
-- MAY provide an In-Progress header with a value of true or false
- on the main HTTP header
+- In-Progress (bool): true|false
- add metadata formats or foreign markup to the atom:entry element (TO
BE DEFINED)
-Example:
+## Example
``` xml
POST deposit HTTP/1.1
Host: archive.softwareheritage.org
Content-Length: [content length]
Content-Type: multipart/related;
boundary="===============1605871705==";
type="application/atom+xml"
In-Progress: false
MIME-Version: 1.0
Media Post
--===============1605871705==
Content-Type: application/atom+xml; charset="utf-8"
Content-Disposition: attachment; name="atom"
MIME-Version: 1.0
Title
hal-or-other-archive-id
2005-10-07T17:17:08Z
Contributor
--===============1605871705==
Content-Type: application/zip
Content-Disposition: attachment; name=payload; filename=[filename]
Packaging: http://purl.org/net/sword/package/SimpleZip
Content-MD5: [md5-digest]
MIME-Version: 1.0
[...binary package data...]
--===============1605871705==--
```
## API
-POST /1/software/
+POST /1/deposit/
Answers:
- OK: 201 created + 'Location' header with the deposit receipt id
- KO: any errors mentioned in the [possible errors paragraph](#possible errors).
-## Sample
-
-TODO
-
# Deposit Update
-The client previously uploaded an archive and wants to add a new
-version (possibly in multiple steps as well). Providing the identifier
-of the previous version deposit received from the status URI, the
-client executes a PUT request on the same URI as the deposit one.
+The client previously uploaded an archive and wants to add either new
+metadata information or a new version for that previous deposit
+(possibly in multiple steps as well). The important thing to note
+here is that for swh, this will result in a new version of the
+previous deposit in any case.
+
+Providing the identifier of the previous version deposit received from
+the status URI, the client executes a PUT request on the same URI as
+the deposit one.
After validation of the body request, the server:
- uploads such content in a temporary location (to be defined).
-- answers the client an 'http 201 Created'. In the Location header of
- the response lies a deposit receipt id permitting the client to
- check back the operation status later on.
+- answers the client an 'http 204 (No content)'. In the Location
+ header of the response lies a deposit receipt id permitting the
+ client to check back the operation status later on.
- Asynchronously, the server will inject the archive uploaded and the
- associated metadata (swh-loader-tar). The operation status mentioned
- earlier is a reference to that injection operation. The fact that
- the version is a new one is up to the tarball injection.
+ associated metadata. The operation status mentioned earlier is a
+ reference to that injection operation. The fact that the version is
+ a new one is up to the tarball injection.
-URL: PUT /1/software/
+URL: PUT /1/deposit/
# Deposit Removal
[#limitation](As explained in the limitation paragraph), removal won't
be implemented. Nothing is removed from the SWH archive.
The server answers a '405 Method not allowed' error.
# Operation Status
Providing a deposit receipt id, the client asks the operation status
of a prior upload.
URL: GET /1/software/{deposit_receipt}
# Possible errors
## sword:ErrorContent
IRI: http://purl.org/net/sword/error/ErrorContent
The supplied format is not the same as that identified in the
Packaging header and/or that supported by the server Associated HTTP
Status: 415 (Unsupported Media Type) or 406 (Not Acceptable)
## sword:ErrorChecksumMismatch
IRI: http://purl.org/net/sword/error/ErrorChecksumMismatch
Checksum sent does not match the calculated checksum. The server MUST
also return a status code of 412 Precondition Failed
## sword:ErrorBadRequest
IRI: http://purl.org/net/sword/error/ErrorBadRequest
Some parameters sent with the POST/PUT were not understood. The server
MUST also return a status code of 400 Bad Request.
## sword:MediationNotAllowed
IRI: http://purl.org/net/sword/error/MediationNotAllowed
Used where a client has attempted a mediated deposit, but this is not
supported by the server. The server MUST also return a status code of
412 Precondition Failed.
## sword:MethodNotAllowed
IRI: http://purl.org/net/sword/error/MethodNotAllowed
Used when the client has attempted one of the HTTP update verbs (POST,
PUT, DELETE) but the server has decided not to respond to such
requests on the specified resource at that time. The server MUST also
return a status code of 405 Method Not Allowed
## sword:MaxUploadSizeExceeded
IRI: http://purl.org/net/sword/error/MaxUploadSizeExceeded
Used when the client has attempted to supply to the server a file
which exceeds the server's maximum upload size limit
Associated HTTP Status: 413 (Request Entity Too Large)
----------------------------------------------------------------------
# Tarball Injection
Providing we use indeed synthetic revision to represent a version of a
tarball injected through the sword use case, this needs to be improved
so that the synthetic revision is created with a parent revision (the
previous known one for the same 'origin').
Note:
- origin may no longer be the right term (we may need a new 'at the
same level' notion, maybe 'deposit'?)
- As there are no authentication, everyone can push a new version for
the same tarball so we might need to use the synthetic revision's
author (or committer?) date to discriminate which is the last known
version for the same 'origin'.
# Technical
We will need:
- one dedicated db to store state - swh-sword
- one dedicated temporary storage to store archives
- 'deposit' table:
- id (bigint); deposit receipt id
- external id (text):
- date: date of the full deposit is done
- status (enum): received, ongoing, partial, full
# source
- [http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html](SWORD v2 specification).
- [https://arxiv.org/help/submit_sword](arxiv documentation)
- [http://guides.dataverse.org/en/4.3/api/sword.html]()