diff --git a/README b/README
--- a/README
+++ b/README
@@ -1,5 +1,615 @@
-swh-deposit
-===========
+swh-deposit (draft)
+===================
-SWH's SWORD Deposit Server
+This is SWH's SWORD Server implementation.
+SWORD (Simple Web-Service Offering Repository Deposit) is an
+interoperability standard for digital file deposit.
+
+This protocol will be used to interact between a client (a repository)
+and a server (swh repository) to permit the deposit of software
+tarballs.
+
+In this document, we will refer to a client (e.g. HAL server) and a
+server (SWH's).
+
+Table of contents
+---------------------
+1. [use cases](#uc)
+2. [api overview](#api)
+3. [limitations](#limitations)
+4. [scenarios](#scenarios)
+5. [errors](#errors)
+6. [tarball injection](#tarball)
+7. [technical](#technical)
+8. [sources](#sources)
+
+# Use cases
+
+## First deposit
+
+From client's deposit repository server to SWH's repository server
+(aka deposit).
+
+-[\[1\]](#1) The client requests the server's abilities.
+ (GET query to the *service document uri*)
+
+-[\[2\]](#2) The server answers the client with the service document
+
+-[\[3\]](#3) The client sends the deposit (an archive -> .zip, .tar.gz)
+through the deposit *creation uri*.
+ (one or more POST requests since the archive and metadata can be sent
+ in multiple requests)
+
+
+-[\[4\]](#4) The server notifies the client it acknowledged the
+client's request. ('http 201 Created' with a deposit receipt id in
+the Location header of the response)
+
+
+## Updating an existing archive
+
+-[\[5\]](#5) Client updates existing archive through the deposit *update uri*
+ (one or more PUT requests, in effect chunking the artifact to deposit)
+
+## Deleting an existing archive
+
+-[\[6\]](#6) Document deletion will not be implemented,
+cf. the limitations paragraph for details
+
+## Client asks for operation status and repository id
+
+-[\[7\]](#7) TODO: Detail this when clear
+
+# API overview
+
+API access is over HTTPS.
+
+The service document is accessible at:
+https://archive.softwareheritage.org/api/1/servicedocument/
+
+API endpoints:
+
+ - without a specific collection, are rooted at
+ https://archive.softwareheritage.org/api/1/deposit/.
+
+ - with a specific and unique collection dubbed 'software', are rooted at
+ https://archive.softwareheritage.org/api/1/software/.
+
+
+TODO: Decide between these two options according to what SWORD allows
+(cf. 'unclear points' chapter below)
+
+# Limitations
+
+The first iteration will deliberately implement only part of the SWORD
+protocol, with the following limitations:
+
+- upload limitation of 200 MiB
+- only archives (.zip, .tar.gz) will be accepted
+- no removal (implementation-wise, this will possibly be a means
+ to hide the origin).
+- no mediation (we do not know the other system's users)
+- basic http authentication enforced at the application layer
+ on a per client basis (authentication:
+ http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html#authenticationmediateddeposit)
+
+## unclear points
+
+- SWORD defines a 'collection' concept. Should we apply the 'collection' concept
+ even though SWH is a software archive having one 'software' collection?
+ - option A:
+ The collection refers to the group of documents which the document sent
+ (aka deposit) is part of. In this process with HAL, HAL is the collection;
+ maybe tomorrow we will do the same with MIT, and MIT could be
+ the collection (the logic of the answer above is a result of this
+ link: https://hal.inria.fr/USPC for the USPC collection)
+
+ **result**: 1 client is equivalent to 1 collection in this case.
+ The client pushes us software in 'their' one collection.
+ The collection name could show up in the uri endpoint.
+
+ - option B:
+ Define none? (is it possible? I don't think it is, due to the service
+ document part listing the collection to act upon...)
+
+ **result**: the deposited software has no other entry point via
+ collection name
+
+
+## Scenarios
+### [1] Client request for Service Document
+
+This is the endpoint that lets the client ask for the server's abilities.
+
+
+#### API endpoint
+
+GET api/1/servicedocument/
+
+Answer:
+- 200, Content-Type: application/atomserv+xml: OK, with the body
+ described below
+
+#### Sample request:
+
+``` shell
+GET https://archive.softwareheritage.org/api/1/servicedocument/ HTTP/1.1
+Host: archive.softwareheritage.org
+```
+
+### [2] Server response with the Service Document
+
+The server returns its abilities with the service document in xml format:
+- protocol sword version v2
+- accepted mime types: application/zip, application/gzip
+- maximum accepted upload size; beyond that, the client is expected to
+ chunk the tarball into multiple ones
+- the collections the client can act upon (swh supports only one software collection)
+- mediation not supported
+
+#### Sample answer:
+``` xml
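+<?xml version="1.0" ?>
+<service xmlns:dcterms="http://purl.org/dc/terms/"
+    xmlns:sword="http://purl.org/net/sword/terms/"
+    xmlns:atom="http://www.w3.org/2005/Atom"
+    xmlns="http://www.w3.org/2007/app">
+
+    <sword:version>2.0</sword:version>
+    <sword:maxUploadSize>${max_upload_size}</sword:maxUploadSize>
+
+    <workspace>
+        <atom:title>The SWH archive</atom:title>
+
+        <collection href="https://archive.softwareheritage.org/api/1/deposit/">
+            <atom:title>SWH Collection</atom:title>
+            <accept>application/gzip</accept>
+            <accept alternate="multipart-related">application/gzip</accept>
+            <dcterms:abstract>Software Heritage Archive Deposit</dcterms:abstract>
+            <sword:mediation>false</sword:mediation>
+            <sword:acceptPackaging>http://purl.org/net/sword/package/SimpleZip</sword:acceptPackaging>
+        </collection>
+    </workspace>
+</service>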
+```
+
+
+## Deposit Creation: client point of view
+
+Process of deposit creation:
+
+-> [3] client request
+
+ - [3.1] server validation
+ - [3.2] server temporary upload
+ - [3.3] server injects deposit into archive*
+
+<- [4] server returns deposit receipt id
+
+
+*[3.3] Asynchronously, the server will inject the archive uploaded and the
+ associated metadata. The operation status mentioned
+ earlier is a reference to that injection operation.
+
+The image below represents only the communication and creation of
+a deposit:
+{F2403754}
+
+### [3] client request
+
+The client can send a deposit in a single request or in multiple requests.
+
+The deposit can contain:
+- an archive holding the software source code,
+- an envelope with metadata describing the deposit,
+- or both (Multipart deposit).
+
+The client can deposit a binary file, supplying the following headers:
+- Content-Type (text): accepted mimetype
+- Content-Length (int): tarball size
+- Content-MD5 (text): md5 checksum hex encoded of the tarball
+- Content-Disposition (text): attachment; filename=[filename] ; the filename
+ parameter must be text (ascii)
+- Packaging (IRI): http://purl.org/net/sword/package/SimpleZip
+- In-Progress (bool): true to specify it's not the last request, false
+ to specify it's a final request and the server can go on with
+ processing the request's information.
+
+If In-Progress is not present, the server MUST assume that it is false.
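+
+As a rough illustration of the headers listed above, here is a minimal
+Python sketch of a client-side helper. The function name
+`binary_deposit_headers`, its parameters and the `SIMPLE_ZIP` constant are
+hypothetical and not part of swh-deposit; only the header names and the
+SimpleZip packaging IRI come from this document.
+
+```python
+import hashlib
+
+# Packaging IRI listed above.
+SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"
+
+
+def binary_deposit_headers(archive, filename, content_type="application/zip",
+                           in_progress=False):
+    """Build the headers of a binary deposit (hypothetical helper)."""
+    # The filename parameter must be ASCII: this raises UnicodeEncodeError
+    # when it is not.
+    filename.encode("ascii")
+    return {
+        "Content-Type": content_type,
+        "Content-Length": str(len(archive)),
+        # md5 checksum of the tarball, hex encoded
+        "Content-MD5": hashlib.md5(archive).hexdigest(),
+        "Content-Disposition": "attachment; filename=%s" % filename,
+        "Packaging": SIMPLE_ZIP,
+        # true: other requests will follow; false: this is the final request
+        "In-Progress": "true" if in_progress else "false",
+    }
+```
+
+Leaving In-Progress out of the request altogether would be equivalent to
+sending `false`, since the server assumes false when the header is absent.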
+
+#### API endpoint
+
+POST /api/1/deposit/
+
+#### One request deposit
+
+The one request deposit is a single request containing both the metadata (body)
+and the archive (attachment).
+
+A Multipart deposit is a request of an archive along with metadata about
+that archive (can be applied in a one request deposit or multiple requests).
+
+Client provides:
+- Content-Disposition (text): header of type 'attachment' on the Entry
+ Part with a name parameter set to 'atom'
+- Content-Disposition (text): header of type 'attachment' on the Media
+ Part with a name parameter set to payload and a filename parameter
+ (the filename will be expressed in ASCII).
+- Content-MD5 (text): md5 checksum hex encoded of the tarball
+- Packaging (text): http://purl.org/net/sword/package/SimpleZip
+ (packaging format used on the Media Part)
+- In-Progress (bool): true|false; true means partial upload and we can expect
+ other requests in the future, false means the deposit is done.
+- add metadata formats or foreign markup to the atom:entry element
+
+
+#### sample request for multipart deposit:
+
+``` xml
+POST /api/1/deposit/ HTTP/1.1
+Host: archive.softwareheritage.org
+Content-Length: [content length]
+Content-Type: multipart/related;
+ boundary="===============1605871705==";
+ type="application/atom+xml"
+In-Progress: false
+MIME-Version: 1.0
+
+Media Post
+--===============1605871705==
+Content-Type: application/atom+xml; charset="utf-8"
+Content-Disposition: attachment; name="atom"
+MIME-Version: 1.0
+
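+<?xml version="1.0"?>
+<entry xmlns="http://www.w3.org/2005/Atom">
+    <title>Title</title>
+    <id>hal-or-other-archive-id</id>
+    <updated>2005-10-07T17:17:08Z</updated>
+    <author><name>Contributor</name></author>
+
+</entry>
+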
+--===============1605871705==
+Content-Type: application/zip
+Content-Disposition: attachment; name=payload; filename=[filename]
+Packaging: http://purl.org/net/sword/package/SimpleZip
+Content-MD5: [md5-digest]
+MIME-Version: 1.0
+
+[...binary package data...]
+--===============1605871705==--
+```
+
+## Deposit Creation - server point of view
+
+The server receives the request and:
+
+### [3.1] Validation of the request headers and body
+
+
+### [3.2] Server uploads the content in a temporary location
+(deposit table in a separate DB).
+- saves the archives in a temporary location
+- computes the md5 checksum of that archive and checks it against the
+ Content-MD5 header
+- adds a deposit entry and retrieves the associated id
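+
+The checksum step above could be sketched as follows in Python. This is
+only an illustration under assumptions: the function name
+`checksum_matches` is hypothetical, and the archive is modelled as an
+iterable of byte chunks rather than a file in the temporary location.
+
+```python
+import hashlib
+
+
+def checksum_matches(chunks, content_md5):
+    """Return True if the md5 of the uploaded chunks equals the header value.
+
+    `chunks` is any iterable of bytes (hypothetical stand-in for the
+    temporary upload); `content_md5` is the hex-encoded Content-MD5 header.
+    """
+    digest = hashlib.md5()
+    for chunk in chunks:
+        digest.update(chunk)
+    # Hex digests are compared case-insensitively.
+    return digest.hexdigest() == content_md5.strip().lower()
+```
+
+A mismatch would map to the sword:ErrorChecksumMismatch error (412)
+described in the possible errors paragraph.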
+
+
+### [4] Server answers the client
+The server answers with an 'http 201 Created', with a deposit receipt id
+in the Location header of the response.
+
+##### The server's possible answers are:
+- OK: '201 created' + one header 'Location' holding the deposit receipt
+ id
+- KO: with the error status code and associated message
+ (cf. [possible errors paragraph](#possible-errors)).
+
+
+### [5] Deposit Update
+
+The client previously uploaded an archive and wants to add either new
+metadata information or a new version for that previous deposit
+(possibly in multiple steps as well). The important thing to note
+here is that for swh, this will result in a new version of the
+previous deposit in any case.
+
+Providing the identifier of the previous deposit (received from
+the status URI), the client executes a PUT request on the same URI as
+the one used for the deposit.
+
+After validation of the body request, the server:
+- uploads such content in a temporary location (to be defined).
+
+- answers the client an 'http 204 (No content)'. In the Location
+ header of the response lies a deposit receipt id permitting the
+ client to check back the operation status later on.
+
+- Asynchronously, the server will inject the archive uploaded and the
+ associated metadata. The operation status mentioned earlier is a
+ reference to that injection operation. The fact that the version is
+ a new one is dealt with at the injection level.
+
+##### URL: PUT /1/deposit/
+
+### [6] Deposit Removal
+
+[As explained in the limitations paragraph](#limitations), removal won't
+be implemented. Nothing is removed from the SWH archive.
+
+The server answers a '405 Method not allowed' error.
+
+### [7] Operation Status
+
+Providing a deposit receipt id, the client asks the operation status
+of a prior upload.
+
+URL: GET /1/collection/{deposit_receipt}
+
+or GET /1/deposit/{deposit_receipt}
+
+Note: this depends on the decision taken about collections.
+
+## Possible errors
+
+### sword:ErrorContent
+
+IRI: http://purl.org/net/sword/error/ErrorContent
+
+The supplied format is not the same as that identified in the
+Packaging header and/or that supported by the server.
+
+Associated HTTP Status: 415 (Unsupported Media Type) or 406 (Not Acceptable)
+
+### sword:ErrorChecksumMismatch
+
+IRI: http://purl.org/net/sword/error/ErrorChecksumMismatch
+
+Checksum sent does not match the calculated checksum. The server MUST
+also return a status code of 412 Precondition Failed
+
+### sword:ErrorBadRequest
+
+IRI: http://purl.org/net/sword/error/ErrorBadRequest
+
+Some parameters sent with the POST/PUT were not understood. The server
+MUST also return a status code of 400 Bad Request.
+
+### sword:MediationNotAllowed
+
+IRI: http://purl.org/net/sword/error/MediationNotAllowed
+
+Used where a client has attempted a mediated deposit, but this is not
+supported by the server. The server MUST also return a status code of
+412 Precondition Failed.
+
+### sword:MethodNotAllowed
+
+IRI: http://purl.org/net/sword/error/MethodNotAllowed
+
+Used when the client has attempted one of the HTTP update verbs (POST,
+PUT, DELETE) but the server has decided not to respond to such
+requests on the specified resource at that time. The server MUST also
+return a status code of 405 Method Not Allowed
+
+### sword:MaxUploadSizeExceeded
+
+IRI: http://purl.org/net/sword/error/MaxUploadSizeExceeded
+
+Used when the client has attempted to supply to the server a file
+which exceeds the server's maximum upload size limit
+
+Associated HTTP Status: 413 (Request Entity Too Large)
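+
+The errors above can be summarised as a lookup table. The following Python
+sketch is illustrative only: the names `SWORD_ERROR_BASE`,
+`SWORD_ERROR_STATUS` and `sword_error` are hypothetical, while the IRIs and
+status codes are the ones listed in this chapter.
+
+```python
+# Common prefix of the error IRIs listed above.
+SWORD_ERROR_BASE = "http://purl.org/net/sword/error/"
+
+# Error name -> associated HTTP status code.
+SWORD_ERROR_STATUS = {
+    "ErrorContent": 415,  # 406 (Not Acceptable) is also possible
+    "ErrorChecksumMismatch": 412,
+    "ErrorBadRequest": 400,
+    "MediationNotAllowed": 412,
+    "MethodNotAllowed": 405,
+    "MaxUploadSizeExceeded": 413,
+}
+
+
+def sword_error(name):
+    """Return the (IRI, HTTP status) pair for a SWORD error name."""
+    return SWORD_ERROR_BASE + name, SWORD_ERROR_STATUS[name]
+```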
+
+---------------
+
+# Tarball Injection
+
+Provided we do use a synthetic revision to represent a version of a
+tarball injected through the SWORD use case, the injection needs to be
+improved so that the synthetic revision is created with a parent revision
+(the previously known one for the same 'origin').
+
+
+### Injection mapping
+| origin                          | https://hal.inria.fr/hal-id             |
+|---------------------------------|-----------------------------------------|
+| origin_visit                    | 1 :reception_date                       |
+| occurrence & occurrence_history | branch: client's version n° (e.g hal)   |
+| revision                        | synthetic_revision (tarball)            |
+| directory                       | upper level of the uncompressed archive |
+
+
+##### Questions raised concerning injection:
+- A deposit has one origin, yet can an origin have multiple deposits?
+
+No, an origin can have multiple requests for the same deposit,
+which should end up in one single deposit (when the client pushes its final
+request saying deposit 'done' through the header In-Progress).
+
+When an update of a deposit is requested,
+the new version is identified with the external_id.
+
+Illustration: first deposit injection
+
+HAL's deposit 01535619 = SWH's deposit **01535619-1**
+
+ + 1 origin with url:https://hal.inria.fr/medihal-01535619
+
+ + 1 synthetic revision
+
+ + 1 directory
+
+HAL's update on deposit 01535619 = SWH's deposit **01535619-2**
+
+(with HAL, updates can only be on the metadata; a new version is required
+if the content changes)
+ + 1 origin with url:https://hal.inria.fr/medihal-01535619
+
+ + new synthetic revision (with new metadata)
+
+ + same directory
+
+HAL's deposit 01535619-v2 = SWH's deposit **01535619-v2-1**
+
+ + same origin
+
+ + new revision
+
+ + new directory
+
+
+
+## Technical detail
+We will need:
+- one dedicated db to store state - swh-deposit
+
+- one dedicated temporary storage to store archives before injection
+
+- one client to test the communication with SWORD protocol
+
+### Deposit reception schema
+
+- **deposit** table:
+ - id (bigint): deposit receipt id
+
+ - external id (text): client's internal identifier (e.g hal's id, etc...).
+
+ - origin id : null before injection
+ - swh_id : swh identifier result once the injection is complete
+
+ - reception_date: first deposit date
+
+ - complete_date: reception date of the last deposit which makes the deposit
+ complete
+
+ - status (enum):
+```
+ 'partial', -- the deposit is new or partially received since it
+ -- can be done in multiple requests
+ 'expired', -- deposit has been there too long and is now deemed
+ -- ready to be garbage collected
+ 'ready', -- deposit is fully received and ready for injection
+ 'scheduled', -- injection is scheduled on swh's side
+ 'success', -- injection successful
+ 'failure' -- injection failure
+```
+- **deposit_request** table:
+ - id (bigint): identifier
+ - deposit_id: deposit concerned by the request
+ - metadata: metadata associated to the request
+
+- **client** table:
+ - id (bigint): identifier
+ - name (text): client's name (e.g HAL)
+ - credentials
+
+
+All metadata (declared metadata) are stored in deposit_request (with the
+request they were sent with).
+When the deposit is complete metadata fields are aggregated and sent
+to injection. During injection the metadata is kept in the
+origin_metadata table (see [metadata injection](#metadata-injection)).
+
+The only updates made to the deposit table concern:
+ - status changing
+ - partial -> {expired/ready},
+ - ready -> scheduled,
+ - scheduled -> {success/failure}
+ - complete_date when the deposit is finalized
+ (when the status is changed to ready)
+ - swh_id being populated once we have the result of the injection
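+
+The status transitions listed above can be checked with a small table.
+This Python sketch is only an illustration; `ALLOWED_TRANSITIONS` and
+`can_transition` are hypothetical names, while the status values are those
+of the deposit table's status enum.
+
+```python
+# Allowed status changes, as listed above.
+ALLOWED_TRANSITIONS = {
+    "partial": {"expired", "ready"},
+    "ready": {"scheduled"},
+    "scheduled": {"success", "failure"},
+}
+
+
+def can_transition(current, new):
+    """Return True if a deposit may move from status `current` to `new`."""
+    # Statuses without an entry (expired, success, failure) are final.
+    return new in ALLOWED_TRANSITIONS.get(current, set())
+```
+
+For instance, `can_transition("partial", "ready")` holds, whereas going
+from 'ready' straight to 'success' is rejected.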
+
+#### SWH Identifier returned?
+
+ swh--
+
+ e.g: swh-hal-47dc6b4636c7f6cba0df83e3d5490bf4334d987e
+
+ We could have a specific dedicated 'client' table to reference client
+ identifier.
+
+### Scheduling injection
+All data and metadata split across multiple requests should be aggregated
+before injection.
+
+TODO: injection modeling
+
+### Metadata injection
+- the metadata received with the deposit should be kept in the origin_metadata
+table before translation as part of the injection process, and an indexing
+process should be scheduled.
+
+origin_metadata table:
+```
+origin bigint PK FK
+date date PK FK
+provenance_type text
+ // (enum: 'publisher', 'lister' needs to be completed)
+raw_metadata jsonb
+ // before translation
+indexer_configuration_id bigint FK
+ // tool used for translation
+translated_metadata jsonb
+ // with codemeta schema and terms
+```
+
+# Nomenclature
+
+SWORD uses IRIs (Internationalized Resource Identifiers). In
+this chapter, we will describe SWH's IRIs.
+
+## SD-IRI - The Service Document IRI
+
+This is the IRI from which the root service document can be
+located.
+
+## Col-IRI - The Collection IRI
+
+Only one collection of software is used in this repository.
+
+Note:
+This is the IRI to which the initial deposit will be made, and
+which is listed in the Service Document.
+To be discussed: check whether we want to implement this or not.
+
+## Cont-IRI - The Content IRI
+
+This is the IRI from which the client will be able to retrieve
+representations of the object as it resides in the SWORD server.
+
+## EM-IRI - The Atom Edit Media IRI
+
+To simplify, this is the same as the Cont-IRI.
+
+## Edit-IRI - The Atom Entry Edit IRI
+
+This is the IRI of the Atom Entry of the object, and therefore also of
+the container within the SWORD server.
+
+## SE-IRI - The SWORD Edit IRI
+
+This is the IRI to which clients may POST additional content to an
+Atom Entry Resource. This MAY be the same as the Edit-IRI, but is
+defined separately as it supports HTTP POST explicitly while the
+Edit-IRI is defined by [AtomPub] as limited to GET, PUT and DELETE
+operations.
+
+## State-IRI - The SWORD Statement IRI
+
+This is one of the IRIs which can be used to retrieve a
+description of the object from the sword server, including the
+structure of the object and its state. This will be used as the
+operation status endpoint.
+
+# sources
+
+- [SWORD v2 specification](http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html)
+- [arxiv documentation](https://arxiv.org/help/submit_sword)
+- [Dataverse example](http://guides.dataverse.org/en/4.3/api/sword.html)
+- [SWORD used on HAL](https://api.archives-ouvertes.fr/docs/sword)
+- [xml examples for CCSD](https://github.com/CCSDForge/HAL/tree/master/Sword)