diff --git a/docs/dev-info.rst b/docs/dev-info.rst index 1d613d0c..9405120f 100644 --- a/docs/dev-info.rst +++ b/docs/dev-info.rst @@ -1,176 +1,178 @@ +.. _swh-deposit-dev: + Hacking on swh-deposit ====================== There are multiple modes to run and test the server locally: * development-like (automatic reloading when code changes) * production-like (no reloading) * integration tests (no side effects) Except for the tests which are mostly side effects free (except for the database access), the other modes will need some configuration files (up to 2) to run properly. Database -------- swh-deposit uses a database to store the state of a deposit. The default db is expected to be called swh-deposit-dev. To simplify the use, the following makefile targets can be used: schema ~~~~~~ .. code:: shell make db-create db-prepare db-migrate data ~~~~ Once the db is created, you need some data to be injected (request types, client, collection, etc...): .. code:: shell make db-load-data db-load-private-data The private data are about having a user (``hal``) with a password (``hal``) who can access a collection (``hal``). Add the following to ``../private-data.yaml``: .. code:: yaml - model: deposit.depositclient fields: user_ptr_id: 1 collections: - 1 - model: auth.User pk: 1 fields: first_name: hal last_name: hal username: hal password: "pbkdf2_sha256$30000$8lxjoGc9PiBm$DO22vPUJCTM17zYogBgBg5zr/97lH4pw10Mqwh85yUM=" - model: deposit.depositclient fields: user_ptr_id: 1 collections: - 1 url: https://hal.inria.fr drop ~~~~ For information, you can drop the db: .. code:: shell make db-drop Development-like environment ---------------------------- Development-like environment needs one configuration file to work properly. Configuration ~~~~~~~~~~~~~ **``{/etc/softwareheritage | ~/.config/swh | ~/.swh}``/deposit/server.yml**: .. 
code:: yaml # dev option for running the server locally host: 127.0.0.1 port: 5006 # production authentication: activated: true white-list: GET: - / # 20 MiB max size max_upload_size: 20971520 Run ~~~ Run the local server, using the default configuration file: .. code:: shell make run-dev Production-like environment --------------------------- The production-like environment needs an additional section in the configuration file to work properly. This is closer to what is actually running in production. Configuration ~~~~~~~~~~~~~ This expects the same file described in the previous chapter, plus an additional private file containing private information that is not in the source code repository. **``{/etc/softwareheritage | ~/.config/swh | ~/.swh}``/deposit/private.yml**: .. code:: yaml private: secret_key: production-local db: name: swh-deposit-dev A production configuration file would look like: .. code:: yaml private: secret_key: production-secret-key db: name: swh-deposit-dev host: db port: 5467 user: user password: user-password Run ~~~ .. code:: shell make run Note: This expects the gunicorn3 package to be installed on the system. Tests ----- To run the tests: .. code:: shell make test As explained, those tests are mostly side-effect free. The db part is dealt with by django. The remaining part, which patches the side-effect behavior, is dealt with in the ``swh/deposit/tests/__init__.py`` module. Sum up ------ Prepare everything for your user to run: .. code:: shell make db-drop db-create db-prepare db-migrate db-load-private-data run-dev diff --git a/docs/endpoints/collection.rst b/docs/endpoints/collection.rst index 0513fe5c..50d3b2fb 100644 --- a/docs/endpoints/collection.rst +++ b/docs/endpoints/collection.rst @@ -1,82 +1,83 @@ .. _API-create-deposit: + Create deposit ^^^^^^^^^^^^^^^ .. http:post:: /1/(str:collection-name)/ Create a deposit in the collection whose name is `collection-name`. 
The client sends a deposit request to a specific collection with: * an archive holding the software source code (binary upload) * an envelope with metadata describing the deposit (atom entry deposit) Also known as: COL-IRI **Example query**: .. code:: shell curl -i -u hal: \ -F "file=@deposit.json;type=application/zip;filename=payload" \ -F "atom=@atom-entry.xml;type=application/atom+xml;charset=UTF-8" \ -H 'In-Progress: false' \ -H 'Slug: some-external-id' \ -XPOST https://deposit.softwareheritage.org/1/hal/ .. code:: http POST /1/hal/ HTTP/1.1 Host: deposit.softwareheritage.org Authorization: Basic xxxxxxxxxxxx= Slug: some-external-id In-Progress: false Content-Length: 123456 Content-Type: multipart/form-data; boundary=----------------------123456798 **Example response**: .. code:: http HTTP/1.1 201 Created Date: Tue, 26 Sep 2017 10:32:35 GMT Server: WSGIServer/0.2 CPython/3.5.3 Vary: Accept, Cookie Allow: GET, POST, PUT, DELETE, HEAD, OPTIONS Location: /1/hal/10/metadata/ X-Frame-Options: SAMEORIGIN Content-Type: application/xml 10 Sept. 26, 2017, 10:32 a.m. None deposited http://purl.org/net/sword/package/SimpleZip :reqheader Authorization: Basic authentication token :reqheader Content-Type: accepted mimetype :reqheader Content-Length: tarball size :reqheader Content-MD5: hex-encoded MD5 checksum of the tarball :reqheader Content-Disposition: attachment; filename=[filename]; the filename parameter must be text (ascii); for the metadata file, set the name parameter to 'atom'. :reqheader In-Progress: `true` if not final; `false` when final request. :statuscode 201: success for deposit on POST :statuscode 401: Unauthorized :statuscode 404: access to an unknown collection :statuscode 415: unsupported media type diff --git a/docs/index.rst b/docs/index.rst index cf9e0071..a5d2bdbc 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,81 +1,95 @@ .. 
_swh-deposit: Software Heritage - Deposit =========================== -Push-based deposit of software source code artifacts and metadata to the +`SWORD v2`_ based deposit of software source code artifacts and metadata to the Software Heritage (SWH) Archive. Description ----------- Most of the software source code artifacts present in the SWH Archive are gathered by means of :term:`loader ` workers run by the SWH project from source code origins identified by :term:`lister ` workers. This is a pull mechanism: it's the responsibility of the SWH project to gather and collect source code artifacts that way. Alternatively, SWH allows its partners to push source code artifacts and metadata directly into the Archive with a push-based mechanism. By using this possibility, different actors holding software artifacts or metadata can preserve their assets without having to pass through an intermediate collaborative development platform that is already harvested by SWH (e.g. GitHub, GitLab, etc.). This mechanism is the `deposit`. The main idea is that the deposit provides authenticated access to an API allowing the user to provide source code artifacts -- with metadata -- to be ingested in the SWH Archive. The result of that is a :ref:`SWHID ` that can be used to uniquely and persistently identify that very piece of source code. This unique identifier can then be used to `reference the source code `_ (e.g. in a `scientific paper `_) and retrieve it using the :ref:`vault ` feature of the SWH Archive platform. 
The differences between a piece of code uploaded using the deposit and one archived by simply asking SWH to archive a repository using the `save code now `_ feature are: - a deposited artifact is provided by one of the SWH partners, which is regarded as a trusted authority, - a deposited artifact requires metadata properties describing the source code artifact, - a deposited artifact has a codemeta_ metadata entry attached to it, - a deposited artifact has the same visibility on the SWH Archive as a collected repository, - a deposited artifact can be searched for by its provided url property on the SWH Archive, - the deposit API uses the `SWORD v2`_ API, thus requires some tooling to send deposits to SWH. These tools are provided with this repository. See the :ref:`user-manual` page for more details on how to use the deposit -client tools to push a deposit in the SWH Archive. +client command line tools to push a deposit in the SWH Archive. + +See the :ref:`swh-api-specifications` reference pages of the SWORDv2 API +implementation in `swh.deposit` if you want to upload deposits using HTTP +requests. + +Read the :ref:`metadata` chapter to get more details on what metadata are +supported when doing a deposit. + +See :ref:`swh-deposit-dev` if you want to hack on the code of the `swh.deposit` +module. + +See :ref:`swh-deposit-deployment` if you want to deploy your own copy of the +`swh.deposit` stack. + .. _codemeta: https://codemeta.github.io/ .. _`SWORD v2`: http://swordapp.org/sword-v2/ .. toctree:: :maxdepth: 2 :caption: Contents: user-manual - spec-api metadata + spec-api dev-info sys-info specs/index tests/tests_HAL.rst Reference Documentation ----------------------- .. toctree:: :maxdepth: 2 /apidoc/swh.deposit diff --git a/docs/spec-api.rst b/docs/spec-api.rst index d45acb95..3789d342 100644 --- a/docs/spec-api.rst +++ b/docs/spec-api.rst @@ -1,113 +1,115 @@ -API Specification +.. 
_swh-api-specifications: + +API Documentation ================= This is `Software Heritage `__'s `SWORD 2.0 `__ Server implementation. **S.W.O.R.D** (**S**\ imple **W**\ eb-Service **O**\ ffering **R**\ epository **D**\ eposit) is an interoperability standard for digital file deposit. This implementation will permit interaction between a client (a repository) and a server (SWH repository) to push deposits of software source code archives with associated metadata. *Note:* * In the following document, we will use ``archive`` and ``software source code archive`` interchangeably. * The supported archive formats are: * zip: common zip archive (no multi-disk zip files). * tar: tar archive without compression or optionally with any of the following compression algorithms: gzip (.tar.gz, .tgz), bzip2 (.tar.bz2), or lzma (.tar.lzma) .. _swh-deposit-collection: Collection ---------- SWORD defines a ``collection`` concept. In SWH's case, this collection refers to a group of deposits. A ``deposit`` is some form of software source code archive(s) associated with metadata. By default the client's collection will have the client's name. Limitations ----------- * upload size limited to 100 MiB * no mediation API overview ------------ API access is over HTTPS. The API is protected through basic authentication. Endpoints --------- The API endpoints are rooted at https://deposit.softwareheritage.org/1/. Data is sent and received as XML (as specified in the SWORD 2.0 specification). .. include:: endpoints/service-document.rst .. include:: endpoints/collection.rst .. include:: endpoints/update-media.rst .. include:: endpoints/update-metadata.rst .. include:: endpoints/status.rst .. 
include:: endpoints/content.rst Possible errors: ---------------- * common errors: * :http:statuscode:`401` if a client does not provide credentials or provides wrong ones * :http:statuscode:`403` if a client tries to access a collection it does not own * :http:statuscode:`404` if a client tries to access an unknown collection * :http:statuscode:`404` if a client tries to access an unknown deposit * :http:statuscode:`415` if a wrong media type is provided to the endpoint * archive/binary deposit: * :http:statuscode:`403` if the length of the archive exceeds the configured max size * :http:statuscode:`412` if the provided length or hash does not match the archive * :http:statuscode:`415` if a wrong media type is provided * multipart deposit: * :http:statuscode:`412` if the provided md5 hash does not match the archive * :http:statuscode:`415` if a wrong media type is provided * Atom entry deposit: * :http:statuscode:`400` if the request's body is empty (for creation only) Sources ------- * `SWORD v2 specification `__ * `arxiv documentation `__ * `Dataverse example `__ * `SWORD used on HAL `__ * `xml examples for CCSD `__ diff --git a/docs/sys-info.rst b/docs/sys-info.rst index 12374a6e..821f63a4 100644 --- a/docs/sys-info.rst +++ b/docs/sys-info.rst @@ -1,95 +1,94 @@ +.. _swh-deposit-deployment: + Deployment of the swh-deposit ============================= As usual, the Debian package is created and uploaded to the swh debian repository. Once the package is installed, we need to do a few things regarding the database. Prepare the database setup (existence, connection, etc...). ----------------------------------------------------------- This is defined through the packaged ``swh.deposit.settings.production`` module and the expected **/etc/softwareheritage/deposit/server.yml**. As usual, the expected configuration files are deployed through our puppet manifest (cf. 
puppet-environment/swh-site, puppet-environment/swh-role, puppet-environment/swh-profile) Environment (production) ------------------------ `SWH_CONFIG_FILENAME` must be defined and target the deposit's server configuration file. So either prefix the following commands or export the environment variable in your shell session. .. code:: shell export SWH_CONFIG_FILENAME=/etc/softwareheritage/deposit/server.yml Migrate/bootstrap the db schema ------------------------------- .. code:: shell sudo django-admin migrate --settings=swh.deposit.settings.production Load minimum defaults data -------------------------- .. code:: shell sudo django-admin loaddata \ --settings=swh.deposit.settings.production deposit_data This adds the minimal 'hal' collection Note: swh.deposit.fixtures.deposit\_data is packaged Add client and collection ------------------------- .. code:: shell swh deposit admin \ --config-file /etc/softwareheritage/deposit/server.yml \ --platform production \ user create \ --collection \ --username \ --password This adds a user ```` which can access the collection ````. The password will be used for the authentication access to the deposit api. Note: - If the collection does not exist, it is created alongside - The password is plain text but stored encrypted (so yes, for now we know the user's password) - For production platform, you must either set an `SWH_CONFIG_FILENAME` environment variable or pass alongside the `--config-file` parameter Reschedule a deposit --------------------- .. code:: shell swh deposit admin \ --config-file /etc/softwareheritage/deposit/server.yml \ --platform production \ deposit reschedule \ --deposit-id This will: - check the deposit's status to something reasonable (failed or done). 
That means the checks passed but something went wrong during the loading (failed: the loading failed; done: the loading succeeded but, for some reason such as a bug, we still need to reschedule it) - reset the deposit's status to 'verified' (prior to any loading but after the checks, which passed) and remove the different archives' identifiers (swh-id, ...) - trigger the loading task again through the scheduler - - - 
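
The rescheduling steps listed above can be read as a small state transition. The following is only an illustrative sketch under stated assumptions, not the ``swh.deposit`` implementation: the ``Deposit`` class, the ``reschedule`` function and the ``scheduled`` list standing in for the scheduler are all hypothetical names, and only the status values (``failed``, ``done``, ``verified``) come from the text.

.. code:: python

    from dataclasses import dataclass
    from typing import List, Optional

    # Statuses from which a reschedule makes sense, per the description above.
    RESCHEDULABLE_STATUSES = {"failed", "done"}


    @dataclass
    class Deposit:
        """Hypothetical stand-in for a deposit record (not the real model)."""

        id: int
        status: str
        swh_id: Optional[str] = None


    def reschedule(deposit: Deposit, scheduled: List[int]) -> Deposit:
        """Apply the three rescheduling steps to a deposit.

        ``scheduled`` is an assumed stand-in for the scheduler: the ids of
        deposits whose loading task has been triggered again.
        """
        # 1. Only deposits whose loading already ran (failed or done) qualify.
        if deposit.status not in RESCHEDULABLE_STATUSES:
            raise ValueError(
                f"cannot reschedule deposit {deposit.id} "
                f"with status {deposit.status!r}"
            )
        # 2. Go back to the state after the checks and before any loading,
        #    and drop the identifiers produced by the previous loading.
        deposit.status = "verified"
        deposit.swh_id = None
        # 3. Trigger the loading task again.
        scheduled.append(deposit.id)
        return deposit

In this sketch a deposit that is still in an earlier state (for example ``deposited``) is rejected rather than silently reset, which mirrors the status check described in the first step.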