diff --git a/PKG-INFO b/PKG-INFO
index 57b1e507..e37e6f52 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,94 +1,94 @@
Metadata-Version: 2.1
Name: swh.deposit
-Version: 0.13.5
+Version: 0.13.6
Summary: Software Heritage Deposit Server
Home-page: https://forge.softwareheritage.org/source/swh-deposit/
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
License: UNKNOWN
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-deposit
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-deposit/
Description: Software Heritage - Deposit
===========================
Simple Web-Service Offering Repository Deposit (S.W.O.R.D) is an interoperability
standard for digital file deposit.
This repository is both the `SWORD v2`_ Server and a deposit command-line client
implementations.
This implementation allows interaction between a client (a repository) and a server (SWH
repository) to deposit software source code archives and associated metadata.
Description
-----------
Most of the software source code artifacts present in the SWH Archive are gathered by
the mean of :term:`loader <loader>` workers run by the SWH project from sourve code
origins identified by :term:`lister <lister>` workers. This is a pull mechanism: it's
the responsibility of the SWH project to gather and collect source code artifacts that
way.
Alternatively, SWH allows its partners to push source code artifacts and metadata
directly into the Archive with a push-based mechanism. By using this possibility
different actors, holding software artifacts or metadata, can preserve their assets
without having to pass through an intermediate collaborative development platform, which
is already harvested by SWH (e.g GitHub, Gitlab, etc.).
This mechanism is the `deposit`.
The main idea is the deposit is an authenticated access to an API allowing the user to
provide source code artifacts -- with metadata -- to be ingested in the SWH Archive. The
result of that is a :ref:`SWHID <persistent-identifiers>` that can be used to uniquely
and persistently identify that very piece of source code.
This unique identifier can then be used to `reference the source code
<https://hal.archives-ouvertes.fr/hal-02446202>`_ (e.g. in a `scientific paper
<https://www.softwareheritage.org/2020/05/26/citing-software-with-style/>`_) and
retrieve it using the :ref:`vault <swh-vault>` feature of the SWH Archive platform.
The differences between a piece of code uploaded using the deposit rather than simply
asking SWH to archive a repository using the `save code now
<https://archive.softwareheritage.org/save/>`_ feature are:
- a deposited artifact is provided from one of the SWH partners which is regarded as a
trusted authority,
- a deposited artifact requires metadata properties describing the source code artifact,
- a deposited artifact has a codemeta_ metadata entry attached to it,
- a deposited artifact has the same visibility on the SWH Archive than a collected
repository,
- a deposited artifact can be searched with its provided url property on the SWH
Archive,
- the deposit API uses the `SWORD v2`_ API, thus requires some tooling to send deposits
to SWH. These tools are provided with this repository.
See the :ref:`deposit-user-manual` page for more details on how to use the deposit client
command line tools to push a deposit in the SWH Archive.
See the :ref:`deposit-api-specifications` reference pages of the SWORDv2 API implementation
in `swh.deposit` if you want to do upload deposits using HTTP requests.
- Read the :ref:`metadata` chapter to get more details on what metadata are supported when
- doing a deposit.
+ Read the :ref:`deposit-metadata` chapter to get more details on what metadata
+ are supported when doing a deposit.
See :ref:`swh-deposit-dev-env` if you want to hack the code of the `swh.deposit` module.
See :ref:`swh-deposit-prod-env` if you want to deploy your own copy of the
`swh.deposit` stack.
.. _codemeta: https://codemeta.github.io/
.. _`SWORD v2`: http://swordapp.org/sword-v2/
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: testing
Provides-Extra: server
diff --git a/README.rst b/README.rst
index 602e7f61..d934e7c0 100644
--- a/README.rst
+++ b/README.rst
@@ -1,71 +1,71 @@
Software Heritage - Deposit
===========================
Simple Web-Service Offering Repository Deposit (S.W.O.R.D) is an interoperability
standard for digital file deposit.
This repository is both the `SWORD v2`_ Server and a deposit command-line client
implementations.
This implementation allows interaction between a client (a repository) and a server (SWH
repository) to deposit software source code archives and associated metadata.
Description
-----------
Most of the software source code artifacts present in the SWH Archive are gathered by
the mean of :term:`loader <loader>` workers run by the SWH project from sourve code
origins identified by :term:`lister <lister>` workers. This is a pull mechanism: it's
the responsibility of the SWH project to gather and collect source code artifacts that
way.
Alternatively, SWH allows its partners to push source code artifacts and metadata
directly into the Archive with a push-based mechanism. By using this possibility
different actors, holding software artifacts or metadata, can preserve their assets
without having to pass through an intermediate collaborative development platform, which
is already harvested by SWH (e.g GitHub, Gitlab, etc.).
This mechanism is the `deposit`.
The main idea is the deposit is an authenticated access to an API allowing the user to
provide source code artifacts -- with metadata -- to be ingested in the SWH Archive. The
result of that is a :ref:`SWHID <persistent-identifiers>` that can be used to uniquely
and persistently identify that very piece of source code.
This unique identifier can then be used to `reference the source code
<https://hal.archives-ouvertes.fr/hal-02446202>`_ (e.g. in a `scientific paper
<https://www.softwareheritage.org/2020/05/26/citing-software-with-style/>`_) and
retrieve it using the :ref:`vault <swh-vault>` feature of the SWH Archive platform.
The differences between a piece of code uploaded using the deposit rather than simply
asking SWH to archive a repository using the `save code now
<https://archive.softwareheritage.org/save/>`_ feature are:
- a deposited artifact is provided from one of the SWH partners which is regarded as a
trusted authority,
- a deposited artifact requires metadata properties describing the source code artifact,
- a deposited artifact has a codemeta_ metadata entry attached to it,
- a deposited artifact has the same visibility on the SWH Archive than a collected
repository,
- a deposited artifact can be searched with its provided url property on the SWH
Archive,
- the deposit API uses the `SWORD v2`_ API, thus requires some tooling to send deposits
to SWH. These tools are provided with this repository.
See the :ref:`deposit-user-manual` page for more details on how to use the deposit client
command line tools to push a deposit in the SWH Archive.
See the :ref:`deposit-api-specifications` reference pages of the SWORDv2 API implementation
in `swh.deposit` if you want to do upload deposits using HTTP requests.
-Read the :ref:`metadata` chapter to get more details on what metadata are supported when
-doing a deposit.
+Read the :ref:`deposit-metadata` chapter to get more details on what metadata
+are supported when doing a deposit.
See :ref:`swh-deposit-dev-env` if you want to hack the code of the `swh.deposit` module.
See :ref:`swh-deposit-prod-env` if you want to deploy your own copy of the
`swh.deposit` stack.
.. _codemeta: https://codemeta.github.io/
.. _`SWORD v2`: http://swordapp.org/sword-v2/
diff --git a/debian/changelog b/debian/changelog
index 1be70481..d52fb0bd 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,1218 +1,1220 @@
-swh-deposit (0.13.5-1~swh1~bpo10+1) buster-swh; urgency=medium
+swh-deposit (0.13.6-1~swh1) unstable-swh; urgency=medium
- * Rebuild for buster-swh
+ * New upstream release 0.13.6 - (tagged by Antoine Lambert
+ <antoine.lambert@inria.fr> on 2021-04-29 14:23:04 +0200)
+ * Upstream changes: - version 0.13.6
- -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 08 Apr 2021 13:13:08 +0000
+ -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 29 Apr 2021 12:29:05 +0000
swh-deposit (0.13.5-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.13.5 - (tagged by Valentin Lorentz
<vlorentz@softwareheritage.org> on 2021-04-08 14:59:58 +0200)
* Upstream changes: - v0.13.5 - * Reorganize documentation
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 08 Apr 2021 13:11:11 +0000
swh-deposit (0.13.4-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.13.4 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-03-26 18:02:42
+0100)
* Upstream changes: - v0.13.4 - docs: Add a reference
authentication page to explain auth schemes - docs/sys-info:
Update deployment documentation - docs/sys-info: Update
information and rework sentence phrasing - docs: Unify READMEs
in the documentation and the source code - cli.client: Fix
sphinx indentation warning
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 26 Mar 2021 17:07:13 +0000
swh-deposit (0.13.3-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.13.3 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-03-24 16:49:18
+0100)
* Upstream changes: - v0.13.1 - deposit.auth: Namespace cache
keys per realm, client_id and user_id - deposit.auth: Logs
warning during cache token retrieval failure
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 24 Mar 2021 15:54:29 +0000
swh-deposit (0.13.2-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.13.2 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-03-23 17:52:22
+0100)
* Upstream changes: - v0.13.2 - deposit.auth: Fix
authentication failure corner case
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 23 Mar 2021 16:57:49 +0000
swh-deposit (0.13.1-1~swh2) unstable-swh; urgency=medium
* Fix runtime dependency
-- Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Tue, 23 Mar 2021 14:18:01 +0100
swh-deposit (0.13.1-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.13.1 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-03-23 11:34:02
+0100)
* Upstream changes: - v0.13.1 - deposit.auth: Adjust
authentication error message - deposit.cli: Fix service document
error when failing to retrieve it - deposit.cli: Fix cli parsing
issue when xml error is returned by server
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 23 Mar 2021 10:40:03 +0000
swh-deposit (0.13.0-1~swh2) unstable-swh; urgency=medium
* Fix build time dependency release
-- Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Mon, 22 Mar 2021 12:40:40 +0100
swh-deposit (0.13.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.13.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-03-22 12:03:21
+0100)
* Upstream changes: - v0.13.0 - Allow to configure
authentication mechanism per config file - Delegate
authentication to keycloak
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 22 Mar 2021 11:13:37 +0000
swh-deposit (0.12.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.12.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-03-16 09:28:08
+0100)
* Upstream changes: - v0.12.0 - Add deposit info to objects
added to swh-storage from metadata-only deposits - tests:
Simplify discovery_date comparison. - Check a SWHID exists in
the archive before accepting a metadata-only deposit
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 16 Mar 2021 08:32:08 +0000
swh-deposit (0.11.1-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.11.1 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-03-10 12:53:10
+0100)
* Upstream changes: - v0.11.1 - tests: Start testing migration
scripts
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 10 Mar 2021 11:56:53 +0000
swh-deposit (0.11.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.11.0 - (tagged by Valentin Lorentz
<vlorentz@softwareheritage.org> on 2021-03-02 12:33:11 +0100)
* Upstream changes: - v0.11.0 - * Use CoreSWHID/QualifiedSWHID
instead of the deprecated SWHID class.
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 02 Mar 2021 11:38:13 +0000
swh-deposit (0.10.2-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.10.2 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-02-26 12:08:20
+0100)
* Upstream changes: - v0.10.2 - deposit.urls: Retro-
compatibility fix about import and type conflict
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 26 Feb 2021 11:14:00 +0000
swh-deposit (0.10.1-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.10.1 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-02-25 16:51:28
+0100)
* Upstream changes: - v0.10.1 - tests: Tentatively try to fix
the debian build
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 25 Feb 2021 15:56:31 +0000
swh-deposit (0.10.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.10.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-02-25 15:52:55
+0100)
* Upstream changes: - v0.10.0 - Aggregate deposit archives
into a temporary tarball instead of a zip - Fix swh.deposit.urls
typing - deposit.cli: Warn users when missing origin tags are
detected - Stop recommending the Slug header as the alternative
to <external_identifier>. - deposit.client.cli: Expose --create-
origin flag and deprecate --slug - test: Fix failing test
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 25 Feb 2021 14:58:13 +0000
swh-deposit (0.9.2-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.9.2 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-01-14 12:42:12
+0100)
* Upstream changes: - v0.9.2 - Make <swh:add_to_origin> set
Deposit.origin_url + disable aggressive deprecation warning -
test_collection_reuse_slug: Assert Deposit.origin_url is set. -
cli: Drop dead code - user-manual: Add deposit metadata update
scenario - docs: Explicit the new deposit creation
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 14 Jan 2021 11:46:11 +0000
swh-deposit (0.9.1-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.9.1 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-01-06 09:34:41
+0100)
* Upstream changes: - v0.9.1 - client: Fix url to update
metadata on deposit with status 'done' - doc: Add cli section
- homepage: Fix broken link
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 06 Jan 2021 08:39:41 +0000
swh-deposit (0.9.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.9.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-01-04 17:31:18
+0100)
* Upstream changes: - v0.9.0 - Handle <swh:create_origin> /
<swh:add_to_origin> in multipart uploads - Split
<swh:create_origin> / <swh:add_to_origin> handling to its own
function - Catch invalid dates before marking a deposit as
verified - tests: Reorganize and refactor boilerplate -
docs: Rephrase introduction of the metadata-only deposit
documentation - docs: Document metadata updates
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 05 Jan 2021 08:06:42 +0000
swh-deposit (0.8.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.8.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-12-18 10:20:12
+0100)
* Upstream changes: - v0.8.0 - Allow metadata only deposit
(server, client) - Documentation updated accordingly
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 18 Dec 2020 09:30:06 +0000
swh-deposit (0.7.3-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.7.3 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-12-15 15:09:39
+0100)
* Upstream changes: - v0.7.3 - server: Fix metadata-only
deposit which are currently rejected - Allow metadata-only
deposit client side - Trap and report exceptions in a unified
way within the cli - Move parse_swh_reference to
swh.deposit.utils namespace
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 15 Dec 2020 14:13:15 +0000
swh-deposit (0.7.2-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.7.2 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-12-10 19:09:54
+0100)
* Upstream changes: - v0.7.2 - swh/deposit/migrations/0021:
Fix migration script
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 10 Dec 2020 18:13:50 +0000
swh-deposit (0.7.1-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.7.1 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-12-10 13:55:22
+0100)
* Upstream changes: - v0.7.1 - Use string equality instead of
substring search to check for mandatory fields. - Accept
<codemeta:name> and <codemeta:author> as alternatives to
<atom:name>/<atom:title> and <atom:author>.
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 10 Dec 2020 13:01:47 +0000
swh-deposit (0.7.0-1~swh2) unstable-swh; urgency=medium
* Bump new dependency
-- Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Tue, 08 Dec 2020 05:12:30 +0000
swh-deposit (0.7.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.7.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-12-08 13:25:57
+0100)
* Upstream changes: - v0.7.0 - docs: Add a complete deposit
protocol reference - Implement tag <swh:add_to_origin> to
replace the Slug header for parent - relationships -
Implement tag <swh:create_origin> to replace the Slug header -
Return the origin url on 'GET State-IRI' - Make the Slug header
optional - docs: Improve documentations and drop non implemented
specifications - Remove the <client> tag from the protocol. -
Remove the <external_identifier> tag from the protocol. - Move
SWH-specific tags to the
https://www.softwareheritage.org/schema/2018/deposit namespace -
logging: Log error messages when that occurs - typing: Improve
deposit types - Refactor exception handling - Split SE-IRI
and Edit-IRI. - remove assumption that Edit-IRI and SE-IRI are
the same from test - Rename files and classes in
swh/deposit/api/deposit_* to be consistent with SWORD terminology.
- swh.xsd: Use the
https://www.softwareheritage.org/schema/2018/deposit namespace -
Fix SWORD XMLNS (http://purl.org/net/sword/ ->
http://purl.org/net/sword/terms/) - Refactoring to clarify
source - tests: Refactoring
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 08 Dec 2020 12:35:45 +0000
swh-deposit (0.6.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.6.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-11-18 15:58:34
+0100)
* Upstream changes: - v0.6.0 - Adapt existing POST to a
collection to allow metadata-only deposit -
deposit.settings.production: Remove the logging django configuration
- Customize the user-agent header in deposit client classes -
Simplify `swh deposit upload` cli options - doc: improve the
user manual documentation - doc: rename Getting Started as User
Manual and update the content - doc: add an introduction
paragraph in blueprint.rst - doc: improve the spec-loading doc
- doc: improve the doc of API endpoints - doc: rename
docs/specs/specs.rst as docs/specs/index.rst - doc: Add a
description of the deposit in the docs' index page - doc: spec-
meta-deposit: Update metadata-only deposit samples and requirements
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 18 Nov 2020 15:01:09 +0000
swh-deposit (0.5.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.5.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-10-19 09:53:36
+0200)
* Upstream changes: - v0.5.0 - tests: Migrate configuration to
the latest schema change - spec-technical: Add 'partial' self-
loop + annotate transitions in the status diagram - spec-
technical: Rewrite state diagram using Plantuml.
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 19 Oct 2020 07:58:12 +0000
swh-deposit (0.4.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.4.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-10-16 08:57:58
+0200)
* Upstream changes: - v0.4.0 - cli.client: Adapt metadata
generation so server side checks pass - Update blueprints with
correct status names and definitions - Rewrite blueprint
flowcharts using Plantuml instead of .png files. -
deposit.cli.admin: Add coverage on `swh deposit reschedule` cli
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 16 Oct 2020 07:02:23 +0000
swh-deposit (0.3.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.3.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-10-13 15:12:50
+0200)
* Upstream changes: - v0.3.0 - deposit_client: Allow deposit
metadata update on completed deposit - deposit.client: Improve
cli error messages and add missing coverage - cli.client: Add
coverage - cli.client: Add types - cli.client: Add types and
refactor tests - conftest: Declare swh.core pytest_plugin -
deposit: Reuse config.load_from_envvar for configuration loading
- test_deposit_content: Add missing coverage
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 13 Oct 2020 13:16:30 +0000
swh-deposit (0.2.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.2.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-10-01 15:25:42
+0200)
* Upstream changes: - v0.2.0 - Allow deposit metadata update
on deposit already completed - Transit raw metadata to the
loader to unify with metadata update scenario - deposit*: Rename
internally swh_id references to swhid - deposit.parsers: Process
namespace when using xmltodict.parse - tests: Add missing update
scenarios - tests: Explicit the bad request scenario error
messages - tests: Ensure all empty body test cases are covered
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 01 Oct 2020 13:29:29 +0000
swh-deposit (0.1.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.1.0 - (tagged by David Douard
<david.douard@sdfa3.org> on 2020-09-25 11:49:24 +0200)
* Upstream changes: - v0.1.0
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 25 Sep 2020 09:53:06 +0000
swh-deposit (0.0.90-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.90 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-06-01 12:25:53
+0200)
* Upstream changes: - v0.0.90 - swh.deposit.models: Upload
deposit archives to dedicated folder
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 01 Jun 2020 10:31:57 +0000
swh-deposit (0.0.89-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.89 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-05-20 17:45:23
+0200)
* Upstream changes: - v0.0.89 - private/deposit_list: Allow
exclusion patterns from listing
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 20 May 2020 15:49:10 +0000
swh-deposit (0.0.88-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.88 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-05-20 11:21:31
+0200)
* Upstream changes: - v0.0.88 - Drop swh_anchor_id* columns
from Deposit model
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 20 May 2020 09:28:14 +0000
swh-deposit (0.0.87-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.87 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-05-15 12:48:31
+0200)
* Upstream changes: - v0.0.87 - Migrate deposit SWHIDs (data)
to the new specification - Update deposit swhid to respect the
latest specification update - Fix typos
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 15 May 2020 10:52:33 +0000
swh-deposit (0.0.86-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.86 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-05-11 18:04:03
+0200)
* Upstream changes: - v0.0.86 - origin/master Make deposit
client deal properly with maintenance issues - maintenance
exception: Make the exception appear as raw content
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 11 May 2020 16:07:28 +0000
swh-deposit (0.0.85-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.85 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-05-11 15:29:11
+0200)
* Upstream changes: - v0.0.85 - origin/master Align 503
exceptions output format with existing errors - Add deposit
exception handler to improve default error display - tox: Drop
django1 entry, we use django2 in production - settings.common:
Format
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 11 May 2020 13:40:23 +0000
swh-deposit (0.0.84-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.84 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-05-07 14:48:54
+0100)
* Upstream changes: - v0.0.84 - test: Add checks scenario test
cases - test_task: Mark some test to be ignored during debian
build
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 07 May 2020 13:52:51 +0000
swh-deposit (0.0.83-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.83 - (tagged by Valentin Lorentz
<vlorentz@softwareheritage.org> on 2020-05-07 11:52:28 +0200)
* Upstream changes: - v0.0.83 - * Pass collection + id to the
checker instead of an URL.
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 07 May 2020 09:56:40 +0000
swh-deposit (0.0.82-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.82 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2020-04-23 16:40:34
+0200)
* Upstream changes: - v0.0.82 - deposit_read: Simplify api to
return only relevant deposit information - setup: Update the
minimum required runtime python3 version - spec: reference SWHID
using explicit anchors - Update type annotations and signatures
to match djangorestframework-stubs - pytest.ini: Avoid loading
flask plugin to prevent fixture name clash - Add a
pyproject.toml file to target py37 for black - Enable black -
tests: Adapt init_sentry api change call with environment parameter
- docs: Fix sphinx warnings - tests/gunicorn_config: Fix tests
after recent changes in swh-core
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 23 Apr 2020 14:45:49 +0000
swh-deposit (0.0.81-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.81 - (tagged by David Douard
<david.douard@sdfa3.org> on 2020-01-13 11:50:07 +0100)
* Upstream changes: - v0.0.81
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 13 Jan 2020 11:07:54 +0000
swh-deposit (0.0.80-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.80 - (tagged by David Douard
<david.douard@sdfa3.org> on 2020-01-09 15:54:57 +0100)
* Upstream changes: - v0.0.80
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 09 Jan 2020 15:00:32 +0000
swh-deposit (0.0.79-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.79 - (tagged by Valentin Lorentz
<vlorentz@softwareheritage.org> on 2019-12-19 17:37:41 +0100)
* Upstream changes: - v0.0.79 - * requirements: Pin mypy and
django-stubs version - * homepage: Improve sentence phrasing
- * deposit.api: Add a basic api page to avoid broken link - *
removed dead and deprecated code - * Add sentry integration.
- * Fix log level + status code of the client CLI in case of error.
- * Improve validation of --author and --name. - * Update
documentation of --author to use names instead of emails.
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 19 Dec 2019 16:42:28 +0000
swh-deposit (0.0.78-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.78 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-11-25 18:28:13
+0100)
* Upstream changes: - v0.0.78 - deposit.signal: Simplify
configuration entry
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 25 Nov 2019 17:34:33 +0000
swh-deposit (0.0.77-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.77 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-11-25 14:48:59
+0100)
* Upstream changes: - v0.0.77 - deposit.signals: Send
versioned scheduler tasks - deposit.signals: Scheduler load-
deposit task with new endpoints - mypy: Fix missing import
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 25 Nov 2019 14:02:38 +0000
swh-deposit (0.0.76-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.76 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-11-25 12:26:07
+0100)
* Upstream changes: - v0.0.76 - Start adding mypy annotation
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 25 Nov 2019 11:29:48 +0000
swh-deposit (0.0.75-1~swh2) unstable-swh; urgency=medium
* Add egg-info to pybuild.testfiles
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 20 Nov 2019 15:06:31 +0100
swh-deposit (0.0.75-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.75 - (tagged by Nicolas Dandrimont
<nicolas@dandrimont.eu> on 2019-10-30 16:50:25 +0100)
* Upstream changes: - Release swh.deposit v0.0.75 - Revert
changes to the task signature until the new loader is ready -
Migrate tests to pytest
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 30 Oct 2019 15:54:55 +0000
swh-deposit (0.0.74-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.74 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-10-09 11:30:00
+0200)
* Upstream changes: - v0.0.74 - deposit.signals: Scheduler
load-deposit task with new endpoints - deposit.private.api:
Expose new endpoints with no collection name - models: Migrate
model to enforce check on delete - setup: register the worker
task in the swh.workers entrypoint - tests: Explicit private
tests in their names - admin CLI: avoid redefining deposit name
in admin subcommand
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 09 Oct 2019 09:35:03 +0000
swh-deposit (0.0.73-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.73 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-09-05 10:15:29
+0200)
* Upstream changes: - v0.0.73 - loader: Add missing visit_type
attribute. - cli/client: Simplify url definition to use -
cli/admin: Add the default domain value to empty - deposit_data:
Remove no longer used DepositRequestType references - doc/sys-
info: Clarify commands - docs: add code of conduct document
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 05 Sep 2019 08:19:17 +0000
swh-deposit (0.0.72-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.72 - (tagged by Valentin Lorentz
<vlorentz@softwareheritage.org> on 2019-06-18 17:15:53 +0200)
* Upstream changes: - Remove argument origin_id from call to
Loader.send_origin_metadata. - It no longer needs that argument.
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 19 Jun 2019 14:26:44 +0000
swh-deposit (0.0.71-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.71 - (tagged by Antoine Lambert
<antoine.lambert@inria.fr> on 2019-05-23 11:02:05 +0200)
* Upstream changes: - version 0.0.71
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 23 May 2019 09:09:54 +0000
swh-deposit (0.0.70-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.70 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-05-10 12:04:33
+0200)
* Upstream changes: - v0.0.70 - Update documentations (getting-
started, metadata) - Improve cli client (expose new status
subcommand, clarify help messages) - Fixes some issues
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 10 May 2019 10:33:19 +0000
swh-deposit (0.0.69-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.69 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-05-09 10:25:43
+0200)
* Upstream changes: - v0.0.69 - cli.admin: Add an admin
subcommand 'deposit reschedule' - models: Keep scheduler task
ids reference on deposit model - cli: make the deposit cli
command a subcommand of the main 'swh' one - docs: update the
getting-started document to use 'swh deposit upload' command -
docs: Update the sys-info document to expose the `swh deposit admin
deposit - reschedule` subcommand
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 09 May 2019 08:35:53 +0000
swh-deposit (0.0.68-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.68 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-04-17 15:39:52
+0200)
* Upstream changes: - v0.0.68
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 17 Apr 2019 13:50:21 +0000
swh-deposit (0.0.67-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.67 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-02-20 17:39:59
+0100)
* Upstream changes: - v0.0.67 - settings.prod: Use
SWH_CONFIG_FILENAME to load & check swh config - requirements-
swh.txt: Update minimal version required for loader-tar - remove
debian/ tree from master branch
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 20 Feb 2019 16:45:22 +0000
swh-deposit (0.0.66-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.66 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-02-16 10:22:30
+0100)
* Upstream changes: - v0.0.66 - deposit.loader.checker: Fix
logger initialization
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Sat, 16 Feb 2019 09:27:54 +0000
swh-deposit (0.0.65-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.65 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-02-14 18:28:48
+0100)
* Upstream changes: - v0.0.65 - swh/manage: Fix flake8 warning
- Bump dependency on swh-scheduler 0.0.39 - Rewrite celery tasks
as a decorated function - loader.scheduler: Remove non
production code - deposit.loader.tasks: Add tests
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 14 Feb 2019 17:34:09 +0000
swh-deposit (0.0.63-1~swh2) unstable-swh; urgency=medium
* New upstream release, fixing dependencies
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Thu, 14 Feb 2019 18:16:04 +0100
swh-deposit (0.0.63-1~swh1) unstable-swh; urgency=medium
* v0.0.63
* deposit_list: Return status_detail as string message and not as nested dict
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 17 Sep 2018 16:23:58 +0200
swh-deposit (0.0.62-1~swh1) unstable-swh; urgency=medium
* v0.0.62
* private.deposit_list: Make the endpoint private
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 12 Sep 2018 15:58:21 +0200
swh-deposit (0.0.61-1~swh1) unstable-swh; urgency=medium
* v0.0.61
* Add api endpoint to list deposits with pagination
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 12 Sep 2018 14:44:58 +0200
swh-deposit (0.0.60-1~swh1) unstable-swh; urgency=medium
* v0.0.60
* Fix production issue regarding deposit status endpoint.
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 04 Sep 2018 18:30:01 +0200
swh-deposit (0.0.59-1~swh1) unstable-swh; urgency=medium
* v0.0.59
* deposit.utils: Fix the potential metadata information loss
* docs: Add sparse/metadata deposit specs
* docs: Update documentation about persistent id with context
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 24 Jul 2018 14:15:55 +0200
swh-deposit (0.0.58-1~swh1) unstable-swh; urgency=medium
* v0.0.58
* d/*: Update to latest python3-swh.model dependency version
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 23 Jul 2018 14:31:46 +0200
swh-deposit (0.0.57-1~swh1) unstable-swh; urgency=medium
* v0.0.57
* swh.deposit.client: Simplify client parsing
* api/deposit_status: Make swh-id be a directory id derivative
* swh.deposit.models: Keep deposit request's raw metadata
* bin: Migrate internal script to use the deposit client
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 23 Jul 2018 13:43:23 +0200
swh-deposit (0.0.56-1~swh1) unstable-swh; urgency=medium
* v0.0.56
* docs: Update deposit with status rejected documentation
* deposit_status: Update the deposit status endpoint with details for rejected deposit
* deposit_check: Reject invalid deposit (associated archive containing only single archive)
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 17 Jul 2018 13:27:35 +0200
swh-deposit (0.0.55-1~swh1) unstable-swh; urgency=medium
* v0.0.55
* deposit_status: Open detailed status when a deposit fails the checks
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 11 Jul 2018 12:06:10 +0200
swh-deposit (0.0.54-1~swh1) unstable-swh; urgency=medium
* v0.0.54
* deposit_check: Improve details in failing checks
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 10 Jul 2018 10:23:35 +0200
swh-deposit (0.0.53-1~swh1) unstable-swh; urgency=medium
* v0.0.53
* swh.deposit.parsers: Fix xml parsing to not lose duplicated entries
* swh.deposit.tests: Make sure failing tests are complete
* deposit_update: Fix check error during update with wrong mimetype
* deposit_read: Persistent identifier representation has changed
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Fri, 06 Jul 2018 16:26:08 +0200
swh-deposit (0.0.52-1~swh1) unstable-swh; urgency=medium
* v0.0.52
* Make the deposit's scheduler configuration adjustable
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Thu, 03 May 2018 15:16:30 +0200
swh-deposit (0.0.51-1~swh1) unstable-swh; urgency=medium
* v0.0.51
* Improve origin_visit initialization step
* Properly sandbox the prepare statement so that if it breaks, we can update appropriately the visit with the correct status
* Let the visit date be set in lower layer
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 07 Mar 2018 11:07:11 +0100
swh-deposit (0.0.50-1~swh1) unstable-swh; urgency=medium
* v0.0.50
* Bump requirements up for new swh.loader.tar
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 12 Feb 2018 11:21:03 +0100
swh-deposit (0.0.49-1~swh1) unstable-swh; urgency=medium
* Release swh.deposit v0.0.49
* Use snapshots instead of occurrences
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Tue, 06 Feb 2018 14:48:53 +0100
swh-deposit (0.0.48-1~swh1) unstable-swh; urgency=medium
* v0.0.48
* swh.deposit.api.private: Fix revision message missing client name
* docs: simplify endpoints and delete all sword text
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Fri, 02 Feb 2018 08:37:39 +0100
swh-deposit (0.0.47-1~swh1) unstable-swh; urgency=medium
* v0.0.47
* swh.loader: Be consistent in returning loader result in task
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 30 Jan 2018 19:03:49 +0100
swh-deposit (0.0.46-1~swh1) unstable-swh; urgency=medium
* v0.0.46
* swh.deposit.client: Explicit private api client
* swh.deposit.client.cli: Fix flag compatibility issue
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 30 Jan 2018 12:19:09 +0100
swh-deposit (0.0.45-1~swh1) unstable-swh; urgency=medium
* v0.0.45
* Simplify collection name retrieval
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 29 Jan 2018 18:09:29 +0100
swh-deposit (0.0.44-1~swh1) unstable-swh; urgency=medium
* v0.0.44
* docs: Update getting-started documentations
* swh.deposit.api: Add swh_id key to None by default in status
endpoint
* swh.deposit.api: Fix tar archive permission in update endpoints
* swh.deposit.api.private: Fix pep8 violation about catch Exception
* swh.deposit.api: Do not hardcode the server uri in service document endpoint
* swh.deposit.client: Add a deposit client
* d/control: Create python3-swh.deposit.client package
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 29 Jan 2018 17:41:12 +0100
swh-deposit (0.0.43-1~swh1) unstable-swh; urgency=medium
* v0.0.43
* swh.deposit.api: Deposit returns persistent identifiers
* swh.deposit.api: Rename deposit statuses
* swh.deposit: Support standard tarball formats (.tar.*)
* docs: Update documentation about new archive format support
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 17 Jan 2018 12:02:20 +0100
swh-deposit (0.0.42-1~swh1) unstable-swh; urgency=medium
* v0.0.42
* Fix cosmetic issues in deposit.s.o
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 10 Jan 2018 16:25:48 +0100
swh-deposit (0.0.41-1~swh1) unstable-swh; urgency=medium
* v0.0.41
* swh.deposit.checks: Add url validation
* swh.deposit: Add splash screen to homepage
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 10 Jan 2018 12:12:43 +0100
swh-deposit (0.0.40-1~swh1) unstable-swh; urgency=medium
* v0.0.40
* Fix corner case on deposit checks and clarify intents on checks
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 08 Jan 2018 18:21:38 +0100
swh-deposit (0.0.39-1~swh1) unstable-swh; urgency=medium
* v0.0.39
* Amend detail status message
* docs: Fix url
* Fix check of deposit without content scenario
* Refactor metadata check with pythonic function
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 08 Jan 2018 12:43:15 +0100
swh-deposit (0.0.38-1~swh1) unstable-swh; urgency=medium
* v0.0.38
* Provide existing swh id in the status api when it exists
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Fri, 08 Dec 2017 09:41:21 +0100
swh-deposit (0.0.37-1~swh1) unstable-swh; urgency=medium
* v0.0.37
* Adapt to latest dependency on loader-core and storage
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Thu, 07 Dec 2017 15:30:32 +0100
swh-deposit (0.0.36-1~swh1) unstable-swh; urgency=medium
* v0.0.36
* swh.deposit.api.private: Deposit's author is swh
* d/control: Bump to latest dependencies version
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 06 Dec 2017 12:21:45 +0100
swh-deposit (0.0.35-1~swh1) unstable-swh; urgency=medium
* v0.0.35
* swh.deposit.loader: Fix intermediary status
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 05 Dec 2017 19:23:40 +0100
swh-deposit (0.0.34-1~swh1) unstable-swh; urgency=medium
* v0.0.34
* d/control: Bump to latest version
* Fix client config typo for local url
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 05 Dec 2017 15:43:59 +0100
swh-deposit (0.0.33-1~swh1) unstable-swh; urgency=medium
* v0.0.33
* Bump to latest swh.loader.tar
* dev swh.deposit.api: Add parent deposit to deposit at creation time
* swh.deposit.service: Clean dead code
* bin/Makefile: Add multipart deposit sample script
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 04 Dec 2017 18:59:50 +0100
swh-deposit (0.0.32-1~swh1) unstable-swh; urgency=medium
* v0.0.32
* Migrate swh.deposit.injection module to swh.deposit.loader
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Thu, 30 Nov 2017 16:27:19 +0100
swh-deposit (0.0.31-1~swh1) unstable-swh; urgency=medium
* v0.0.31
* swh.deposit.api: Separate public/private api
* swh.deposit.api.common: Fix authentication issue in production
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Thu, 30 Nov 2017 13:18:15 +0100
swh-deposit (0.0.30-1~swh1) unstable-swh; urgency=medium
* v0.0.30
* swh.deposit.signals: Fix wrong task scheduling argument
* swh.deposit.migration: Remove default value
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 29 Nov 2017 18:34:29 +0100
swh-deposit (0.0.29-1~swh1) unstable-swh; urgency=medium
* v0.0.29
* Fix inconsistent term in code
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 29 Nov 2017 15:22:36 +0100
swh-deposit (0.0.28-1~swh1) unstable-swh; urgency=medium
* v0.0.28
* swh.deposit: Use revision_id as swh_id
* swh.deposit: Untangle checks from the current client/server requests
flow
* swh.deposit.injection: Trigger scheduling of checks on deposit
* swh.deposit.injection: Trigger scheduling of loading on deposit
* swh.deposit.injection: Add origin_metadata during loading
* d/control: Bump to latest swh layers
* swh.deposit.tests: Add and refactor tests
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 29 Nov 2017 15:03:56 +0100
swh-deposit (0.0.27-1~swh1) unstable-swh; urgency=medium
* v0.0.27
* swh.deposit.parsers: Fix edge case about decimal serialization
* swh.deposit.injection.loader: Add test on loading a deposit
* swh.loader.test: Fix path initialization
* swh.deposit.injection.scheduler: Adapt default task
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Fri, 17 Nov 2017 10:54:08 +0100
swh-deposit (0.0.26-1~swh1) unstable-swh; urgency=medium
* v0.0.26
* swh.deposit: Be consistent in the deposit_status key returned
* docs: Move actual docs inside the docs/ folder
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 25 Oct 2017 17:37:02 +0200
swh-deposit (0.0.25-1~swh1) unstable-swh; urgency=medium
* v0.0.25
* swh.deposit.production: Add support for proxy headers in django
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 24 Oct 2017 14:08:42 +0200
swh-deposit (0.0.24-1~swh1) unstable-swh; urgency=medium
* v0.0.24
* swh.deposit.api: Fix 500 error when browsing api through browser
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 23 Oct 2017 15:37:11 +0200
swh-deposit (0.0.23-1~swh1) unstable-swh; urgency=medium
* v0.0.23
* swh.deposit.api: Deal with new 'rejected' status on deposit
* docs: Update documentation to latest development
* swh.deposit.api: Update docstrings properly
* swh.deposit.api: Add post check validation on deposit
* swh.deposit.tests: Add edge case scenario tests (upload size limit, deposit read archives, etc...)
* swh.deposit.api: Fix mismatch hash check message
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 23 Oct 2017 11:31:39 +0200
swh-deposit (0.0.22-1~swh1) unstable-swh; urgency=medium
* v0.0.22
* swh.deposit.api: Return fqdn urls
* swh.deposit.api: Use variable to define the pivot status 'ready'
* swh.deposit.api: Add state iri in the deposit receipt
* swh.deposit.api: Add deposit's status in the deposit receipt
* swh.deposit.api: Explicit no support for Metadata-Relevant header
* swh.deposit.tests: Fix potential listing error
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Thu, 19 Oct 2017 15:38:01 +0200
swh-deposit (0.0.21-1~swh1) unstable-swh; urgency=medium
* v0.0.21
* swh.deposit.api: Simplify clean up temporary directory routine for deposit read archive api
* swh.deposit.tests: Add missing test cases around deposit read api
* README-injection: Improve sentence phrasing
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 18 Oct 2017 11:36:40 +0200
swh-deposit (0.0.20-1~swh1) unstable-swh; urgency=medium
* v0.0.20
* swh.deposit.scheduler: Move scheduling part to swh.deposit.injection
* swh.deposit.injection: Separation of concern between reading/loading
* swh.deposit.api: Remove dead code
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 16 Oct 2017 18:23:19 +0200
swh-deposit (0.0.19-1~swh1) unstable-swh; urgency=medium
* v0.0.19
* Define summary message for method not allowed endpoints
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 16 Oct 2017 12:46:11 +0200
swh-deposit (0.0.18-1~swh1) unstable-swh; urgency=medium
* v0.0.18
* swh.deposit.api.deposit: make slug header mandatory
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Sat, 14 Oct 2017 10:45:08 +0200
swh-deposit (0.0.17-1~swh1) unstable-swh; urgency=medium
* v0.0.17
* Fix missing files for packaging
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Fri, 13 Oct 2017 16:37:27 +0200
swh-deposit (0.0.16-1~swh1) unstable-swh; urgency=medium
* v0.0.16
* packaging: Split python3-swh.deposit / python3-swh.deposit.injection
* swh.deposit.injection: Add deposit archive ingestion task
* packaging: Cleanup
* swh.deposit.auth: Cleanup authentication to use directly drf's
* swh.deposit.api: Split between private and public api
* swh.deposit.api: Add private api to update deposit's status
* swh.deposit.scheduler.cli: Add one-shot task scheduling machinery
* swh.deposit.tests: use the collection name when creating uri
* swh.deposit.admin: Clean up unused code
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Fri, 13 Oct 2017 15:41:47 +0200
swh-deposit (0.0.15-1~swh1) unstable-swh; urgency=medium
* v0.0.15
* swh.deposit.api: Add service to clean up temporary archives
* swh.deposit.api: Add private api to read a deposit's raw content
* swh.deposit.api: Update docstring
* swh.deposit.api: Switch from objstorage layer to django's
* docs: Fix typo in private yaml sample
* README-dev: Update documentation about bootstraping the dev env
* swh.deposit.tests: Check existence before directory cleanup
* Remove reference to noop and verbose since no longer in spec 2.0
* bin: Reference sample executables to exercise local run
* swh.deposit.scheduler.cli: Add a scheduling implementation on
deposit
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 09 Oct 2017 10:23:48 +0200
swh-deposit (0.0.14-1~swh1) unstable-swh; urgency=medium
* v0.0.14
* swh.deposit.tests: Add missing test cases scenario about updates
* docs: Improve and make the documentation browsable through browser
* docs: Add README-sys, README-getting-started, README-injection
* swh.deposit.create_user: Fix collection setup
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Thu, 28 Sep 2017 16:56:54 +0200
swh-deposit (0.0.13-1~swh1) unstable-swh; urgency=medium
* v0.0.13
* swh.deposit.api: Restrict access to one's own collection
* swh.deposit.api: Unify checks on all endpoints
* swh.deposit: Separate the collection from the client notion
* swh.deposit.api: Add delete deposit endpoint
* swh.deposit.api: Add delete content (archives) from deposit
* swh.deposit.api: Empty post on EDIT-IRI can finalize a deposit
* swh.deposit.model: Relax unicity constraint on external id
* swh.deposit.api: PUT does not permit to have the deposit_id None
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 25 Sep 2017 18:02:46 +0200
swh-deposit (0.0.12-1~swh1) unstable-swh; urgency=medium
* v0.0.12
* swh.deposit.api: Separate the replace metadata from the replace archive routine
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Fri, 22 Sep 2017 18:59:48 +0200
swh-deposit (0.0.11-1~swh1) unstable-swh; urgency=medium
* v0.0.11
* swh.deposit.static: Add static folder
* swh.deposit.api: Accept modifications to deposit only in partial
status
* swh.deposit.api: Update/Add new deposit metadata/archive
* swh.deposit.api.tests: Tests new use cases
* README: Update and simplify documentation
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Fri, 22 Sep 2017 16:24:25 +0200
swh-deposit (0.0.10-1~swh1) unstable-swh; urgency=medium
* v0.0.10
* swh.deposit.config: Centralize default config in .config module
* swh.deposit.api: Split api module definition
* swh.deposit.auth: Improve white-listing mechanism
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Thu, 21 Sep 2017 10:41:04 +0200
swh-deposit (0.0.9-1~swh1) unstable-swh; urgency=medium
* v0.0.9
* swh.deposit.api: white list / from authentication
* swh.deposit.api: Update state iri endpoint
* swh.deposit.api: Update new IRI endpoints to deal with update
* swh.deposit.api.deposit: Clean up dead code
* README: Update specification on IRIs
* swh.deposit.urls: Fix endpoints to finish with trailing /
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 20 Sep 2017 18:12:33 +0200
swh-deposit (0.0.8-1~swh1) unstable-swh; urgency=medium
* v0.0.8
* swh.deposit.settings: Split logging configuration per platform
* swh.deposit.settings.common: Prefer configuration over code
* swh.deposit.api: Enforce basic authentication
* swh.deposit: Clarify SWHDefaultConfig class's intent
* clean up: Removing user test api endpoints
* doc: Update docstrings
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 19 Sep 2017 14:23:05 +0200
swh-deposit (0.0.7-1~swh1) unstable-swh; urgency=medium
* v0.0.7
* Add swh.deposit.create_user routine to setup user information
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 19 Sep 2017 10:44:59 +0200
swh-deposit (0.0.6-1~swh1) unstable-swh; urgency=medium
* v0.0.6
* Make the packages include all that's needed (templates, fixtures)
* Fix typo in production settings file
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 18 Sep 2017 16:17:53 +0200
swh-deposit (0.0.5-1~swh1) unstable-swh; urgency=medium
* v0.0.5
* Package and add missing python3-djangorestframework-xml
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 18 Sep 2017 15:10:01 +0200
swh-deposit (0.0.4-1~swh1) unstable-swh; urgency=medium
* v0.0.4
* README: Update spec documentation according to latest development
* README-dev: Initiate a development readme to explicit the local dev/production mode
* swh.deposit.settings: Split profile configuration per deployment platform (dev, production)
* swh.deposit.views:
* Adapt returned errors to be sword compliant
* Change starting api route endpoints
* Improve deposit request headers checks
* swh.deposit.tests:
* Inhibit side-effects in tests (objstorage, configuration loading,
etc...)
* Add authentication in tests
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Sat, 16 Sep 2017 15:38:44 +0200
swh-deposit (0.0.3-1~swh1) unstable-swh; urgency=medium
* v0.0.3
* Migrate to django framework
* Deployment tryouts
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 01 Aug 2017 12:36:22 +0200
swh-deposit (0.0.2-1~swh1) unstable-swh; urgency=medium
* v0.0.2
* Fix db connection initialization
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 14 Jun 2017 13:49:31 +0200
swh-deposit (0.0.1-1~swh1) unstable-swh; urgency=medium
* Initial release
* v0.0.1
* Add basic server implementation for deployment testing
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 14 Jun 2017 12:48:37 +0200
diff --git a/docs/Makefile b/docs/Makefile
index 42355755..ac10d52b 100644
--- a/docs/Makefile
+++ b/docs/Makefile
@@ -1,3 +1,15 @@
include ../../swh-docs/Makefile.sphinx
APIDOC_EXCLUDES += ../swh/*/settings/*
+
+sphinx/html: images
+sphinx/clean: clean-images
+
+images:
+ make -C images/
+clean-images:
+ make -C images/ clean
+
+clean: clean-images
+
+.PHONY: images clean-images
diff --git a/docs/README.rst b/docs/README.rst
index 602e7f61..d934e7c0 100644
--- a/docs/README.rst
+++ b/docs/README.rst
@@ -1,71 +1,71 @@
Software Heritage - Deposit
===========================
Simple Web-Service Offering Repository Deposit (S.W.O.R.D) is an interoperability
standard for digital file deposit.
This repository is both the `SWORD v2`_ Server and a deposit command-line client
implementations.
This implementation allows interaction between a client (a repository) and a server (SWH
repository) to deposit software source code archives and associated metadata.
Description
-----------
Most of the software source code artifacts present in the SWH Archive are gathered by
means of :term:`loader <loader>` workers run by the SWH project from source code
origins identified by :term:`lister <lister>` workers. This is a pull mechanism: it's
the responsibility of the SWH project to gather and collect source code artifacts that
way.
Alternatively, SWH allows its partners to push source code artifacts and metadata
directly into the Archive with a push-based mechanism. By using this possibility
different actors, holding software artifacts or metadata, can preserve their assets
without having to pass through an intermediate collaborative development platform, which
is already harvested by SWH (e.g. GitHub, GitLab, etc.).
This mechanism is the `deposit`.
The main idea is that the deposit is an authenticated access to an API allowing the user to
provide source code artifacts -- with metadata -- to be ingested in the SWH Archive. The
result of that is a :ref:`SWHID <persistent-identifiers>` that can be used to uniquely
and persistently identify that very piece of source code.
This unique identifier can then be used to `reference the source code
<https://hal.archives-ouvertes.fr/hal-02446202>`_ (e.g. in a `scientific paper
<https://www.softwareheritage.org/2020/05/26/citing-software-with-style/>`_) and
retrieve it using the :ref:`vault <swh-vault>` feature of the SWH Archive platform.
The differences between a piece of code uploaded using the deposit rather than simply
asking SWH to archive a repository using the `save code now
<https://archive.softwareheritage.org/save/>`_ feature are:
- a deposited artifact is provided from one of the SWH partners which is regarded as a
trusted authority,
- a deposited artifact requires metadata properties describing the source code artifact,
- a deposited artifact has a codemeta_ metadata entry attached to it,
- a deposited artifact has the same visibility on the SWH Archive as a collected
repository,
- a deposited artifact can be searched with its provided url property on the SWH
Archive,
- the deposit API uses the `SWORD v2`_ API, thus requires some tooling to send deposits
to SWH. These tools are provided with this repository.
See the :ref:`deposit-user-manual` page for more details on how to use the deposit client
command line tools to push a deposit in the SWH Archive.
See the :ref:`deposit-api-specifications` reference pages of the SWORDv2 API implementation
in `swh.deposit` if you want to upload deposits using HTTP requests.
-Read the :ref:`metadata` chapter to get more details on what metadata are supported when
-doing a deposit.
+Read the :ref:`deposit-metadata` chapter to get more details on what metadata
+are supported when doing a deposit.
See :ref:`swh-deposit-dev-env` if you want to hack the code of the `swh.deposit` module.
See :ref:`swh-deposit-prod-env` if you want to deploy your own copy of the
`swh.deposit` stack.
.. _codemeta: https://codemeta.github.io/
.. _`SWORD v2`: http://swordapp.org/sword-v2/
diff --git a/docs/api/api-documentation.rst b/docs/api/api-documentation.rst
index a6d9ad55..5c4f58da 100644
--- a/docs/api/api-documentation.rst
+++ b/docs/api/api-documentation.rst
@@ -1,115 +1,111 @@
.. _deposit-api-specifications:
API Documentation
=================
This is `Software Heritage <https://www.softwareheritage.org>`__'s
`SWORD
2.0 <http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html>`__
Server implementation.
**S.W.O.R.D** (**S**\ imple **W**\ eb-Service **O**\ ffering
**R**\ epository **D**\ eposit) is an interoperability standard for
digital file deposit.
This implementation will permit interaction between a client (a repository) and
a server (SWH repository) to push deposits of software source code archives
with associated metadata.
*Note:*
* In the following document, we will use ``archive`` and ``software source
code archive`` interchangeably.
* The supported archive formats are:
* zip: common zip archive (no multi-disk zip files).
* tar: tar archive without compression, or optionally with any of the following
compression algorithms: gzip (.tar.gz, .tgz), bzip2 (.tar.bz2), or lzma
(.tar.lzma)
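A client may want to check an archive name against the formats listed above before attempting a deposit. The following is a minimal sketch, assuming a purely name-based check; the helper ``is_supported_archive`` and the constant ``SUPPORTED_SUFFIXES`` are hypothetical and not part of `swh.deposit`, and the server may apply different or stricter rules.

```python
# Hypothetical client-side pre-check; not part of swh.deposit.
# The suffixes mirror the archive formats listed in this document.
SUPPORTED_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tar.lzma")


def is_supported_archive(filename: str) -> bool:
    """Return True if the file name ends with one of the listed suffixes."""
    # str.endswith accepts a tuple and matches any of its members.
    return filename.lower().endswith(SUPPORTED_SUFFIXES)
```

A name-based check only catches obvious mistakes early; it says nothing about whether the file content is a valid archive.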
.. _swh-deposit-collection:
Collection
----------
SWORD defines a ``collection`` concept. In SWH's case, this collection
refers to a group of deposits. A ``deposit`` is some form of software
source code archive(s) associated with metadata.
By default the client's collection will have the client's name.
Limitations
-----------
* upload size limited to 100 MiB
* no mediation
API overview
------------
API access is over HTTPS.
The API is protected through basic authentication.
Endpoints
---------
The API endpoints are rooted at https://deposit.softwareheritage.org/1/.
Data is sent and received as XML (as specified in the SWORD 2.0
specification).
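As an illustration of the two points above (HTTPS root and basic authentication), here is a minimal sketch that builds, without sending, a request for the service document. The ``servicedocument/`` path is the one referenced in the use cases document; the function name and the ``name``/``pass`` credentials are placeholders chosen for this example, not part of `swh.deposit`.

```python
import base64
import urllib.request

# Root of the API, as stated above.
API_ROOT = "https://deposit.softwareheritage.org/1/"


def service_document_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the service document."""
    # Basic authentication: base64("<username>:<password>")
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        API_ROOT + "servicedocument/",
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


# Placeholder credentials, for illustration only.
request = service_document_request("name", "pass")
print(request.get_method(), request.full_url)
```

Passing the request to ``urllib.request.urlopen`` would actually send it; the response body is expected to be XML, as noted above.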
-.. include:: endpoints/service-document.rst
-
-.. include:: endpoints/collection.rst
-
-.. include:: endpoints/update-media.rst
-
-.. include:: endpoints/update-metadata.rst
-
-.. include:: endpoints/status.rst
-
-.. include:: endpoints/content.rst
+.. toctree::
+ ../endpoints/service-document.rst
+ ../endpoints/collection.rst
+ ../endpoints/update-media.rst
+ ../endpoints/update-metadata.rst
+ ../endpoints/status.rst
+ ../endpoints/content.rst
Possible errors:
----------------
* common errors:
* :http:statuscode:`401` if a client does not provide credentials or provides
wrong ones
* :http:statuscode:`403` if a client tries to access a collection it does not own
* :http:statuscode:`404` if a client tries to access an unknown collection
* :http:statuscode:`404` if a client tries to access an unknown deposit
* :http:statuscode:`415` if a wrong media type is provided to the endpoint
* archive/binary deposit:
* :http:statuscode:`403` if the length of the archive exceeds the configured
maximum size
* :http:statuscode:`412` if the length or hash provided does not match the
archive
* :http:statuscode:`415` if a wrong media type is provided
* multipart deposit:
* :http:statuscode:`412` if the md5 hash provided does not match the
archive
* :http:statuscode:`415` if a wrong media type is provided
* Atom entry deposit:
* :http:statuscode:`400` if the request's body is empty (for creation only)
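A client may want to turn these status codes into readable diagnostics. The following sketch simply transcribes the list above into a lookup table; the names ``POSSIBLE_CAUSES`` and ``explain`` are hypothetical and the wording of each cause is a paraphrase.

```python
# Possible causes per HTTP status code, transcribed from the list above.
POSSIBLE_CAUSES = {
    400: ["empty request body on Atom entry deposit creation"],
    401: ["missing or wrong credentials"],
    403: [
        "collection not owned by the client",
        "archive larger than the configured maximum size",
    ],
    404: ["unknown collection", "unknown deposit"],
    412: ["archive length or hash does not match the uploaded archive"],
    415: ["wrong media type"],
}


def explain(status_code: int) -> list:
    """Return the documented causes for a status code (empty if none)."""
    return POSSIBLE_CAUSES.get(status_code, [])


print(explain(404))
```

Several codes have more than one documented cause, which is why each entry is a list.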
Sources
-------
* `SWORD v2 specification
<http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html>`__
* `arxiv documentation <https://arxiv.org/help/submit_sword>`__
* `Dataverse example <http://guides.dataverse.org/en/4.3/api/sword.html>`__
* `SWORD used on HAL <https://api.archives-ouvertes.fr/docs/sword>`__
* `xml examples for CCSD <https://github.com/CCSDForge/HAL/tree/master/Sword>`__
diff --git a/docs/api/use-cases.rst b/docs/api/use-cases.rst
index 7bfb83b0..c71f6c64 100644
--- a/docs/api/use-cases.rst
+++ b/docs/api/use-cases.rst
@@ -1,196 +1,247 @@
.. _deposit-use-cases:
Use cases
=========
The general idea is that a deposit can be created either in a single request
or by multiple requests to allow the user to add elements to the deposit piece
by piece (be it the deposited data or the metadata describing it).
An update request that does not have the `In-Progress: true` HTTP header will
de facto declare the deposit as *completed* (aka in the `deposited` status; see
below) and thus ready for ingestion.
Once the deposit is declared *complete* by the user, the server performs a few
validation checks. Then, if the deposit is valid, it schedules the ingestion of
the deposited data in the Software Heritage Archive (SWH).
There is a `status` property attached to a deposit that makes it possible to
follow the processing workflow of the deposit. For example, when this ingestion
task completes successfully, the deposit is marked as `done`.
Possible deposit statuses are:
partial
The deposit is partially received, since it can be done in
multiple requests.
expired
Deposit was there too long and is now deemed ready to be
garbage-collected.
deposited
Deposit is complete, ready to be checked.
rejected
Deposit failed the checks.
verified
Deposit passed the checks and is ready for loading.
loading
Injection is ongoing on SWH's side.
done
Loading is successful.
failed
Loading failed.
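The statuses above can be read as a small state machine. The sketch below is one possible reading of these descriptions, not the server's actual implementation: the ``TRANSITIONS`` table is an assumption derived from the text, and special cases (such as metadata updates on loaded deposits) are left out.

```python
from enum import Enum


class Status(Enum):
    PARTIAL = "partial"
    EXPIRED = "expired"
    DEPOSITED = "deposited"
    REJECTED = "rejected"
    VERIFIED = "verified"
    LOADING = "loading"
    DONE = "done"
    FAILED = "failed"


# Assumed transitions for a regular deposit, inferred from the descriptions.
TRANSITIONS = {
    Status.PARTIAL: {Status.PARTIAL, Status.EXPIRED, Status.DEPOSITED},
    Status.DEPOSITED: {Status.VERIFIED, Status.REJECTED},
    Status.VERIFIED: {Status.LOADING},
    Status.LOADING: {Status.DONE, Status.FAILED},
}


def can_transition(current: Status, target: Status) -> bool:
    """Tell whether the assumed workflow allows going from current to target."""
    return target in TRANSITIONS.get(current, set())


print(can_transition(Status.DEPOSITED, Status.VERIFIED))
```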
+.. figure:: ../images/status.svg
+ :alt:
This document describes the possible scenarios for creating or updating a
deposit.
Deposit creation
----------------
From client's deposit repository server to SWH's repository server:
1. The client requests the server's capabilities and its associated
:ref:`collections <swh-deposit-collection>` using the *SD/service document uri*
(:http:get:`/1/servicedocument/`).
2. The server answers the client with the service document which lists the
*collections* linked to the user account (most of the time, there will be one and
only one collection linked to the user's account). Each of these collections can
be used to push a deposit via its *COL/collection IRI*.
3. The client sends a deposit (a zip archive, some metadata or both) through
the *COL/collection uri*.
This can be done in:
* one POST request (metadata + archive) without the `In-Progress: true` header:
- :http:post:`/1/(str:collection-name)/`
* one POST request (metadata or archive) **with** `In-Progress: true` header:
- :http:post:`/1/(str:collection-name)/`
plus one or more PUT or POST requests *to the update uris*
(*edit-media iri* or *edit iri*):
- :http:post:`/1/(str:collection-name)/(int:deposit-id)/media/`
- :http:put:`/1/(str:collection-name)/(int:deposit-id)/media/`
- :http:post:`/1/(str:collection-name)/(int:deposit-id)/metadata/`
- :http:put:`/1/(str:collection-name)/(int:deposit-id)/metadata/`
Then:
a. Server validates the client's input or returns detailed error if any.
b. Server stores information received (metadata or software archive source
code or both).
-4. The server notifies the client it acknowledged the client's request. An
+4. The server creates a loading task and submits it to the
+ :ref:`Job Scheduler <swh-scheduler>`
+
+5. The server notifies the client it acknowledged the client's request. An
``http 201 Created`` response with a deposit receipt in the body response is
sent back. That deposit receipt will hold the necessary information to
eventually complete the deposit later on if it was incomplete (also known as
status ``partial``).
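To make the role of the ``In-Progress`` header in the creation steps concrete, here is a minimal sketch that builds (without sending) the POST request for an Atom entry deposit. The function name and the ``hal`` collection are hypothetical; the media type is the one used for Atom entries in the user manual, and authentication is omitted for brevity.

```python
import urllib.request

API_ROOT = "https://deposit.softwareheritage.org/1/"


def atom_deposit_request(collection: str, atom_entry: bytes, in_progress: bool):
    """Build a POST request to the collection URI for an Atom entry."""
    headers = {"Content-Type": "application/atom+xml;type=entry"}
    if in_progress:
        # Keeps the deposit in the ``partial`` status; leaving the header
        # out declares the deposit complete.
        headers["In-Progress"] = "true"
    return urllib.request.Request(
        f"{API_ROOT}{collection}/",
        data=atom_entry,
        headers=headers,
        method="POST",
    )


partial_request = atom_deposit_request("hal", b"<entry/>", in_progress=True)
final_request = atom_deposit_request("hal", b"<entry/>", in_progress=False)
print(partial_request.full_url)
```

Note that ``urllib`` normalizes header names to a capitalized form (``In-progress``); HTTP header names are case-insensitive, so this does not change the request's meaning.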
Schema representation
^^^^^^^^^^^^^^^^^^^^^
Scenario: pushing a deposit via the SWORDv2_ protocol (nominal scenario):
.. figure:: ../images/deposit-create-chart.svg
:alt:
-Updating an existing deposit
-""""""""""""""""""""""""""""
+Deposit update
+--------------
-5. Client updates existing deposit through the *update uris* (one or more POST
+6. Client updates existing deposit through the *update uris* (one or more POST
or PUT requests to either the *edit-media iri* or *edit iri*).
1. Server validates the client's input or returns detailed error if any
2. Server stores information received (metadata or software archive source
code or both)
This would be the case for example if the client initially posted a
``partial`` deposit (e.g. only metadata with no archive, or an archive
without metadata, or a split archive because the initial one exceeded
the limit size imposed by swh repository deposit).
The content of a deposit can only be updated while it is in the ``partial``
state; this causes the content to be **replaced** (the old version is discarded).
-Its metadata, however, can also be updated while in the ``done`` state;
-which adds a new version of the metadata in the SWH archive,
-**in addition to** the old one(s).
-In this state, ``In-Progress`` is not allowed, so the deposit cannot go back
-in the ``partial`` state, but only to ``deposited``.
-As a failsafe, to avoid accidentally updating the wrong deposit, this requires
-the ``X-Check-SWHID`` HTTP header to be set to the value of the SWHID of the
-deposit's content (returned after the deposit finished loading).
+Its metadata, however, can also be updated while in the ``done`` state; see below.
Schema representation
-"""""""""""""""""""""
+^^^^^^^^^^^^^^^^^^^^^
Scenario: updating a deposit via SWORDv2_ protocol:
.. figure:: ../images/deposit-update-chart.svg
:alt:
-Deleting deposit (or associated archive, or associated metadata)
-""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+Deposit deletion (or associated archive, or associated metadata)
+----------------------------------------------------------------
-6. Deposit deletion is possible as long as the deposit is still in ``partial``
+7. Deposit deletion is possible as long as the deposit is still in ``partial``
state.
1. Server validates the client's input or returns detailed error if any
2. Server deletes the information according to the request
Schema representation
^^^^^^^^^^^^^^^^^^^^^
Scenario: deleting a deposit via SWORDv2_ protocol:
.. figure:: ../images/deposit-delete-chart.svg
:alt:
Client asks for operation status
-""""""""""""""""""""""""""""""""
+--------------------------------
+
+At any time during the next step, operation status can be read through
+a GET query to the *state iri*.
+
+
+Deposit loading
+---------------
-7. Operation status can be read through a GET query to the *state iri*.
+In one of the previous steps, when a deposit was created or loaded without
+``In-Progress: true``, the deposit server created a load task and submitted it
+to :ref:`swh-scheduler <swh-scheduler>`.
+
+This triggers the following steps:
Server: Triggering deposit checks
-"""""""""""""""""""""""""""""""""
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once the status ``deposited`` is reached for a deposit, checks for the
associated archive(s) and metadata will be triggered. If those checks
fail, the status is changed to ``rejected`` and nothing more happens
there. Otherwise, the status is changed to ``verified``.
Server: Triggering deposit load
-"""""""""""""""""""""""""""""""
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once the status ``verified`` is reached for a deposit, loading the
deposit with its associated metadata will be triggered.
The loading will result in a status update, either ``done`` or ``failed``
(depending on the loading's status).
This is described in the :ref:`loading specifications document <swh-loading-specs>`.
+
+Completing the deposit
+----------------------
+
+When this is all done, the loaders notify the deposit server, which sets
+the deposit status to ``done``.
+
+This can then be polled by deposit clients, using the *state iri*.
+
+
+Deposit metadata updates
+------------------------
+
+We saw earlier that a deposit can only be updated when in ``partial`` state.
+
+There is one exception to this rule: its metadata can be updated while in the
+``done`` state, which adds a new version of the metadata in the SWH archive,
+**in addition to** the old one(s).
+In this state, ``In-Progress`` is not allowed, so the deposit cannot go back
+to the ``partial`` state, but only to ``deposited``.
+As a failsafe, to avoid accidentally updating the wrong deposit, this requires
+the ``X-Check-SWHID`` HTTP header to be set to the value of the SWHID of the
+deposit's content (returned after the deposit finished loading).
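Since the ``X-Check-SWHID`` header is a failsafe, a client may want to validate the identifier before sending it. The sketch below checks the core SWHID syntax (``swh:1:<type>:<40 hex digits>``) and returns the header to add; the helper name is hypothetical and this check is a client-side convenience, not something the server requires.

```python
import re

# Core SWHID syntax: scheme, version 1, object type, 40 hex digits.
SWHID_RE = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}")


def check_swhid_header(swhid: str) -> dict:
    """Return the X-Check-SWHID header for a syntactically valid core SWHID."""
    if not SWHID_RE.fullmatch(swhid):
        raise ValueError(f"not a core SWHID: {swhid!r}")
    return {"X-Check-SWHID": swhid}


headers = check_swhid_header("swh:1:dir:63a6fc0ed8f69bf66ccbf99fc0472e30ef0a895a")
print(headers)
```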
+
+.. _use-case-metadata-only-deposit:
+
+Metadata-only deposit
+---------------------
+
+Finally, as an extension to the SWORD protocol, swh-deposit allows a special
+type of deposit: metadata-only deposits.
+Unlike regular deposits (described above), they do not have a code archive.
+Instead, they describe an existing :term:`software artifact` present in the
+archive.
+
+This use case is triggered by a ``<reference>`` tag in the Atom document,
+see the :ref:`protocol reference <metadata-only-deposit>` for details.
+
+In the current implementation, these deposits are loaded (or rejected)
+immediately after a request without ``In-Progress: true`` is made,
+i.e. they skip the ``loading`` state. This may change in a future version.
+
.. _SWORDv2: http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html
diff --git a/docs/api/user-manual.rst b/docs/api/user-manual.rst
index cbae44a1..e2bacf04 100644
--- a/docs/api/user-manual.rst
+++ b/docs/api/user-manual.rst
@@ -1,486 +1,486 @@
.. _deposit-user-manual:
User Manual
===========
This is a guide for how to prepare and push a software deposit with
the `swh deposit` commands.
Requirements
------------
You need to have an account on the Software Heritage deposit application to be
able to use the service.
Please `contact the Software Heritage team <deposit@softwareheritage.org>`_ for
more information on how to get access to this service.
For testing purposes, a test instance `is available
<https://deposit.staging.swh.network>`_ [#f1]_ and will be used in the examples below.
Once you have an account, you should get a set of access credentials as a
`login` and a `password` (identified as ``<name>`` and ``<pass>`` in the
remainder of this document). A deposit account also comes with a "provider URL"
which is used by SWH to build the :term:`Origin URL<origin>` of deposits
created using this account.
Installation
------------
To install the `swh.deposit` command line tools, you need a working Python 3.7+
environment. It is strongly recommended you use a `virtualenv
<https://virtualenv.pypa.io/en/stable/>`_ for this.
.. code:: console
$ python3 -m virtualenv deposit
[...]
$ source deposit/bin/activate
(deposit)$ pip install swh.deposit
[...]
(deposit)$ swh deposit --help
Usage: swh deposit [OPTIONS] COMMAND [ARGS]...
Deposit main command
Options:
-h, --help Show this message and exit.
Commands:
admin Server administration tasks (manipulate user or...
status Deposit's status
upload Software Heritage Public Deposit Client Create/Update...
(deposit)$
Note: in the examples below, we use the `jq`_ tool to make json outputs nicer.
If you do not have it already, you may install it using your distribution's
packaging system. For example, on a Debian system:
.. _jq: https://stedolan.github.io/jq/
.. code:: console
$ sudo apt install jq
-.. _prepare_deposit
+.. _prepare-deposit:
Prepare a deposit
-----------------
* compress the files in a supported archive format:
- zip: common zip archive (no multi-disk zip files).
- tar: tar archive, either uncompressed or compressed with one of the
following algorithms: gzip (`.tar.gz`, `.tgz`), bzip2
(`.tar.bz2`), or lzma (`.tar.lzma`)
* (Optional) prepare a metadata file (more details :ref:`deposit-metadata`):
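Before uploading, a quick client-side check of the archive name can catch unsupported formats early. This is a minimal sketch based on the extensions listed above; the ``.tar`` suffix for uncompressed tar archives is an assumption, and the helper is hypothetical (the server performs its own checks on the archive itself).

```python
# Suffixes taken from the supported formats listed above
# (".tar" for uncompressed tar archives is assumed).
SUPPORTED_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tar.lzma")


def has_supported_extension(filename: str) -> bool:
    """Tell whether the file name ends with a supported archive suffix."""
    return filename.lower().endswith(SUPPORTED_SUFFIXES)


print(has_supported_extension("belenios-1.12.zip"))
```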
Example:
Assuming you want to deposit the source code of `belenios
<https://gitlab.inria.fr/belenios/belenios>`_ version 1.12
.. code:: console
(deposit)$ wget https://gitlab.inria.fr/belenios/belenios/-/archive/1.12/belenios-1.12.zip
[...]
2020-10-28 11:40:37 (4,56 MB/s) - ‘belenios-1.12.zip’ saved [449880/449880]
(deposit)$
Then you need to prepare a metadata file allowing you to give detailed
information on your deposited source code. A rather minimal Atom with Codemeta
file could be:
.. code:: console
(deposit)$ cat metadata.xml
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom"
xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0"
xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit">
<title>Verifiable online voting system</title>
<id>belenios-01243065</id>
<codemeta:url>https://gitlab.inria.fr/belenios/belenios</codemeta:url>
<codemeta:applicationCategory>test</codemeta:applicationCategory>
<codemeta:keywords>Online voting</codemeta:keywords>
<codemeta:description>Verifiable online voting system</codemeta:description>
<codemeta:version>1.12</codemeta:version>
<codemeta:runtimePlatform>opam</codemeta:runtimePlatform>
<codemeta:developmentStatus>stable</codemeta:developmentStatus>
<codemeta:programmingLanguage>ocaml</codemeta:programmingLanguage>
<codemeta:license>
<codemeta:name>GNU Affero General Public License</codemeta:name>
</codemeta:license>
<author>
<name>Belenios</name>
<email>belenios@example.com</email>
</author>
<codemeta:author>
<codemeta:name>Belenios Test User</codemeta:name>
</codemeta:author>
<swh:deposit>
<swh:create_origin>
<swh:origin url="http://has.archives-ouvertes.fr/test-01243065" />
</swh:create_origin>
</swh:deposit>
</entry>
(deposit)$
Please read the :ref:`deposit-metadata` page for a more detailed view on the
metadata file formats and semantics.
Push a deposit
--------------
You can push a deposit with:
* a single deposit (archive + metadata):
The user posts in one query a software
source code archive and associated metadata.
The deposit is directly marked with status ``deposited``.
* a multisteps deposit:
1. Create an incomplete deposit (marked with status ``partial``)
2. Add data to a deposit (in multiple requests if needed)
3. Finalize deposit (the status becomes ``deposited``)
* a metadata-only deposit:
The user posts in one query an associated metadata file on a :ref:`SWHID
<persistent-identifiers>` object. The deposit is directly marked with status
``done``.
Overall, a deposit goes through a series of steps as follows:
-.. figure:: images/status.svg
+.. figure:: ../images/status.svg
:alt:
The important thing to notice for now is that a deposit can be:
partial:
the deposit is partially received
expired:
deposit has been there too long and is now deemed
ready to be garbage collected
deposited:
deposit is complete and is ready to be checked to ensure data consistency
verified:
deposit is fully received, checked, and ready for loading
loading:
loading is ongoing on swh's side
done:
loading is successful
failed:
loading is a failure
When you push a deposit, it is either in the `deposited` state or in the
`partial` state if you asked for a partial upload.
Single deposit
^^^^^^^^^^^^^^
Once the files are ready for deposit, we want to do the actual deposit in one
shot, i.e. sending both the archive (zip) file and the metadata file.
* 1 archive (content-type ``application/zip`` or ``application/x-tar``)
* 1 metadata file in atom xml format (``content-type: application/atom+xml;type=entry``)
For this, we need to provide the:
* arguments: ``--username 'name' --password 'pass'`` as credentials
* archive's path (example: ``--archive path/to/archive-name.tgz``)
* metadata file path (example: ``--metadata path/to/metadata.xml``)
to the `swh deposit upload` command.
Example:
To push the Belenios 1.12 we prepared previously on the testing instance of the
deposit:
.. code:: console
(deposit)$ ls
belenios-1.12.zip metadata.xml deposit
(deposit)$ swh deposit upload --username <name> --password <secret> \
--url https://deposit.staging.swh.network/1 \
--slug belenios-01243065 \
--archive belenios-1.12.zip \
--metadata metadata.xml \
--format json | jq
{
'deposit_status': 'deposited',
'deposit_id': '1',
'deposit_date': 'Oct. 28, 2020, 1:52 p.m.',
'deposit_status_detail': None
}
(deposit)$
You just posted a deposit to your main collection on Software Heritage (staging
area)!
The returned value is a JSON dict, in which you will notably find the deposit
id (needed to check for its status later on) and the current status, which
should be `deposited` if no error has occurred.
Note: As the deposit is in ``deposited`` status, you can no longer
update the deposit after this query. Any such attempt will be answered with
a 403 (Forbidden) response.
If something went wrong, an equivalent response will be given with the
`error` and `detail` keys explaining the issue, e.g.:
.. code:: console
{
'error': 'Unknown collection name xyz',
'detail': None,
'deposit_status': None,
'deposit_status_detail': None,
'deposit_swh_id': None,
'status': 404
}
Once the deposit has been done, you can check its status using the `swh deposit
status` command:
.. code:: console
(deposit)$ swh deposit status --username <name> --password <secret> \
--url https://deposit.staging.swh.network/1 \
--deposit-id 1 -f json | jq
{
"deposit_id": "1",
"deposit_status": "done",
"deposit_status_detail": "The deposit has been successfully loaded into the Software Heritage archive",
"deposit_swh_id": "swh:1:dir:63a6fc0ed8f69bf66ccbf99fc0472e30ef0a895a",
"deposit_swh_id_context": "swh:1:dir:63a6fc0ed8f69bf66ccbf99fc0472e30ef0a895a;origin=https://softwareheritage.org/belenios-01234065;visit=swh:1:snp:0ae536667689da7047bfb7aa9f37f5958e9f4647;anchor=swh:1:rev:17ad98c940104d45b6b6bd6fba9aa832eeb95638;path=/",
"deposit_external_id": "belenios-01234065"
}
Metadata-only deposit
^^^^^^^^^^^^^^^^^^^^^
This allows depositing only metadata about a :ref:`SWHID reference
<persistent-identifiers>`. Prepare a metadata file as described in the
:ref:`prepare deposit section <prepare-deposit>`.
Ensure this metadata file also declares a :ref:`SWHID reference
<persistent-identifiers>`:
.. code:: xml
- <entry ...
+ <entry xmlns="..."
xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit"
>
<!-- ... -->
<swh:deposit>
<swh:reference>
<swh:object swhid="swh:1:dir:31b5c8cc985d190b5a7ef4878128ebfdc2358f49" />
</swh:reference>
</swh:deposit>
<!-- ... -->
</entry>
For this, we then need to provide the following information:
* arguments: ``--username 'name' --password 'pass'`` as credentials
* metadata file path (example: ``--metadata path/to/metadata.xml``)
to the `swh deposit metadata-only` command.
Example:
.. code:: console
(deposit)$ swh deposit metadata-only --username <name> --password <secret> \
--url https://deposit.staging.swh.network/1 \
--metadata ../deposit-swh.metadata-only.xml \
--format json | jq .
{
"deposit_id": "29",
"deposit_status": "done",
"deposit_date": "Dec. 15, 2020, 11:37 a.m."
}
For details on the metadata-only deposit, see the
:ref:`metadata-only deposit protocol reference <metadata-only-deposit>`.
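If the metadata file is generated programmatically, the ``swh:reference`` structure shown in the XML example of this section can be produced with the standard library. This is a minimal sketch: the function name is hypothetical, the ``title`` element is only there as sample metadata, and a real file would carry more fields (see the metadata documentation).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
SWH_NS = "https://www.softwareheritage.org/schema/2018/deposit"

# Serialize Atom as the default namespace and SWH elements as "swh:...".
ET.register_namespace("", ATOM_NS)
ET.register_namespace("swh", SWH_NS)


def metadata_only_entry(title: str, swhid: str) -> str:
    """Build an Atom entry referencing an existing object by SWHID."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    deposit = ET.SubElement(entry, f"{{{SWH_NS}}}deposit")
    reference = ET.SubElement(deposit, f"{{{SWH_NS}}}reference")
    ET.SubElement(reference, f"{{{SWH_NS}}}object", {"swhid": swhid})
    return ET.tostring(entry, encoding="unicode")


xml_text = metadata_only_entry(
    "Sample title", "swh:1:dir:31b5c8cc985d190b5a7ef4878128ebfdc2358f49"
)
print(xml_text)
```

The resulting string can be written to a file and passed to ``--metadata``.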
Multisteps deposit
^^^^^^^^^^^^^^^^^^
In this case, the deposit is created by several requests, uploading objects
piece by piece. The steps to create a multisteps deposit:
1. Create a partial deposit
""""""""""""""""""""""""""""
First, use the ``--partial`` argument to declare there is more to come:
.. code:: console
$ swh deposit upload --username name --password secret \
--archive foo.tar.gz \
--partial
2. Add content or metadata to the deposit
"""""""""""""""""""""""""""""""""""""""""
Continue the deposit by using the ``--deposit-id`` argument given as a response
for the first step. You can continue adding content or metadata while you use
the ``--partial`` argument.
To only add one new archive to the deposit:
.. code:: console
$ swh deposit upload --username name --password secret \
--archive add-foo.tar.gz \
--deposit-id 42 \
--partial
To only add metadata to the deposit:
.. code:: console
$ swh deposit upload --username name --password secret \
--metadata add-foo.tar.gz.metadata.xml \
--deposit-id 42 \
--partial
3. Finalize deposit
"""""""""""""""""""
On your last addition (same command as before), omit ``--partial``: the
deposit will then be considered complete. Its status will be
changed to ``deposited``:
.. code:: console
$ swh deposit upload --username name --password secret \
--metadata add-foo.tar.gz.metadata.xml \
--deposit-id 42
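When scripting a multisteps deposit, the three steps above differ only in which options are passed. The sketch below assembles the argument list for each call, using only the options shown in this section; the helper name is hypothetical, and the list would typically be handed to ``subprocess.run``.

```python
def upload_command(username, password, archive=None, metadata=None,
                   deposit_id=None, partial=False):
    """Assemble the `swh deposit upload` argument list for one step."""
    args = ["swh", "deposit", "upload",
            "--username", username, "--password", password]
    if archive:
        args += ["--archive", archive]
    if metadata:
        args += ["--metadata", metadata]
    if deposit_id is not None:
        # Continue an existing deposit instead of creating a new one.
        args += ["--deposit-id", str(deposit_id)]
    if partial:
        # More requests will follow; the deposit stays ``partial``.
        args.append("--partial")
    return args


first = upload_command("name", "secret", archive="foo.tar.gz", partial=True)
last = upload_command("name", "secret",
                      metadata="add-foo.tar.gz.metadata.xml", deposit_id=42)
print(" ".join(last))
```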
Update deposit
--------------
* Update deposit metadata:
- only possible if the deposit status is ``done``, ``--deposit-id <id>`` and
``--swhid <swhid>`` are provided
- by using the ``--metadata`` flag, a path to an xml file
.. code:: console
$ swh deposit upload \
--username name --password secret \
--deposit-id 11 \
--swhid swh:1:dir:2ddb1f0122c57c8479c28ba2fc973d18508e6420 \
--metadata ../deposit-swh.update-metadata.xml
* Replace deposit:
- only possible if the deposit status is ``partial`` and
``--deposit-id <id>`` is provided
- by using the ``--replace`` flag
- ``--metadata-deposit`` replaces associated existing metadata
- ``--archive-deposit`` replaces associated archive(s)
- by default, with no flag or both, you'll replace associated
metadata and archive(s):
.. code:: console
$ swh deposit upload --username name --password secret \
--deposit-id 11 \
--archive updated-je-suis-gpl.tgz \
--replace
* Update a loaded deposit with a new version (this creates a new deposit):
- by using the external-id with the ``--slug`` argument, you will
link the new deposit with its parent deposit:
.. code:: console
$ swh deposit upload --username name --password secret \
--archive je-suis-gpl-v2.tgz \
--slug 'je-suis-gpl'
Check the deposit's status
--------------------------
You can check the status of the deposit by using the ``--deposit-id`` argument:
.. code:: console
$ swh deposit status --username name --password secret \
--deposit-id 11
.. code:: json
{
"deposit_id": 11,
"deposit_status": "deposited",
"deposit_swh_id": null,
"deposit_status_detail": "Deposit is ready for additional checks \
(tarball ok, metadata, etc...)"
}
When the deposit has been loaded into the archive, the status will be
marked ``done``. In the response, will also be available the
<deposit_swh_id>, <deposit_swh_id_context>. For example:
.. code:: json
{
"deposit_id": 11,
"deposit_status": "done",
"deposit_swh_id": "swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9",
"deposit_swh_id_context": "swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;\
origin=https://forge.softwareheritage.org/source/jesuisgpl/;\
visit=swh:1:snp:68c0d26104d47e278dd6be07ed61fafb561d0d20;\
anchor=swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb;path=/",
"deposit_status_detail": "The deposit has been successfully \
loaded into the Software Heritage archive"
}
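A script polling the status can decide when to stop by looking at ``deposit_status``. The sketch below parses a status response like the ones above; the set of final statuses is an assumption based on the status descriptions (no further processing is expected after them), and the function name is hypothetical.

```python
import json

# Statuses after which no further processing is expected (assumption).
FINAL_STATUSES = {"done", "failed", "rejected"}


def summarize(status_json: str):
    """Return (status, is_final, swhid) from a status response."""
    status = json.loads(status_json)
    state = status["deposit_status"]
    return state, state in FINAL_STATUSES, status.get("deposit_swh_id")


pending = summarize(
    '{"deposit_id": 11, "deposit_status": "deposited", "deposit_swh_id": null}'
)
print(pending)
```

A polling loop would call ``swh deposit status`` repeatedly and stop once ``is_final`` is true.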
.. rubric:: Footnotes
.. [#f1] the test instance of the deposit is not yet available to external users,
but it should be available soon.
diff --git a/docs/cli.rst b/docs/cli.rst
index d004c79a..ad1e2dd4 100644
--- a/docs/cli.rst
+++ b/docs/cli.rst
@@ -1,35 +1,35 @@
.. _swh-deposit-cli:
Command-line interface
======================
Shared command-line interface
-----------------------------
.. click:: swh.deposit.cli:deposit
:prog: swh deposit
:nested: short
Administration utilities
------------------------
.. click:: swh.deposit.cli.admin:admin
:prog: swh deposit admin
:nested: full
.. _swh-deposit-cli-client:
Deposit client tools
--------------------
.. click:: swh.deposit.cli.client:upload
:prog: swh deposit
:nested: full
.. click:: swh.deposit.cli.client:status
:prog: swh deposit
:nested: full
-.. click:: swh.deposit.cli.client:metadata-only
+.. click:: swh.deposit.cli.client:metadata_only
:prog: swh deposit
:nested: full
diff --git a/docs/images/deposit-authentication-basic.uml b/docs/images/deposit-authentication-basic.uml
index f00644e2..80e03f80 100644
--- a/docs/images/deposit-authentication-basic.uml
+++ b/docs/images/deposit-authentication-basic.uml
@@ -1,23 +1,23 @@
@startuml
participant CLIENT as "SWORD client\n(eg. HAL)"
participant DEPOSIT as "swh-deposit"
-participant AUTH_BACKEND as "deposit storage"
+participant AUTH_BACKEND as "deposit database"
activate CLIENT
activate DEPOSIT
activate AUTH_BACKEND
CLIENT ->> DEPOSIT: GET /1/<service-document>/
DEPOSIT ->> AUTH_BACKEND: check authentication
alt credentials mismatch or inexistent user
AUTH_BACKEND ->> DEPOSIT: return ko
DEPOSIT -->> CLIENT: return 401, Unauthorized
else credentials ok
AUTH_BACKEND ->> DEPOSIT: return deposit_client
DEPOSIT -->> CLIENT: return 200, <service-document>
end
deactivate CLIENT
deactivate DEPOSIT
deactivate AUTH_BACKEND
@enduml
diff --git a/docs/images/deposit-create-chart.uml b/docs/images/deposit-create-chart.uml
index d7683029..2c565d70 100644
--- a/docs/images/deposit-create-chart.uml
+++ b/docs/images/deposit-create-chart.uml
@@ -1,26 +1,30 @@
@startuml
participant CLIENT as "SWORD client\n(eg. HAL)"
participant DEPOSIT as "swh-deposit"
- participant DEPOSIT_STORAGE as "deposit storage"
+ participant DEPOSIT_DATABASE as "deposit database"
+ participant SCHEDULER as "swh-scheduler"
activate CLIENT
activate DEPOSIT
- activate DEPOSIT_STORAGE
+ activate DEPOSIT_DATABASE
+ activate SCHEDULER
CLIENT ->> DEPOSIT: GET /1/<service-document>/
- DEPOSIT ->> DEPOSIT_STORAGE: check authentication
- DEPOSIT_STORAGE -->> DEPOSIT: return ok (if client exists and credentials ok)
+ DEPOSIT ->> DEPOSIT_DATABASE: check authentication
+ DEPOSIT_DATABASE -->> DEPOSIT: return ok (if client exists and credentials ok)
DEPOSIT -->> CLIENT: return 200, <service-document>
CLIENT ->> DEPOSIT: POST /1/<collection-name>/
- DEPOSIT ->> DEPOSIT_STORAGE: check authentication
- DEPOSIT_STORAGE -->> DEPOSIT: return ok (if client exists and credentials ok)
+ DEPOSIT ->> DEPOSIT_DATABASE: check authentication
+ DEPOSIT_DATABASE -->> DEPOSIT: return ok (if client exists and credentials ok)
- DEPOSIT ->> DEPOSIT_STORAGE: create new deposit
- DEPOSIT_STORAGE -->> DEPOSIT: return deposit_id
+ DEPOSIT ->> DEPOSIT_DATABASE: create new deposit
+ DEPOSIT_DATABASE -->> DEPOSIT: return deposit_id
+
+ DEPOSIT ->> SCHEDULER: schedule load for <deposit_id>
DEPOSIT -->> CLIENT: return 201, <deposit receipt>
@enduml
diff --git a/docs/images/deposit-delete-chart.uml b/docs/images/deposit-delete-chart.uml
index 19727df1..0d5a8521 100644
--- a/docs/images/deposit-delete-chart.uml
+++ b/docs/images/deposit-delete-chart.uml
@@ -1,33 +1,33 @@
@startuml
participant CLIENT as "SWORD client\n(eg. HAL)"
participant DEPOSIT as "swh-deposit"
- participant DEPOSIT_STORAGE as "deposit storage"
+ participant DEPOSIT_DATABASE as "deposit database"
activate CLIENT
activate DEPOSIT
- activate DEPOSIT_STORAGE
+ activate DEPOSIT_DATABASE
CLIENT ->> DEPOSIT: POST /1/<collection-name>/\nHEADER In-Progress: true
- DEPOSIT ->> DEPOSIT_STORAGE: check authentication
- DEPOSIT_STORAGE -->> DEPOSIT: return ok (if client exists and credentials ok)
+ DEPOSIT ->> DEPOSIT_DATABASE: check authentication
+ DEPOSIT_DATABASE -->> DEPOSIT: return ok (if client exists and credentials ok)
DEPOSIT -->> CLIENT: return 201, <deposit receipt>
CLIENT -> DEPOSIT: DELETE /1/<collection-name>/<deposit-id>/media/\nDELETE /1/<collection-name>/<deposit-id>/metadata/
- DEPOSIT ->> DEPOSIT_STORAGE: check authentication
- DEPOSIT_STORAGE -->> DEPOSIT: return ok
+ DEPOSIT ->> DEPOSIT_DATABASE: check authentication
+ DEPOSIT_DATABASE -->> DEPOSIT: return ok
- DEPOSIT ->> DEPOSIT_STORAGE: check inputs()
+ DEPOSIT ->> DEPOSIT_DATABASE: check inputs()
alt status is 'partial'
- DEPOSIT_STORAGE -->> DEPOSIT: return ok
- DEPOSIT ->> DEPOSIT_STORAGE: delete-deposit-or-deposit-archives()
- DEPOSIT_STORAGE -->> DEPOSIT: return ok
+ DEPOSIT_DATABASE -->> DEPOSIT: return ok
+ DEPOSIT ->> DEPOSIT_DATABASE: delete-deposit-or-deposit-archives()
+ DEPOSIT_DATABASE -->> DEPOSIT: return ok
DEPOSIT -->> CLIENT: return 204
else status is not 'partial'
- DEPOSIT_STORAGE -->> DEPOSIT: return ko
+ DEPOSIT_DATABASE -->> DEPOSIT: return ko
DEPOSIT -->> CLIENT: return 400, "You can only act on deposit with status partial"
end
@enduml
diff --git a/docs/images/deposit-update-chart.uml b/docs/images/deposit-update-chart.uml
index e4f49ae7..8340cb38 100644
--- a/docs/images/deposit-update-chart.uml
+++ b/docs/images/deposit-update-chart.uml
@@ -1,39 +1,42 @@
@startuml
participant CLIENT as "SWORD client\n(eg. HAL)"
participant DEPOSIT as "swh-deposit"
- participant DEPOSIT_STORAGE as "deposit storage"
+ participant DEPOSIT_DATABASE as "deposit database"
+ participant SCHEDULER as "swh-scheduler"
activate CLIENT
activate DEPOSIT
- activate DEPOSIT_STORAGE
+ activate DEPOSIT_DATABASE
+ activate SCHEDULER
CLIENT ->> DEPOSIT: POST /1/<collection-name>/\nHEADER In-Progress: true
- DEPOSIT ->> DEPOSIT_STORAGE: check authentication
- DEPOSIT_STORAGE -->> DEPOSIT: return ok (if client exists and credentials ok)
+ DEPOSIT ->> DEPOSIT_DATABASE: check authentication
+ DEPOSIT_DATABASE -->> DEPOSIT: return ok (if client exists and credentials ok)
DEPOSIT -->> CLIENT: return 201, <deposit receipt>
CLIENT -> DEPOSIT: POST/PUT /1/<collection-name>/<deposit-id>/media/\nPOST/PUT /1/<collection-name>/<deposit-id>/metadata/
- DEPOSIT ->> DEPOSIT_STORAGE: check authentication
- DEPOSIT_STORAGE -->> DEPOSIT: return ok
+ DEPOSIT ->> DEPOSIT_DATABASE: check authentication
+ DEPOSIT_DATABASE -->> DEPOSIT: return ok
- DEPOSIT ->> DEPOSIT_STORAGE: check inputs()
+ DEPOSIT ->> DEPOSIT_DATABASE: check inputs()
alt status is 'partial'
- DEPOSIT_STORAGE -->> DEPOSIT: return ok
+ DEPOSIT_DATABASE -->> DEPOSIT: return ok
alt HEADER: In-Progress = true
- DEPOSIT ->> DEPOSIT_STORAGE: add-or-replace-data-and-update-status('partial')
+ DEPOSIT ->> DEPOSIT_DATABASE: add-or-replace-data-and-update-status('partial')
else HEADER: In-Progress = false
- DEPOSIT ->> DEPOSIT_STORAGE: add-or-replace-data-and-update-status('deposited')
+ DEPOSIT ->> SCHEDULER: schedule load for <deposit_id>
+ DEPOSIT ->> DEPOSIT_DATABASE: add-or-replace-data-and-update-status('deposited')
end
- DEPOSIT_STORAGE -->> DEPOSIT: return ok
+ DEPOSIT_DATABASE -->> DEPOSIT: return ok
DEPOSIT -->> CLIENT: return 204
else status is not partial
- DEPOSIT_STORAGE -->> DEPOSIT: return ko
+ DEPOSIT_DATABASE -->> DEPOSIT: return ko
DEPOSIT -->> CLIENT: return 400, "You can only act on deposit with status partial"
end
@enduml
diff --git a/docs/images/deposit-workflow-checking.uml b/docs/images/deposit-workflow-checking.uml
new file mode 100644
index 00000000..7acd846b
--- /dev/null
+++ b/docs/images/deposit-workflow-checking.uml
@@ -0,0 +1,34 @@
+@startuml
+ participant DEPOSIT as "deposit API"
+ participant DEPOSIT_DATABASE as "deposit DB"
+ participant CHECKER_TASK as "checker task"
+ participant CELERY as "celery"
+ participant SCHEDULER as "swh-scheduler"
+
+ activate DEPOSIT
+ activate DEPOSIT_DATABASE
+ activate CELERY
+ activate SCHEDULER
+
+ SCHEDULER ->> CELERY: new "check-deposit"\ntask available
+ CELERY ->> CHECKER_TASK: start task
+ activate CHECKER_TASK
+
+ CHECKER_TASK ->> DEPOSIT: GET /{collection}/{deposit_id}/check/
+
+ DEPOSIT ->> DEPOSIT_DATABASE: get deposit requests
+ DEPOSIT_DATABASE ->> DEPOSIT: deposit requests
+
+ loop for each request
+ DEPOSIT ->> DEPOSIT_DATABASE: get archive
+ DEPOSIT_DATABASE ->> DEPOSIT: archive content
+ DEPOSIT ->> DEPOSIT: check archive in the request
+ end
+
+ DEPOSIT ->> DEPOSIT_DATABASE: mark deposit as "verified"
+ DEPOSIT ->> SCHEDULER: schedule load
+ DEPOSIT ->> CHECKER_TASK: done
+ CHECKER_TASK ->> CELERY: done
+ deactivate CHECKER_TASK
+ CELERY ->> SCHEDULER: done
+@enduml
diff --git a/docs/images/deposit-workflow-loading.uml b/docs/images/deposit-workflow-loading.uml
new file mode 100644
index 00000000..d5f869d9
--- /dev/null
+++ b/docs/images/deposit-workflow-loading.uml
@@ -0,0 +1,44 @@
+@startuml
+ participant DEPOSIT as "deposit API"
+ participant DEPOSIT_DATABASE as "deposit DB"
+ participant LOADER_TASK as "loader task"
+ participant STORAGE as "swh-storage"
+ participant CELERY as "celery"
+ participant SCHEDULER as "swh-scheduler"
+
+ activate DEPOSIT
+ activate DEPOSIT_DATABASE
+ activate STORAGE
+ activate CELERY
+ activate SCHEDULER
+
+ SCHEDULER ->> CELERY: new "load-deposit"\ntask available
+ CELERY ->> LOADER_TASK: start task
+ activate LOADER_TASK
+
+ LOADER_TASK ->> DEPOSIT: GET /{collection}/{deposit_id}/raw/
+
+ DEPOSIT ->> DEPOSIT_DATABASE: get deposit requests
+ DEPOSIT_DATABASE ->> DEPOSIT: deposit requests
+
+ loop for each request
+ DEPOSIT ->> DEPOSIT_DATABASE: get archive
+ DEPOSIT_DATABASE ->> DEPOSIT: archive content
+ DEPOSIT ->> DEPOSIT: aggregate
+ end
+
+ DEPOSIT ->> LOADER_TASK: tarball
+
+ LOADER_TASK ->> LOADER_TASK: unpack on disk
+
+ loop
+ LOADER_TASK ->> LOADER_TASK: load objects
+ LOADER_TASK ->> STORAGE: store objects
+ end
+
+ LOADER_TASK -> DEPOSIT: PUT /{collection}/{deposit_id}/status
+ DEPOSIT ->> DEPOSIT_DATABASE: mark deposit as "done"
+ LOADER_TASK ->> CELERY: done
+ deactivate LOADER_TASK
+ CELERY ->> SCHEDULER: done
+@enduml
diff --git a/docs/images/deposit-workflow-reception.uml b/docs/images/deposit-workflow-reception.uml
new file mode 100644
index 00000000..c38a9a9e
--- /dev/null
+++ b/docs/images/deposit-workflow-reception.uml
@@ -0,0 +1,37 @@
+@startuml
+ participant CLIENT as "SWORD client"
+ participant DEPOSIT as "deposit API"
+ participant DEPOSIT_DATABASE as "deposit DB"
+ participant STORAGE as "swh-storage"
+ participant SCHEDULER as "swh-scheduler"
+
+ activate CLIENT
+ activate DEPOSIT
+ activate DEPOSIT_DATABASE
+ activate STORAGE
+ activate SCHEDULER
+
+ CLIENT ->> DEPOSIT: Atom and/or archive
+ DEPOSIT ->> DEPOSIT_DATABASE: create new deposit
+ DEPOSIT_DATABASE -->> DEPOSIT: return deposit_id
+ DEPOSIT ->> DEPOSIT_DATABASE: record deposit request
+
+ loop while the previous request has "In-Progress: true"
+ DEPOSIT ->> CLIENT: deposit receipt\n("partial")
+ CLIENT ->> DEPOSIT: Atom and/or archive
+ DEPOSIT ->> DEPOSIT_DATABASE: record deposit request
+ end
+
+
+ alt if metadata-only
+ DEPOSIT ->> STORAGE: target exists?
+ STORAGE ->> DEPOSIT: true
+ DEPOSIT ->> STORAGE: insert metadata
+ DEPOSIT ->> DEPOSIT_DATABASE: mark deposit as "done"
+ else
+ DEPOSIT ->> SCHEDULER: schedule checks
+ DEPOSIT ->> DEPOSIT_DATABASE: mark deposit as "loading"
+ end
+
+ DEPOSIT ->> CLIENT: deposit receipt\n("done" or "loading")
+@enduml
diff --git a/docs/internals/authentication.rst b/docs/internals/authentication.rst
index f50ae834..e17f6ac1 100644
--- a/docs/internals/authentication.rst
+++ b/docs/internals/authentication.rst
@@ -1,44 +1,44 @@
.. _authentication:
Authentication
==============
This is a description of the authentication mechanism used in the deposit server. Both
`basic authentication <https://tools.ietf.org/html/rfc7617>`_ and `keycloak`_ schemes
are supported through configuration.
Basic
-----
The first implementation uses `basic authentication
-<https://tools.ietf.org/html/rfc7617>`_. The deposit storage backend has the
-responsibility to check the authentication credentials sent by the deposit client. If
+<https://tools.ietf.org/html/rfc7617>`_. The deposit server checks
+the authentication credentials sent by the deposit client using its own database. If
authorized, the deposit client is allowed to continue its deposit. Otherwise, a 401
response is returned to the client.
-.. figure:: images/deposit-authentication-basic.svg
+.. figure:: ../images/deposit-authentication-basic.svg
:alt: Basic Authentication
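As an illustration of what a client sends under this scheme, here is a minimal sketch of building the ``Authorization`` header defined by RFC 7617. The helper name ``basic_auth_header`` is hypothetical and is not part of swh-deposit; only the standard library is used.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic ``Authorization`` header (RFC 7617).

    Hypothetical helper for illustration only.
    """
    # Credentials are joined with a colon, then base64-encoded.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("Aladdin", "open sesame"))
```

The server decodes this header and compares the credentials with those stored in its database, returning a 401 response when they do not match.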
Keycloak
--------
Recent changes introduced `keycloak`_, an Open Source Identity and Access Management
tool which is already used in other parts of the swh stack.
The authentication is delegated to the `swh keycloak instance
<https://auth.softwareheritage.org/auth/>`_ using the `Resource Owner Password
Credentials <https://tools.ietf.org/html/rfc6749#section-1.3.3>`_ scheme.
Deposit clients still use the deposit as before. Transparently for them, the deposit
server forwards their credentials to keycloak for validation. If `keycloak`_ authorizes
the deposit client, the deposit further checks that the deposit client has the proper
permission "swh.deposit.api". If they do, they can post their deposits.
If any issue arises during one of the authentication checks, the client receives a 401
response (unauthorized).
-.. figure:: images/deposit-authentication-keycloak.svg
+.. figure:: ../images/deposit-authentication-keycloak.svg
:alt: Keycloak Authentication
.. _keycloak: https://www.keycloak.org/
diff --git a/docs/internals/index.rst b/docs/internals/index.rst
index 5b0affce..a3350fd9 100644
--- a/docs/internals/index.rst
+++ b/docs/internals/index.rst
@@ -1,14 +1,15 @@
.. _swh-deposit-internals:
Deposit internals
=================
This chapter describes how swh-deposit works internally,
and how to run it (either in production or locally for development).
.. toctree::
:maxdepth: 1
dev-environment
prod-environment
authentication
+ loading-workflow
diff --git a/docs/internals/loading-workflow.rst b/docs/internals/loading-workflow.rst
new file mode 100644
index 00000000..b4fff2d0
--- /dev/null
+++ b/docs/internals/loading-workflow.rst
@@ -0,0 +1,91 @@
+Loading workflow
+================
+
+This section complements the :ref:`deposit-use-cases` documentation,
+by detailing how deposits are handled internally after clients deposited them.
+
+Reception
+---------
+
+For every HTTP request sent by a client, the deposit API checks some simple properties,
+then creates a :class:`swh.deposit.models.DepositRequest`
+object containing the data uploaded by the client verbatim (archive and/or metadata),
+and inserts it in the database.
+A corresponding :class:`swh.deposit.models.Deposit` object is also created
+and inserted, if this is the initial request creating a deposit.
+
+Upon receiving the last request, identified by the lack of the ``In-Progress: true``
+header, the deposit server either:
+
+* checks that the targeted object exists in :ref:`swh-storage <swh-storage>`,
+ then sends a request to swh-storage with the Atom metadata and updates the
+ deposit status to ``done``,
+ if it is a :ref:`metadata-only deposit <use-case-metadata-only-deposit>`
+* updates the deposit status and schedules a checking task by querying
+ :ref:`swh-scheduler <swh-scheduler>`, otherwise
+
+Graphically:
+
+.. figure:: ../images/deposit-workflow-reception.svg
+ :alt:
+
+For metadata-only deposits, this is the end of the story.
+The next section narrates what happens next for "normal" deposits.
+
+Checking
+--------
+
+As we saw above, the deposit API server's synchronous work ends after sending
+a checking task.
+This task is implemented by :class:`swh.deposit.loader.checker.DepositChecker`,
+which is simply another call to the deposit API,
+implemented in :class:`swh.deposit.api.private.deposit_check.APIChecks`.
+
+This API performs longer checks, which require inspecting the deposited archive
+(or archives, for clients depositing archives in multiple steps).
+This is why it is run by an asynchronous task instead of being checked immediately
+when the client sent a query.
+
+When it is done, it sets the deposit's status to "verified" (so clients polling
+for the status know this step succeeded) and schedules a loading task.
+
+Graphically:
+
+.. figure:: ../images/deposit-workflow-checking.svg
+ :alt:
+
+Note that the check task is actually just a thin wrapper around an API call.
+While the checks could be done in the task itself, it would mean sending
+all archives from the deposit API to the celery worker, which would be inefficient.
+And the gains would not be great, as checking tasks only need to decompress archives,
+which is not resource intensive.
+Instead, this long-running call to the API proved to be a simpler
+and more efficient solution at the current scale of the deposit.
+
+Loading
+-------
+
+When the check task finished, it scheduled a load task, implemented by
+:class:`swh.loader.package.deposit.loader.DepositLoader`.
+
+It is part of the ``swh.loader.package`` package instead of ``swh-deposit``,
+because its design is close to other :ref:`package loaders <swh-loader-core>`:
+
+1. fetch a tarball
+2. extract it
+3. use :mod:`swh.model.from_disk` to build SWH objects from it
+4. load these objects in :ref:`swh-storage <swh-storage>`
+
+The only difference in this process is fetching the tarball from the deposit server,
+instead of external repositories.
+This tarball is returned by :mod:`swh.deposit.api.private.deposit_read`,
+which creates it by aggregating all archives sent by the client (usually
+only one, but the SWORD protocol allows more).
+
+Finally, when it is done, the loader updates the deposit status via the deposit API.
+
+Graphically:
+
+.. figure:: ../images/deposit-workflow-loading.svg
+ :alt:
+
diff --git a/docs/specs/protocol-reference.rst b/docs/specs/protocol-reference.rst
index 9deecb9a..9f9255dd 100644
--- a/docs/specs/protocol-reference.rst
+++ b/docs/specs/protocol-reference.rst
@@ -1,287 +1,287 @@
.. _deposit-protocol:
Protocol reference
==================
The swh-deposit protocol is an extension of the SWORDv2_ protocol, and the
swh-deposit client and server should work with any other SWORDv2-compliant
implementation which provides some :ref:`mandatory attributes <mandatory-attributes>`.
However, we define some extensions by the means of extra tags in the Atom
entries, that should be used when interacting with the server to use it optimally.
This means the swh-deposit server should work with a generic SWORDv2 client, but
works much better with these extensions.
All these tags are in the ``https://www.softwareheritage.org/schema/2018/deposit``
XML namespace, denoted using the ``swhdeposit`` prefix in this section.
Origin creation with the ``<swhdeposit:create_origin>`` tag
-----------------------------------------------------------
Motivation
^^^^^^^^^^
This is the main extension we define.
This tag is used after a deposit is completed, to load it in the Software Heritage
archive.
The SWH archive references source code repositories by an URI, called the
:term:`origin` URL.
This URI is clearly defined when SWH pulls source code from such a repository;
but not for the push approach used by SWORD, as SWORD clients do not intrinsically
have an URL.
Usage
^^^^^
Instead, clients are expected to provide the origin URL themselves, by adding
a tag in the Atom entry they submit to the server, like this:
.. code:: xml
<atom:entry xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit">
<!-- ... -->
<swh:deposit>
<swh:create_origin>
<swh:origin url="https://example.org/b063bf3a-e98e-40a0-b918-3e42b06011ba" />
</swh:create_origin>
</swh:deposit>
<!-- ... -->
</atom:entry>
This will create an origin in the Software Heritage archive, that will point to
the source code artifacts of this deposit.
Semantics of origin URLs
^^^^^^^^^^^^^^^^^^^^^^^^
Origin URLs must be unique to an origin, ie. to a software project.
The exact definition of a "software project" is left to the clients of the deposit.
They should be designed so that future releases of the same software will have
the same origin URL.
As a guideline, consider that every GitHub/GitLab project is an origin,
and every package in Debian/NPM/PyPI is also an origin.
While origin URLs are not required to resolve to a source code artifact,
we recommend they point to a public resource describing the software project,
including a link to download its source code.
This is not a technical requirement, but it improves discoverability.
Clients may not submit arbitrary URLs; the server checks that the URLs they submit
belong to a "namespace" they own, known as the ``provider_url`` of the client.
For example, if a client has their ``provider_url`` set to ``https://example.org/foo/``
they will only be able to submit deposits to origins whose URL starts with
``https://example.org/foo/``.
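The namespace check can be pictured as a prefix comparison. This is a minimal sketch under stated assumptions, not the server's actual code: the function name ``origin_in_namespace`` is hypothetical, and a plain string prefix test is assumed (the real server may normalize URLs first).

```python
def origin_in_namespace(origin_url: str, provider_url: str) -> bool:
    """Return True if origin_url falls under the client's provider_url.

    Hypothetical helper; assumes a plain prefix test is sufficient.
    """
    return origin_url.startswith(provider_url)


provider_url = "https://example.org/foo/"
# An origin inside the client's namespace is accepted...
print(origin_in_namespace("https://example.org/foo/my-project", provider_url))
# ...while one outside of it is rejected.
print(origin_in_namespace("https://example.org/bar/my-project", provider_url))
```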
Fallbacks
^^^^^^^^^
If the ``<swhdeposit:create_origin>`` tag is not provided (either because the client is
a generic SWORDv2 implementation or an old implementation of an swh-deposit client), the server
falls back to creating one based on the ``provider_url`` and the ``Slug`` header
(as defined in the AtomPub_ specification) by concatenating them.
If the ``Slug`` header is missing, the server generates one randomly.
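The fallback described above can be sketched as follows. The function name ``fallback_origin_url`` is hypothetical, and using a UUID as the randomly generated slug is an assumption; the text only states that the slug is random.

```python
import uuid
from typing import Optional


def fallback_origin_url(provider_url: str, slug: Optional[str] = None) -> str:
    """Build an origin URL from the provider_url and the Slug header value.

    Hypothetical helper; the UUID-based random slug is an assumption.
    """
    if slug is None:
        # No Slug header was sent: generate a random one.
        slug = str(uuid.uuid4())
    # The two parts are simply concatenated.
    return provider_url + slug


print(fallback_origin_url("https://example.org/foo/", "my-project"))
print(fallback_origin_url("https://example.org/foo/"))
```

The second call shows why relying on the fallback is discouraged: the resulting URL carries no information about the software project.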
This fallback is provided for compliance with SWORDv2_ clients, but we do not
recommend relying on it, as it usually creates origin URLs that are not meaningful.
Adding releases to an origin, with the ``<swhdeposit:add_to_origin>`` tag
-------------------------------------------------------------------------
When depositing a source code artifact for an origin (ie. software project) that
was already deposited before, clients should not use ``<swhdeposit:create_origin>``,
as the origin was already created by the original deposit; and
``<swhdeposit:add_to_origin>`` should be used instead.
It is used very similarly to ``<swhdeposit:create_origin>``:
.. code:: xml
<atom:entry xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit">
<!-- ... -->
<swh:deposit>
<swh:add_to_origin>
<swh:origin url="https://example.org/~user/repo" />
</swh:add_to_origin>
</swh:deposit>
<!-- ... -->
</atom:entry>
This will create a new :term:`revision` object in the Software Heritage archive,
with the last deposit on this origin as its parent revision,
and reference it from the origin.
If the origin does not exist, the server returns an error.
Metadata
--------
Format
^^^^^^
While the SWORDv2 specification recommends the use of DublinCore_,
we prefer the CodeMeta_ vocabulary, as we already use it in other components
of Software Heritage.
While CodeMeta is designed for use in JSON-LD, it is easy to reuse its vocabulary
and embed it in an XML document, in three steps:
1. use the JSON-LD compact representation of the CodeMeta document
2. replace ``@context`` declarations with XML namespaces
3. unfold JSON lists to sibling XML subtrees
For example, this CodeMeta document:
.. code:: json
{
"@context": "https://doi.org/10.5063/SCHEMA/CODEMETA-2.0",
"name": "My Software",
"author": [
{
"name": "Author 1",
"email": "foo@example.org"
},
{
- "name": Author 2"
+ "name": "Author 2"
}
]
}
becomes this XML document:
.. code:: xml
<?xml version="1.0"?>
<atom:entry xmlns:atom="http://www.w3.org/2005/Atom"
xmlns="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0">
<name>My Software</name>
<author>
<name>Author 1</name>
<email>foo@example.org</email>
</author>
<author>
<name>Author 2</name>
</author>
</atom:entry>
Or, equivalently:
.. code:: xml
<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom"
xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0">
<codemeta:name>My Software</codemeta:name>
<codemeta:author>
<codemeta:name>Author 1</codemeta:name>
<codemeta:email>foo@example.org</codemeta:email>
</codemeta:author>
<codemeta:author>
<codemeta:name>Author 2</codemeta:name>
</codemeta:author>
</entry>
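The three conversion steps can be sketched with the standard library. This is only an illustration, not the converter used by swh-deposit: the function names are hypothetical, and keys starting with ``@`` (such as ``@context``) are simply skipped, on the assumption that XML namespaces replace them.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
CODEMETA_NS = "https://doi.org/10.5063/SCHEMA/CODEMETA-2.0"


def add_property(parent: ET.Element, key: str, value) -> None:
    """Append one CodeMeta property to parent (hypothetical helper)."""
    if key.startswith("@"):
        return  # JSON-LD keywords are expressed through XML namespaces
    if isinstance(value, list):
        # Step 3: unfold JSON lists to sibling XML subtrees.
        for item in value:
            add_property(parent, key, item)
        return
    child = ET.SubElement(parent, f"{{{CODEMETA_NS}}}{key}")
    if isinstance(value, dict):
        for sub_key, sub_value in value.items():
            add_property(child, sub_key, sub_value)
    else:
        child.text = str(value)


def codemeta_to_atom(document: dict) -> ET.Element:
    """Embed a compact CodeMeta document in an Atom entry."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    for key, value in document.items():
        add_property(entry, key, value)
    return entry


document = {
    "@context": "https://doi.org/10.5063/SCHEMA/CODEMETA-2.0",
    "name": "My Software",
    "author": [
        {"name": "Author 1", "email": "foo@example.org"},
        {"name": "Author 2"},
    ],
}

ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("codemeta", CODEMETA_NS)
print(ET.tostring(codemeta_to_atom(document), encoding="unicode"))
```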
.. _mandatory-attributes:
Mandatory attributes
^^^^^^^^^^^^^^^^^^^^
All deposits must include:
* an ``<atom:author>`` tag with an ``<atom:name>`` and ``<atom:email>``, and
* either ``<atom:name>`` or ``<atom:title>``
We also highly recommend their CodeMeta equivalent, and any other relevant
metadata, but this is not enforced.
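A client can check these requirements before sending a deposit. The sketch below follows the rule exactly as stated above; the function name ``missing_mandatory_attributes`` is hypothetical, and the checks actually performed by the server may differ in detail.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}


def missing_mandatory_attributes(entry_xml: str) -> list:
    """List the mandatory tags absent from an Atom entry (hypothetical helper)."""
    entry = ET.fromstring(entry_xml)
    missing = []
    author = entry.find("atom:author", NS)
    if author is None:
        missing.append("atom:author")
    else:
        # The author must carry both a name and an email.
        for tag in ("atom:name", "atom:email"):
            if author.find(tag, NS) is None:
                missing.append(f"atom:author/{tag}")
    # The entry itself needs a name or a title.
    if entry.find("atom:name", NS) is None and entry.find("atom:title", NS) is None:
        missing.append("atom:name or atom:title")
    return missing


entry_xml = """
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>My Software</title>
  <author><name>Author 1</name></author>
</entry>
"""
print(missing_mandatory_attributes(entry_xml))
```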
-.. _metatadata-only-deposit:
+.. _metadata-only-deposit:
Metadata-only deposit
---------------------
The swh-deposit server can also be used without a source code artifact, only
to provide metadata that describes an arbitrary origin or object in
Software Heritage; this is known as extrinsic metadata.
Unlike regular deposits, there are no restrictions on URL prefixes,
so any client can provide metadata on any origin; and no restrictions on which
objects can be described.
This is done by simply omitting the binary file deposit request of
a regular SWORDv2 deposit, and including information on which object the metadata
describes, by adding a ``<swhdeposit:reference>`` tag in the Atom document.
To describe an origin:
.. code:: xml
<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom"
xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit">
<!-- ... -->
<swh:deposit>
<swh:reference>
<swh:origin url='https://example.org/~user/repo'/>
</swh:reference>
</swh:deposit>
<!-- ... -->
</entry>
And to describe an object:
.. code:: xml
<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom"
xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit">
<!-- ... -->
<swh:deposit>
<swh:reference>
<swh:object swhid="swh:1:dir:31b5c8cc985d190b5a7ef4878128ebfdc2358f49" />
</swh:reference>
</swh:deposit>
<!-- ... -->
</entry>
For details on the semantics, see the
:ref:`metadata deposit specification <spec-metadata-deposit>`.
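Both variants of the ``<swhdeposit:reference>`` tag can be generated programmatically. The following is a minimal sketch using the standard library; the function name ``reference_entry`` is hypothetical, and requiring exactly one of the two arguments is an assumption made for this example.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ATOM_NS = "http://www.w3.org/2005/Atom"
SWH_NS = "https://www.softwareheritage.org/schema/2018/deposit"


def reference_entry(
    origin_url: Optional[str] = None, swhid: Optional[str] = None
) -> ET.Element:
    """Build an Atom entry referencing an origin or an object (hypothetical helper)."""
    # Assumption: a reference targets either an origin or an object, not both.
    if (origin_url is None) == (swhid is None):
        raise ValueError("provide exactly one of origin_url or swhid")
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    deposit = ET.SubElement(entry, f"{{{SWH_NS}}}deposit")
    reference = ET.SubElement(deposit, f"{{{SWH_NS}}}reference")
    if origin_url is not None:
        ET.SubElement(reference, f"{{{SWH_NS}}}origin", {"url": origin_url})
    else:
        ET.SubElement(reference, f"{{{SWH_NS}}}object", {"swhid": swhid})
    return entry


ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("swh", SWH_NS)
entry = reference_entry(swhid="swh:1:dir:31b5c8cc985d190b5a7ef4878128ebfdc2358f49")
print(ET.tostring(entry, encoding="unicode"))
```

A real deposit would also carry the metadata itself in the places marked ``<!-- ... -->`` in the examples above.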
Schema
------
Here is an XML schema to summarize the syntax described in this document:
.. literalinclude:: swh.xsd
:language: xml
.. _SWORDv2: http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html
.. _AtomPub: https://tools.ietf.org/html/rfc5023
.. _DublinCore: https://www.dublincore.org/
.. _CodeMeta: https://codemeta.github.io/
diff --git a/requirements-swh-server.txt b/requirements-swh-server.txt
index 3ab50967..197a878f 100644
--- a/requirements-swh-server.txt
+++ b/requirements-swh-server.txt
@@ -1,5 +1,5 @@
swh.core[http] >= 0.4
swh.loader.core >= 0.0.71
swh.scheduler >= 0.7.0
swh.model >= 0.3.8
-swh.auth[django] >= 0.3.8
+swh.auth[django] >= 0.5.3
diff --git a/swh.deposit.egg-info/PKG-INFO b/swh.deposit.egg-info/PKG-INFO
index 57b1e507..e37e6f52 100644
--- a/swh.deposit.egg-info/PKG-INFO
+++ b/swh.deposit.egg-info/PKG-INFO
@@ -1,94 +1,94 @@
Metadata-Version: 2.1
Name: swh.deposit
-Version: 0.13.5
+Version: 0.13.6
Summary: Software Heritage Deposit Server
Home-page: https://forge.softwareheritage.org/source/swh-deposit/
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
License: UNKNOWN
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-deposit
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-deposit/
Description: Software Heritage - Deposit
===========================
Simple Web-Service Offering Repository Deposit (S.W.O.R.D) is an interoperability
standard for digital file deposit.
This repository is both the `SWORD v2`_ Server and a deposit command-line client
implementations.
This implementation allows interaction between a client (a repository) and a server (SWH
repository) to deposit software source code archives and associated metadata.
Description
-----------
Most of the software source code artifacts present in the SWH Archive are gathered by
means of :term:`loader <loader>` workers run by the SWH project from source code
origins identified by :term:`lister <lister>` workers. This is a pull mechanism: it's
the responsibility of the SWH project to gather and collect source code artifacts that
way.
Alternatively, SWH allows its partners to push source code artifacts and metadata
directly into the Archive with a push-based mechanism. By using this possibility
different actors, holding software artifacts or metadata, can preserve their assets
without having to pass through an intermediate collaborative development platform, which
is already harvested by SWH (e.g GitHub, Gitlab, etc.).
This mechanism is the `deposit`.
The main idea is the deposit is an authenticated access to an API allowing the user to
provide source code artifacts -- with metadata -- to be ingested in the SWH Archive. The
result of that is a :ref:`SWHID <persistent-identifiers>` that can be used to uniquely
and persistently identify that very piece of source code.
This unique identifier can then be used to `reference the source code
<https://hal.archives-ouvertes.fr/hal-02446202>`_ (e.g. in a `scientific paper
<https://www.softwareheritage.org/2020/05/26/citing-software-with-style/>`_) and
retrieve it using the :ref:`vault <swh-vault>` feature of the SWH Archive platform.
The differences between a piece of code uploaded using the deposit rather than simply
asking SWH to archive a repository using the `save code now
<https://archive.softwareheritage.org/save/>`_ feature are:
- a deposited artifact is provided from one of the SWH partners which is regarded as a
trusted authority,
- a deposited artifact requires metadata properties describing the source code artifact,
- a deposited artifact has a codemeta_ metadata entry attached to it,
- a deposited artifact has the same visibility on the SWH Archive as a collected
repository,
- a deposited artifact can be searched with its provided url property on the SWH
Archive,
- the deposit API uses the `SWORD v2`_ API, thus requires some tooling to send deposits
to SWH. These tools are provided with this repository.
See the :ref:`deposit-user-manual` page for more details on how to use the deposit client
command line tools to push a deposit in the SWH Archive.
See the :ref:`deposit-api-specifications` reference pages of the SWORDv2 API implementation
in `swh.deposit` if you want to upload deposits using HTTP requests.
- Read the :ref:`metadata` chapter to get more details on what metadata are supported when
- doing a deposit.
+ Read the :ref:`deposit-metadata` chapter to get more details on what metadata
+ are supported when doing a deposit.
See :ref:`swh-deposit-dev-env` if you want to hack the code of the `swh.deposit` module.
See :ref:`swh-deposit-prod-env` if you want to deploy your own copy of the
`swh.deposit` stack.
.. _codemeta: https://codemeta.github.io/
.. _`SWORD v2`: http://swordapp.org/sword-v2/
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: testing
Provides-Extra: server
diff --git a/swh.deposit.egg-info/SOURCES.txt b/swh.deposit.egg-info/SOURCES.txt
index cd7550bd..a1d5a9e2 100644
--- a/swh.deposit.egg-info/SOURCES.txt
+++ b/swh.deposit.egg-info/SOURCES.txt
@@ -1,273 +1,277 @@
.gitignore
.pre-commit-config.yaml
AUTHORS
CODE_OF_CONDUCT.md
CONTRIBUTORS
LICENSE
MANIFEST.in
Makefile
Makefile.local
README.rst
conftest.py
mypy.ini
pyproject.toml
pytest.ini
requirements-server.txt
requirements-swh-server.txt
requirements-swh.txt
requirements-test.txt
requirements.txt
setup.cfg
setup.py
tox.ini
bin/Makefile
bin/content.sh
bin/create_deposit.sh
bin/create_deposit_atom.sh
bin/create_deposit_with_metadata.sh
bin/default-setup
bin/download-deposit-archive.sh
bin/home.sh
bin/replace-deposit-archive.sh
bin/service-document.sh
bin/status.sh
bin/update-deposit-with-another-archive.sh
bin/update-status.sh
docs/.gitignore
docs/Makefile
docs/README.rst
docs/cli.rst
docs/conf.py
docs/index.rst
docs/metadata.rst
docs/spec-api.rst
docs/user-manual.rst
docs/_static/.placeholder
docs/_templates/.placeholder
docs/api/api-documentation.rst
docs/api/index.rst
docs/api/metadata.rst
docs/api/use-cases.rst
docs/api/user-manual.rst
docs/endpoints/collection.rst
docs/endpoints/content.rst
docs/endpoints/service-document.rst
docs/endpoints/status.rst
docs/endpoints/update-media.rst
docs/endpoints/update-metadata.rst
docs/images/.gitignore
docs/images/Makefile
docs/images/deposit-authentication-basic.uml
docs/images/deposit-authentication-keycloak.uml
docs/images/deposit-create-chart.uml
docs/images/deposit-delete-chart.uml
docs/images/deposit-update-chart.uml
+docs/images/deposit-workflow-checking.uml
+docs/images/deposit-workflow-loading.uml
+docs/images/deposit-workflow-reception.uml
docs/images/status.uml
docs/internals/authentication.rst
docs/internals/dev-environment.rst
docs/internals/index.rst
+docs/internals/loading-workflow.rst
docs/internals/prod-environment.rst
docs/specs/blueprint.rst
docs/specs/index.rst
docs/specs/metadata_example.xml
docs/specs/protocol-reference.rst
docs/specs/spec-loading.rst
docs/specs/spec-meta-deposit.rst
docs/specs/swh.xsd
resources/deposit/server.yml
swh/__init__.py
swh.deposit.egg-info/PKG-INFO
swh.deposit.egg-info/SOURCES.txt
swh.deposit.egg-info/dependency_links.txt
swh.deposit.egg-info/entry_points.txt
swh.deposit.egg-info/requires.txt
swh.deposit.egg-info/top_level.txt
swh/deposit/__init__.py
swh/deposit/apps.py
swh/deposit/auth.py
swh/deposit/client.py
swh/deposit/config.py
swh/deposit/errors.py
swh/deposit/exception.py
swh/deposit/gunicorn_config.py
swh/deposit/manage.py
swh/deposit/models.py
swh/deposit/parsers.py
swh/deposit/py.typed
swh/deposit/urls.py
swh/deposit/utils.py
swh/deposit/api/__init__.py
swh/deposit/api/checks.py
swh/deposit/api/collection.py
swh/deposit/api/common.py
swh/deposit/api/content.py
swh/deposit/api/converters.py
swh/deposit/api/edit.py
swh/deposit/api/edit_media.py
swh/deposit/api/service_document.py
swh/deposit/api/state.py
swh/deposit/api/sword_edit.py
swh/deposit/api/urls.py
swh/deposit/api/private/__init__.py
swh/deposit/api/private/deposit_check.py
swh/deposit/api/private/deposit_list.py
swh/deposit/api/private/deposit_read.py
swh/deposit/api/private/deposit_update_status.py
swh/deposit/api/private/urls.py
swh/deposit/cli/__init__.py
swh/deposit/cli/admin.py
swh/deposit/cli/client.py
swh/deposit/fixtures/__init__.py
swh/deposit/fixtures/deposit_data.yaml
swh/deposit/loader/__init__.py
swh/deposit/loader/checker.py
swh/deposit/loader/tasks.py
swh/deposit/migrations/0001_initial.py
swh/deposit/migrations/0002_depositrequest_archive.py
swh/deposit/migrations/0003_temporaryarchive.py
swh/deposit/migrations/0004_delete_temporaryarchive.py
swh/deposit/migrations/0005_auto_20171019_1436.py
swh/deposit/migrations/0006_depositclient_url.py
swh/deposit/migrations/0007_auto_20171129_1609.py
swh/deposit/migrations/0008_auto_20171130_1513.py
swh/deposit/migrations/0009_deposit_parent.py
swh/deposit/migrations/0010_auto_20180110_0953.py
swh/deposit/migrations/0011_auto_20180115_1510.py
swh/deposit/migrations/0012_deposit_status_detail.py
swh/deposit/migrations/0013_depositrequest_raw_metadata.py
swh/deposit/migrations/0014_auto_20180720_1221.py
swh/deposit/migrations/0015_depositrequest_typemigration.py
swh/deposit/migrations/0016_auto_20190507_1408.py
swh/deposit/migrations/0017_auto_20190925_0906.py
swh/deposit/migrations/0018_migrate_swhids.py
swh/deposit/migrations/0019_auto_20200519_1035.py
swh/deposit/migrations/0020_auto_20200929_0855.py
swh/deposit/migrations/0021_deposit_origin_url_20201124_1438.py
swh/deposit/migrations/__init__.py
swh/deposit/settings/__init__.py
swh/deposit/settings/common.py
swh/deposit/settings/development.py
swh/deposit/settings/production.py
swh/deposit/settings/testing.py
swh/deposit/static/robots.txt
swh/deposit/static/css/bootstrap-responsive.min.css
swh/deposit/static/css/style.css
swh/deposit/static/img/arrow-up-small.png
swh/deposit/static/img/swh-logo-deposit.png
swh/deposit/static/img/swh-logo-deposit.svg
swh/deposit/static/img/icons/swh-logo-32x32.png
swh/deposit/static/img/icons/swh-logo-deposit-180x180.png
swh/deposit/static/img/icons/swh-logo-deposit-192x192.png
swh/deposit/static/img/icons/swh-logo-deposit-270x270.png
swh/deposit/templates/__init__.py
swh/deposit/templates/api.html
swh/deposit/templates/homepage.html
swh/deposit/templates/layout.html
swh/deposit/templates/deposit/__init__.py
swh/deposit/templates/deposit/content.xml
swh/deposit/templates/deposit/deposit_info.xml
swh/deposit/templates/deposit/deposit_receipt.xml
swh/deposit/templates/deposit/error.xml
swh/deposit/templates/deposit/service_document.xml
swh/deposit/templates/deposit/state.xml
swh/deposit/templates/rest_framework/api.html
swh/deposit/tests/__init__.py
swh/deposit/tests/common.py
swh/deposit/tests/conftest.py
swh/deposit/tests/test_backend.py
swh/deposit/tests/test_common.py
swh/deposit/tests/test_gunicorn_config.py
swh/deposit/tests/test_init.py
swh/deposit/tests/test_utils.py
swh/deposit/tests/api/__init__.py
swh/deposit/tests/api/conftest.py
swh/deposit/tests/api/test_basic_auth.py
swh/deposit/tests/api/test_checks.py
swh/deposit/tests/api/test_collection.py
swh/deposit/tests/api/test_collection_add_to_origin.py
swh/deposit/tests/api/test_collection_post_atom.py
swh/deposit/tests/api/test_collection_post_binary.py
swh/deposit/tests/api/test_collection_post_multipart.py
swh/deposit/tests/api/test_collection_reuse_slug.py
swh/deposit/tests/api/test_converters.py
swh/deposit/tests/api/test_delete.py
swh/deposit/tests/api/test_deposit_list.py
swh/deposit/tests/api/test_deposit_private_check.py
swh/deposit/tests/api/test_deposit_private_read_archive.py
swh/deposit/tests/api/test_deposit_private_read_metadata.py
swh/deposit/tests/api/test_deposit_private_update_status.py
swh/deposit/tests/api/test_deposit_schedule.py
swh/deposit/tests/api/test_deposit_state.py
swh/deposit/tests/api/test_deposit_update.py
swh/deposit/tests/api/test_deposit_update_atom.py
swh/deposit/tests/api/test_deposit_update_binary.py
swh/deposit/tests/api/test_exception.py
swh/deposit/tests/api/test_get_file.py
swh/deposit/tests/api/test_keycloak_auth.py
swh/deposit/tests/api/test_parsers.py
swh/deposit/tests/api/test_service_document.py
swh/deposit/tests/cli/__init__.py
swh/deposit/tests/cli/conftest.py
swh/deposit/tests/cli/test_admin.py
swh/deposit/tests/cli/test_client.py
swh/deposit/tests/data/archives/single-artifact-package.tar.gz
swh/deposit/tests/data/atom/codemeta-sample.xml
swh/deposit/tests/data/atom/entry-data-badly-formatted.xml
swh/deposit/tests/data/atom/entry-data-deposit-binary.xml
swh/deposit/tests/data/atom/entry-data-empty-body.xml
swh/deposit/tests/data/atom/entry-data-fail-metadata-functional-checks.xml
swh/deposit/tests/data/atom/entry-data-ko.xml
swh/deposit/tests/data/atom/entry-data-minimal.xml
swh/deposit/tests/data/atom/entry-data-no-origin-url.xml
swh/deposit/tests/data/atom/entry-data-parsing-error-prone.xml
swh/deposit/tests/data/atom/entry-data-with-add-to-origin.xml
swh/deposit/tests/data/atom/entry-data-with-both-add-to-origin-and-external-id.xml
swh/deposit/tests/data/atom/entry-data-with-both-create-origin-and-add-to-origin.xml
swh/deposit/tests/data/atom/entry-data-with-origin-reference.xml
swh/deposit/tests/data/atom/entry-data-with-swhid-fail-metadata-functional-checks.xml
swh/deposit/tests/data/atom/entry-data-with-swhid.xml
swh/deposit/tests/data/atom/entry-data0.xml
swh/deposit/tests/data/atom/entry-data1.xml
swh/deposit/tests/data/atom/entry-data2.xml
swh/deposit/tests/data/atom/entry-data3.xml
swh/deposit/tests/data/atom/entry-only-create-origin.xml
swh/deposit/tests/data/atom/entry-update-in-place.xml
swh/deposit/tests/data/atom/error-cli.xml
swh/deposit/tests/data/atom/error-with-decimal.xml
swh/deposit/tests/data/atom/error-with-external-identifier-and-create-origin.xml
swh/deposit/tests/data/atom/error-with-external-identifier.xml
swh/deposit/tests/data/atom/error-with-reference-and-create-origin.xml
swh/deposit/tests/data/atom/metadata.xml
swh/deposit/tests/data/https_deposit.swh.test/1_servicedocument
swh/deposit/tests/data/https_deposit.swh.test/1_test
swh/deposit/tests/data/https_deposit.test.metadata/1_servicedocument
swh/deposit/tests/data/https_deposit.test.metadata/1_test
swh/deposit/tests/data/https_deposit.test.metadata/1_test_666_media
swh/deposit/tests/data/https_deposit.test.metadata/1_test_666_metadata
swh/deposit/tests/data/https_deposit.test.metadata/1_test_666_status
swh/deposit/tests/data/https_deposit.test.metadataonly/1_servicedocument
swh/deposit/tests/data/https_deposit.test.metadataonly/1_test
swh/deposit/tests/data/https_deposit.test.status/1_servicedocument
swh/deposit/tests/data/https_deposit.test.status/1_test_1033_status
swh/deposit/tests/data/https_deposit.test.updateswhid/1_servicedocument
swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_atom
swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_123_status
swh/deposit/tests/data/https_deposit.test.updateswhid/1_test_321_status
swh/deposit/tests/loader/__init__.py
swh/deposit/tests/loader/common.py
swh/deposit/tests/loader/conftest.py
swh/deposit/tests/loader/test_checker.py
swh/deposit/tests/loader/test_client.py
swh/deposit/tests/loader/test_tasks.py
swh/deposit/tests/loader/data/http_example.org/hello.json
swh/deposit/tests/loader/data/http_example.org/hello_you
swh/deposit/tests/loader/data/https_deposit.softwareheritage.org/1_private_test_1_check
swh/deposit/tests/loader/data/https_deposit.softwareheritage.org/1_private_test_2_check
swh/deposit/tests/loader/data/https_deposit.softwareheritage.org/1_private_test_999_meta
swh/deposit/tests/loader/data/https_deposit.softwareheritage.org/1_private_test_999_raw
swh/deposit/tests/loader/data/https_deposit.softwareheritage.org/1_private_test_999_update
swh/deposit/tests/loader/data/https_nowhere.org/1_private_test_1_check
swh/deposit/tests/loader/data/https_nowhere.org/1_private_test_1_metadata
swh/deposit/tests/loader/data/https_nowhere.org/1_private_test_1_raw
swh/deposit/tests_migration/__init__.py
swh/deposit/tests_migration/test_migrations.py
\ No newline at end of file
diff --git a/swh.deposit.egg-info/requires.txt b/swh.deposit.egg-info/requires.txt
index 09a6a420..abde88f1 100644
--- a/swh.deposit.egg-info/requires.txt
+++ b/swh.deposit.egg-info/requires.txt
@@ -1,36 +1,36 @@
click
xmltodict
iso8601
requests
swh.core[http]>=0.4
swh.model>=1.0.0
[server]
Django<3
djangorestframework
setuptools
swh.core[http]>=0.4
swh.loader.core>=0.0.71
swh.scheduler>=0.7.0
swh.model>=0.3.8
-swh.auth[django]>=0.3.8
+swh.auth[django]>=0.5.3
[testing]
pytest
pytest-django
pytest-mock
swh.scheduler[testing]
swh.loader.core[testing]
pytest-postgresql>=2.1.0
requests_mock
django-stubs
djangorestframework-stubs>=1.4
django-test-migrations
Django<3
djangorestframework
setuptools
swh.core[http]>=0.4
swh.loader.core>=0.0.71
swh.scheduler>=0.7.0
swh.model>=0.3.8
-swh.auth[django]>=0.3.8
+swh.auth[django]>=0.5.3
diff --git a/swh/deposit/apps.py b/swh/deposit/apps.py
index 2a60f2c6..72e1510c 100644
--- a/swh/deposit/apps.py
+++ b/swh/deposit/apps.py
@@ -1,10 +1,11 @@
# Copyright (C) 2017 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
from django.apps import AppConfig
class DepositConfig(AppConfig):
name = "swh.deposit"
+ label = "deposit"
diff --git a/swh/deposit/models.py b/swh/deposit/models.py
index cad6861f..28db7802 100644
--- a/swh/deposit/models.py
+++ b/swh/deposit/models.py
@@ -1,243 +1,248 @@
# Copyright (C) 2017-2021 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
# Generated from:
# cd swh_deposit && \
# python3 -m manage inspectdb
import datetime
from typing import Optional
from django.contrib.auth.models import User, UserManager
from django.contrib.postgres.fields import ArrayField, JSONField
from django.db import models
from django.utils.timezone import now
from swh.auth.django.models import OIDCUser
from .config import (
ARCHIVE_TYPE,
DEPOSIT_STATUS_DEPOSITED,
DEPOSIT_STATUS_LOAD_FAILURE,
DEPOSIT_STATUS_LOAD_SUCCESS,
DEPOSIT_STATUS_PARTIAL,
DEPOSIT_STATUS_REJECTED,
DEPOSIT_STATUS_VERIFIED,
METADATA_TYPE,
)
class Dbversion(models.Model):
"""Db version
"""
version = models.IntegerField(primary_key=True)
release = models.DateTimeField(default=now, null=True)
description = models.TextField(blank=True, null=True)
class Meta:
db_table = "dbversion"
+ app_label = "deposit"
def __str__(self):
return str(
{
"version": self.version,
"release": self.release,
"description": self.description,
}
)
"""Possible status"""
DEPOSIT_STATUS = [
(DEPOSIT_STATUS_PARTIAL, DEPOSIT_STATUS_PARTIAL),
("expired", "expired"),
(DEPOSIT_STATUS_DEPOSITED, DEPOSIT_STATUS_DEPOSITED),
(DEPOSIT_STATUS_VERIFIED, DEPOSIT_STATUS_VERIFIED),
(DEPOSIT_STATUS_REJECTED, DEPOSIT_STATUS_REJECTED),
("loading", "loading"),
(DEPOSIT_STATUS_LOAD_SUCCESS, DEPOSIT_STATUS_LOAD_SUCCESS),
(DEPOSIT_STATUS_LOAD_FAILURE, DEPOSIT_STATUS_LOAD_FAILURE),
]
"""Possible status and the detailed meaning."""
DEPOSIT_STATUS_DETAIL = {
DEPOSIT_STATUS_PARTIAL: "Deposit is partially received. To finalize it, "
"In-Progress header should be false",
"expired": "Deposit has been there too long and is now "
"deemed ready to be garbage collected",
DEPOSIT_STATUS_DEPOSITED: "Deposit is ready for additional checks "
"(tarball ok, metadata, etc...)",
DEPOSIT_STATUS_VERIFIED: "Deposit is fully received, checked, and "
"ready for loading",
DEPOSIT_STATUS_REJECTED: "Deposit failed the checks",
"loading": "Loading is ongoing on swh's side",
DEPOSIT_STATUS_LOAD_SUCCESS: "The deposit has been successfully "
"loaded into the Software Heritage archive",
DEPOSIT_STATUS_LOAD_FAILURE: "The deposit loading into the "
"Software Heritage archive failed",
}
class DepositClient(User):
"""Deposit client
"""
collections = ArrayField(models.IntegerField(), null=True)
objects = UserManager() # type: ignore
# this typing hint is due to a mypy/django-stubs limitation,
# see https://github.com/typeddjango/django-stubs/issues/174
provider_url = models.TextField(null=False)
domain = models.TextField(null=False)
oidc_user: Optional[OIDCUser] = None
class Meta:
db_table = "deposit_client"
+ app_label = "deposit"
def __str__(self):
return str(
{
"id": self.id,
"collections": self.collections,
"username": super().username,
"domain": self.domain,
"provider_url": self.provider_url,
}
)
class Deposit(models.Model):
"""Deposit reception table
"""
id = models.BigAutoField(primary_key=True)
# First deposit reception date
reception_date = models.DateTimeField(auto_now_add=True)
# Date when the deposit is deemed complete and ready for loading
complete_date = models.DateTimeField(null=True)
# collection concerned by the deposit
collection = models.ForeignKey("DepositCollection", models.DO_NOTHING)
# Deprecated: Deposit's external identifier
external_id = models.TextField(null=True)
# URL of the origin of this deposit, null if this is a metadata-only deposit
origin_url = models.TextField(null=True)
# Deposit client
client = models.ForeignKey("DepositClient", models.DO_NOTHING)
# SWH's loading result identifier
swhid = models.TextField(blank=True, null=True)
swhid_context = models.TextField(blank=True, null=True)
# Deposit's status regarding loading
status = models.TextField(choices=DEPOSIT_STATUS, default=DEPOSIT_STATUS_PARTIAL)
status_detail = JSONField(null=True)
# deposit can have one parent
parent = models.ForeignKey("self", on_delete=models.PROTECT, null=True)
check_task_id = models.TextField(
blank=True, null=True, verbose_name="Scheduler's associated checking task id"
)
load_task_id = models.TextField(
blank=True, null=True, verbose_name="Scheduler's associated loading task id"
)
class Meta:
db_table = "deposit"
+ app_label = "deposit"
def __str__(self):
d = {
"id": self.id,
"reception_date": self.reception_date,
"collection": self.collection.name,
"external_id": self.external_id,
"origin_url": self.origin_url,
"client": self.client.username,
"status": self.status,
}
if self.status == DEPOSIT_STATUS_REJECTED:
d["status_detail"] = self.status_detail
return str(d)
def client_directory_path(instance: "DepositRequest", filename: str) -> str:
"""Callable to determine the upload archive path. This defaults to
MEDIA_ROOT/client_<user_id>/%Y%m%d-%H%M%S.%f/<filename>.
The format "%Y%m%d-%H%M%S.%f" is the reception date of the associated deposit
formatted using strftime.
Args:
instance: DepositRequest concerned by the upload
filename: Filename of the uploaded file
Returns:
The upload archive path.
"""
reception_date = instance.deposit.reception_date
assert isinstance(reception_date, datetime.datetime)
folder = reception_date.strftime("%Y%m%d-%H%M%S.%f")
return f"client_{instance.deposit.client.id}/{folder}/{filename}"
REQUEST_TYPES = [(ARCHIVE_TYPE, ARCHIVE_TYPE), (METADATA_TYPE, METADATA_TYPE)]
class DepositRequest(models.Model):
"""Deposit request associated to one deposit.
"""
id = models.BigAutoField(primary_key=True)
# Deposit concerned by the request
deposit = models.ForeignKey(Deposit, models.DO_NOTHING)
date = models.DateTimeField(auto_now_add=True)
# Deposit request information on the data to inject
# this can be null when type is 'archive'
metadata = JSONField(null=True)
raw_metadata = models.TextField(null=True)
# this can be null when type is 'metadata'
archive = models.FileField(null=True, upload_to=client_directory_path)
type = models.CharField(max_length=8, choices=REQUEST_TYPES, null=True)
class Meta:
db_table = "deposit_request"
+ app_label = "deposit"
def __str__(self):
meta = None
if self.metadata:
from json import dumps
meta = dumps(self.metadata)
archive_name = None
if self.archive:
archive_name = self.archive.name
return str(
{
"id": self.id,
"deposit": self.deposit,
"metadata": meta,
"archive": archive_name,
}
)
class DepositCollection(models.Model):
id = models.BigAutoField(primary_key=True)
# Human readable name for the collection type e.g HAL, arXiv, etc...
name = models.TextField()
class Meta:
db_table = "deposit_collection"
+ app_label = "deposit"
def __str__(self):
return str({"id": self.id, "name": self.name})
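Reviewer note on ``client_directory_path`` in ``swh/deposit/models.py``: the docstring describes the upload layout ``client_<user_id>/%Y%m%d-%H%M%S.%f/<filename>``. The sketch below reproduces only that string formatting with plain values, so the resulting path is easy to see. The helper name ``upload_path`` and the sample values are hypothetical and not part of the patch.

```python
import datetime


def upload_path(client_id: int, reception_date: datetime.datetime, filename: str) -> str:
    """Mirror the path layout described in client_directory_path's docstring."""
    # The folder name is the deposit reception date, down to microseconds.
    folder = reception_date.strftime("%Y%m%d-%H%M%S.%f")
    return f"client_{client_id}/{folder}/{filename}"


# Hypothetical sample values, for illustration only.
example = upload_path(
    42, datetime.datetime(2021, 5, 4, 13, 37, 0, 123456), "archive.tar.gz"
)
print(example)
```

Because the folder includes microseconds, two deposits from the same client are unlikely to share an upload directory.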
diff --git a/swh/deposit/tests/conftest.py b/swh/deposit/tests/conftest.py
index 7d4ad52f..93ddf572 100644
--- a/swh/deposit/tests/conftest.py
+++ b/swh/deposit/tests/conftest.py
@@ -1,608 +1,608 @@
# Copyright (C) 2019-2021 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
import base64
from copy import deepcopy
from functools import partial
from io import BytesIO
import os
import re
from typing import TYPE_CHECKING, Dict, Mapping
from django.test.utils import setup_databases # type: ignore
from django.urls import reverse_lazy as reverse
import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT
import pytest
from rest_framework import status
from rest_framework.test import APIClient
import yaml
from swh.auth.pytest_plugin import keycloak_mock_factory
from swh.core.config import read
from swh.core.pytest_plugin import get_response_cb
from swh.deposit.auth import DEPOSIT_PERMISSION
from swh.deposit.config import (
COL_IRI,
DEPOSIT_STATUS_DEPOSITED,
DEPOSIT_STATUS_LOAD_FAILURE,
DEPOSIT_STATUS_LOAD_SUCCESS,
DEPOSIT_STATUS_PARTIAL,
DEPOSIT_STATUS_REJECTED,
DEPOSIT_STATUS_VERIFIED,
SE_IRI,
setup_django_for,
)
from swh.deposit.parsers import parse_xml
from swh.deposit.tests.common import (
create_arborescence_archive,
post_archive,
post_atom,
)
from swh.model.hashutil import hash_to_bytes
from swh.model.identifiers import CoreSWHID, ObjectType, QualifiedSWHID
from swh.scheduler import get_scheduler
if TYPE_CHECKING:
from swh.deposit.models import Deposit, DepositClient, DepositCollection
# mypy is asked to ignore the import statement above because setup_databases
# is not part of the d.t.utils.__all__ variable.
USERNAME = "test"
EMAIL = "test@example.org"
COLLECTION = "test"
TEST_USER = {
"username": USERNAME,
"password": "pass",
"email": EMAIL,
"provider_url": "https://hal-test.archives-ouvertes.fr/",
"domain": "archives-ouvertes.fr/",
"collection": {"name": COLLECTION},
}
USER_INFO = {
"name": USERNAME,
"email": EMAIL,
"email_verified": False,
"family_name": "",
"given_name": "",
"groups": [],
"preferred_username": USERNAME,
"sub": "ffffffff-bbbb-4444-aaaa-14f61e6b7200",
}
USERNAME2 = "test2"
EMAIL2 = "test@example.org"
COLLECTION2 = "another-collection"
TEST_USER2 = {
"username": USERNAME2,
"password": "",
"email": EMAIL2,
"provider_url": "https://hal-test.archives-ouvertes.example/",
"domain": "archives-ouvertes.example/",
"collection": {"name": COLLECTION2},
}
KEYCLOAK_SERVER_URL = "https://auth.swh.org/SWHTest"
KEYCLOAK_REALM_NAME = "SWHTest"
CLIENT_ID = "swh-deposit"
keycloak_mock_auth_success = keycloak_mock_factory(
server_url=KEYCLOAK_SERVER_URL,
realm_name=KEYCLOAK_REALM_NAME,
client_id=CLIENT_ID,
auth_success=True,
user_info=USER_INFO,
- user_permissions=[DEPOSIT_PERMISSION],
+ client_permissions=[DEPOSIT_PERMISSION],
)
keycloak_mock_auth_failure = keycloak_mock_factory(
server_url=KEYCLOAK_SERVER_URL,
realm_name=KEYCLOAK_REALM_NAME,
client_id=CLIENT_ID,
auth_success=False,
)
def pytest_configure():
setup_django_for("testing")
@pytest.fixture
def requests_mock_datadir(datadir, requests_mock_datadir):
"""Override default behavior to deal with put/post methods
"""
cb = partial(get_response_cb, datadir=datadir)
requests_mock_datadir.put(re.compile("https://"), body=cb)
requests_mock_datadir.post(re.compile("https://"), body=cb)
return requests_mock_datadir
@pytest.fixture
def common_deposit_config(swh_scheduler_config, swh_storage_backend_config):
return {
"max_upload_size": 500,
"extraction_dir": "/tmp/swh-deposit/test/extraction-dir",
"checks": False,
"scheduler": {"cls": "local", **swh_scheduler_config,},
"storage": swh_storage_backend_config,
"storage_metadata": swh_storage_backend_config,
"swh_authority_url": "http://deposit.softwareheritage.example/",
}
@pytest.fixture()
def deposit_config(common_deposit_config):
return {
**common_deposit_config,
"authentication_provider": "keycloak",
"keycloak": {
"server_url": KEYCLOAK_SERVER_URL,
"realm_name": KEYCLOAK_REALM_NAME,
},
}
@pytest.fixture()
def deposit_config_path(tmp_path, monkeypatch, deposit_config):
conf_path = os.path.join(tmp_path, "deposit.yml")
with open(conf_path, "w") as f:
f.write(yaml.dump(deposit_config))
monkeypatch.setenv("SWH_CONFIG_FILENAME", conf_path)
return conf_path
@pytest.fixture(autouse=True)
def deposit_autoconfig(deposit_config_path):
"""Enforce config for deposit classes inherited from APIConfig."""
cfg = read(deposit_config_path)
if "scheduler" in cfg:
# scheduler setup: require the check-deposit and load-deposit tasks
scheduler = get_scheduler(**cfg["scheduler"])
task_types = [
{
"type": "check-deposit",
"backend_name": "swh.deposit.loader.tasks.ChecksDepositTsk",
"description": "Check deposit metadata/archive before loading",
"num_retries": 3,
},
{
"type": "load-deposit",
"backend_name": "swh.loader.package.deposit.tasks.LoadDeposit",
"description": "Loading deposit archive into swh archive",
"num_retries": 3,
},
]
for task_type in task_types:
scheduler.create_task_type(task_type)
@pytest.fixture(scope="session")
def django_db_setup(request, django_db_blocker, postgresql_proc):
from django.conf import settings
settings.DATABASES["default"].update(
{
("ENGINE", "django.db.backends.postgresql"),
("NAME", "tests"),
("USER", postgresql_proc.user), # noqa
("HOST", postgresql_proc.host), # noqa
("PORT", postgresql_proc.port), # noqa
}
)
with django_db_blocker.unblock():
setup_databases(
verbosity=request.config.option.verbose, interactive=False, keepdb=False
)
def execute_sql(sql):
"""Execute sql to postgres db"""
with psycopg2.connect(database="postgres") as conn:
conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()
cur.execute(sql)
@pytest.fixture(autouse=True, scope="session")
def swh_proxy():
"""Automatically inject this fixture in all tests to ensure no outside
connection takes place.
"""
os.environ["http_proxy"] = "http://localhost:999"
os.environ["https_proxy"] = "http://localhost:999"
def create_deposit_collection(collection_name: str):
"""Create a deposit collection with name collection_name
"""
from swh.deposit.models import DepositCollection
try:
collection = DepositCollection._default_manager.get(name=collection_name)
except DepositCollection.DoesNotExist:
collection = DepositCollection(name=collection_name)
collection.save()
return collection
def deposit_collection_factory(collection_name):
@pytest.fixture
def _deposit_collection(db, collection_name=collection_name):
return create_deposit_collection(collection_name)
return _deposit_collection
deposit_collection = deposit_collection_factory(COLLECTION)
deposit_another_collection = deposit_collection_factory(COLLECTION2)
def _create_deposit_user(
collection: "DepositCollection", user_data: Dict
) -> "DepositClient":
"""Create/Return the test_user "test"
For basic authentication, this will save a password.
This is not required for keycloak authentication scheme.
"""
from swh.deposit.models import DepositClient
user_data_d = deepcopy(user_data)
user_data_d.pop("collection", None)
passwd = user_data_d.pop("password", None)
user, _ = DepositClient.objects.get_or_create( # type: ignore
username=user_data_d["username"],
defaults={**user_data_d, "collections": [collection.id]},
)
if passwd:
user.set_password(passwd)
user.save()
return user
@pytest.fixture
def deposit_user(db, deposit_collection):
return _create_deposit_user(deposit_collection, TEST_USER)
@pytest.fixture
def deposit_another_user(db, deposit_another_collection):
return _create_deposit_user(deposit_another_collection, TEST_USER2)
@pytest.fixture
def anonymous_client():
"""Create an anonymous client (no credentials during queries to the deposit)
"""
return APIClient() # <- drf's client
def mock_keycloakopenidconnect(mocker, keycloak_mock):
"""Mock swh.deposit.auth.KeycloakOpenIDConnect to return the keycloak_mock
"""
mock = mocker.patch("swh.deposit.auth.KeycloakOpenIDConnect")
mock.from_configfile.return_value = keycloak_mock
return mock
@pytest.fixture
def mock_keycloakopenidconnect_ok(mocker, keycloak_mock_auth_success):
"""Mock keycloak so it always accepts connection for user with the right
permissions
"""
return mock_keycloakopenidconnect(mocker, keycloak_mock_auth_success)
@pytest.fixture
def mock_keycloakopenidconnect_ko(mocker, keycloak_mock_auth_failure):
"""Mock keycloak so it always refuses connections."""
return mock_keycloakopenidconnect(mocker, keycloak_mock_auth_failure)
def _create_authenticated_client(client, user, password=None):
"""Return a client whose credentials will be proposed to the deposit server.
This also patches the client instance to keep a reference to the associated
deposit_user.
"""
if not password:
password = "irrelevant-if-not-set"
_token = "%s:%s" % (user.username, password)
token = base64.b64encode(_token.encode("utf-8"))
authorization = "Basic %s" % token.decode("utf-8")
client.credentials(HTTP_AUTHORIZATION=authorization)
client.deposit_client = user
yield client
client.logout()
@pytest.fixture
def basic_authenticated_client(anonymous_client, deposit_user):
yield from _create_authenticated_client(
anonymous_client, deposit_user, password=TEST_USER["password"]
)
@pytest.fixture
def authenticated_client(mock_keycloakopenidconnect_ok, anonymous_client, deposit_user):
yield from _create_authenticated_client(anonymous_client, deposit_user)
@pytest.fixture
def unauthorized_client(mock_keycloakopenidconnect_ko, anonymous_client, deposit_user):
"""Create an unauthorized client (will see their authentication fail)
"""
yield from _create_authenticated_client(anonymous_client, deposit_user)
@pytest.fixture
def insufficient_perm_client(
mocker, keycloak_mock_auth_success, anonymous_client, deposit_user
):
"""keycloak accepts connection but client returned has no deposit permission, so access
is not allowed.
"""
- keycloak_mock_auth_success.user_permissions = []
+ keycloak_mock_auth_success.client_permissions = []
mock_keycloakopenidconnect(mocker, keycloak_mock_auth_success)
yield from _create_authenticated_client(anonymous_client, deposit_user)
@pytest.fixture
def sample_archive(tmp_path):
"""Returns a sample archive
"""
tmp_path = str(tmp_path) # pytest version limitation in previous version
archive = create_arborescence_archive(
tmp_path, "archive1", "file1", b"some content in file"
)
return archive
@pytest.fixture
def atom_dataset(datadir) -> Mapping[str, str]:
"""Compute the paths to atom files.
Returns:
Dict mapping each atom name (filename without extension) to its decoded content (str)
"""
atom_path = os.path.join(datadir, "atom")
data = {}
for filename in os.listdir(atom_path):
filepath = os.path.join(atom_path, filename)
with open(filepath, "rb") as f:
raw_content = f.read().decode("utf-8")
# Keep the filename without extension
atom_name = filename.split(".")[0]
data[atom_name] = raw_content
return data
def internal_create_deposit(
client: "DepositClient",
collection: "DepositCollection",
external_id: str,
status: str,
) -> "Deposit":
"""Create a deposit for a given collection with internal tool
"""
from swh.deposit.models import Deposit
deposit = Deposit(
client=client, external_id=external_id, status=status, collection=collection
)
deposit.save()
return deposit
def create_deposit(
client,
collection_name: str,
sample_archive,
external_id: str,
deposit_status=DEPOSIT_STATUS_DEPOSITED,
in_progress=False,
):
"""Create a skeleton shell deposit
"""
url = reverse(COL_IRI, args=[collection_name])
# when
response = post_archive(
client,
url,
sample_archive,
HTTP_SLUG=external_id,
HTTP_IN_PROGRESS=str(in_progress).lower(),
)
# then
assert response.status_code == status.HTTP_201_CREATED, response.content.decode()
from swh.deposit.models import Deposit
response_content = parse_xml(BytesIO(response.content))
deposit_id = response_content["swh:deposit_id"]
deposit = Deposit._default_manager.get(id=deposit_id)
if deposit.status != deposit_status:
deposit.status = deposit_status
deposit.save()
assert deposit.status == deposit_status
return deposit
def create_binary_deposit(
authenticated_client,
collection_name: str,
deposit_status: str = DEPOSIT_STATUS_DEPOSITED,
atom_dataset: Mapping[str, bytes] = {},
**kwargs,
):
"""Create a deposit with both metadata and archive set. Then alters its status
to `deposit_status`.
"""
deposit = create_deposit(
authenticated_client,
collection_name,
deposit_status=DEPOSIT_STATUS_PARTIAL,
**kwargs,
)
origin_url = deposit.client.provider_url + deposit.external_id
response = post_atom(
authenticated_client,
reverse(SE_IRI, args=[collection_name, deposit.id]),
data=atom_dataset["entry-data0"] % origin_url,
HTTP_IN_PROGRESS="true",
)
assert response.status_code == status.HTTP_201_CREATED
assert deposit.status == DEPOSIT_STATUS_PARTIAL
from swh.deposit.models import Deposit
deposit = Deposit._default_manager.get(pk=deposit.id)
assert deposit.status == deposit_status
return deposit
def deposit_factory(deposit_status=DEPOSIT_STATUS_DEPOSITED, in_progress=False):
"""Build deposit with a specific status
"""
@pytest.fixture()
def _deposit(
sample_archive,
deposit_collection,
authenticated_client,
deposit_status=deposit_status,
):
external_id = "external-id-%s" % deposit_status
return create_deposit(
authenticated_client,
deposit_collection.name,
sample_archive,
external_id=external_id,
deposit_status=deposit_status,
in_progress=in_progress,
)
return _deposit
deposited_deposit = deposit_factory()
rejected_deposit = deposit_factory(deposit_status=DEPOSIT_STATUS_REJECTED)
partial_deposit = deposit_factory(
deposit_status=DEPOSIT_STATUS_PARTIAL, in_progress=True
)
verified_deposit = deposit_factory(deposit_status=DEPOSIT_STATUS_VERIFIED)
completed_deposit = deposit_factory(deposit_status=DEPOSIT_STATUS_LOAD_SUCCESS)
failed_deposit = deposit_factory(deposit_status=DEPOSIT_STATUS_LOAD_FAILURE)
@pytest.fixture
def partial_deposit_with_metadata(
sample_archive, deposit_collection, authenticated_client, atom_dataset
):
"""Returns deposit with archive and metadata provided, status 'partial'
"""
return create_binary_deposit(
authenticated_client,
deposit_collection.name,
sample_archive=sample_archive,
external_id="external-id-partial",
in_progress=True,
deposit_status=DEPOSIT_STATUS_PARTIAL,
atom_dataset=atom_dataset,
)
@pytest.fixture
def partial_deposit_only_metadata(
deposit_collection, authenticated_client, atom_dataset
):
response = post_atom(
authenticated_client,
reverse(COL_IRI, args=[deposit_collection.name]),
data=atom_dataset["entry-data1"],
HTTP_SLUG="external-id-partial",
HTTP_IN_PROGRESS=True,
)
assert response.status_code == status.HTTP_201_CREATED
response_content = parse_xml(response.content)
deposit_id = response_content["swh:deposit_id"]
from swh.deposit.models import Deposit
deposit = Deposit._default_manager.get(pk=deposit_id)
assert deposit.status == DEPOSIT_STATUS_PARTIAL
return deposit
@pytest.fixture
def complete_deposit(sample_archive, deposit_collection, authenticated_client):
"""Returns a completed deposit (load success)
"""
deposit = create_deposit(
authenticated_client,
deposit_collection.name,
sample_archive,
external_id="external-id-complete",
deposit_status=DEPOSIT_STATUS_LOAD_SUCCESS,
)
origin = "https://hal.archives-ouvertes.fr/hal-01727745"
directory_id = "42a13fc721c8716ff695d0d62fc851d641f3a12b"
revision_id = hash_to_bytes("548b3c0a2bb43e1fca191e24b5803ff6b3bc7c10")
snapshot_id = hash_to_bytes("e5e82d064a9c3df7464223042e0c55d72ccff7f0")
deposit.swhid = f"swh:1:dir:{directory_id}"
deposit.swhid_context = str(
QualifiedSWHID(
object_type=ObjectType.DIRECTORY,
object_id=hash_to_bytes(directory_id),
origin=origin,
visit=CoreSWHID(object_type=ObjectType.SNAPSHOT, object_id=snapshot_id),
anchor=CoreSWHID(object_type=ObjectType.REVISION, object_id=revision_id),
path=b"/",
)
)
deposit.save()
return deposit
@pytest.fixture()
def tmp_path(tmp_path):
return str(tmp_path) # issue with oldstable's pytest version
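Reviewer note on ``_create_authenticated_client`` in ``swh/deposit/tests/conftest.py``: the helper builds an HTTP Basic ``Authorization`` value from ``username:password``. A minimal standalone sketch of that encoding step follows. The function name ``basic_auth_header`` is hypothetical, and the credentials are the test values from ``TEST_USER``.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    # Credentials are joined with a colon, then base64-encoded.
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("utf-8")


header = basic_auth_header("test", "pass")
# Decoding the token gives back the original "username:password" pair.
decoded = base64.b64decode(header.split(" ", 1)[1]).decode("utf-8")
print(header, decoded)
```

With the keycloak-backed fixtures the password is irrelevant, which is why the helper substitutes a placeholder when none is given.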
diff --git a/swh/deposit/utils.py b/swh/deposit/utils.py
index 3482ff60..0bb94c86 100644
--- a/swh/deposit/utils.py
+++ b/swh/deposit/utils.py
@@ -1,234 +1,240 @@
# Copyright (C) 2018-2020 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
import logging
from types import GeneratorType
from typing import Any, Dict, Optional, Union
import iso8601
import xmltodict
from swh.model.exceptions import ValidationError
from swh.model.identifiers import (
ExtendedSWHID,
ObjectType,
QualifiedSWHID,
normalize_timestamp,
)
logger = logging.getLogger(__name__)
def parse_xml(stream, encoding="utf-8"):
namespaces = {
"http://www.w3.org/2005/Atom": "atom",
"http://www.w3.org/2007/app": "app",
"http://purl.org/dc/terms/": "dc",
"https://doi.org/10.5063/SCHEMA/CODEMETA-2.0": "codemeta",
"http://purl.org/net/sword/terms/": "sword",
"https://www.softwareheritage.org/schema/2018/deposit": "swh",
}
data = xmltodict.parse(
stream,
encoding=encoding,
namespaces=namespaces,
process_namespaces=True,
dict_constructor=dict,
)
if "atom:entry" in data:
data = data["atom:entry"]
return data
def merge(*dicts):
"""Given an iterator of dicts, merge them losing no information.
Args:
*dicts: arguments are all supposed to be dict to merge into one
Returns:
dict merged without losing information
"""
def _extend(existing_val, value):
"""Given an existing value and a value (as potential lists), merge
them together without repetition.
"""
if isinstance(value, (list, map, GeneratorType)):
vals = value
else:
vals = [value]
for v in vals:
if v in existing_val:
continue
existing_val.append(v)
return existing_val
d = {}
for data in dicts:
if not isinstance(data, dict):
raise ValueError("dicts is supposed to be a variable arguments of dict")
for key, value in data.items():
existing_val = d.get(key)
if not existing_val:
d[key] = value
continue
if isinstance(existing_val, (list, map, GeneratorType)):
new_val = _extend(existing_val, value)
elif isinstance(existing_val, dict):
if isinstance(value, dict):
new_val = merge(existing_val, value)
else:
new_val = _extend([existing_val], value)
else:
new_val = _extend([existing_val], value)
d[key] = new_val
return d
def normalize_date(date):
"""Normalize date fields as expected by swh workers.
If date is a list, arbitrarily pick the first element of that
list.
If date is (then) a string, parse it through
iso8601.parse_date to extract a datetime.
Then normalize it through
swh.model.identifiers.normalize_timestamp.
Returns:
The swh date object
"""
if isinstance(date, list):
date = date[0]
if isinstance(date, str):
date = iso8601.parse_date(date)
return normalize_timestamp(date)
def compute_metadata_context(swhid_reference: QualifiedSWHID) -> Dict[str, Any]:
"""Given a SWHID object, determine the context as a dict.
"""
metadata_context: Dict[str, Any] = {"origin": None}
if swhid_reference.qualifiers():
metadata_context = {
"origin": swhid_reference.origin,
"path": swhid_reference.path,
}
snapshot = swhid_reference.visit
if snapshot:
metadata_context["snapshot"] = snapshot
anchor = swhid_reference.anchor
if anchor:
metadata_context[anchor.object_type.name.lower()] = anchor
return metadata_context
ALLOWED_QUALIFIERS_NODE_TYPE = (
ObjectType.SNAPSHOT,
ObjectType.REVISION,
ObjectType.RELEASE,
ObjectType.DIRECTORY,
)
def parse_swh_reference(metadata: Dict,) -> Optional[Union[QualifiedSWHID, str]]:
- """Parse swh reference within the metadata dict (or origin) reference if found, None
- otherwise.
+ """Parse swh reference within the metadata dict (or origin) reference if found,
+ None otherwise.
- <swh:deposit>
- <swh:reference>
- <swh:origin url='https://github.com/user/repo'/>
- </swh:reference>
- </swh:deposit>
+ .. code-block:: xml
+
+ <swh:deposit>
+ <swh:reference>
+ <swh:origin url='https://github.com/user/repo'/>
+ </swh:reference>
+ </swh:deposit>
or:
- <swh:deposit>
- <swh:reference>
- <swh:object swhid="swh:1:dir:31b5c8cc985d190b5a7ef4878128ebfdc2358f49;origin=https://hal.archives-ouvertes.fr/hal-01243573;visit=swh:1:snp:4fc1e36fca86b2070204bedd51106014a614f321;anchor=swh:1:rev:9c5de20cfb54682370a398fcc733e829903c8cba;path=/moranegg-AffectationRO-df7f68b/"
- />
- </swh:deposit>
+ .. code-block:: xml
+
+ <swh:deposit>
+ <swh:reference>
+ <swh:object swhid="swh:1:dir:31b5c8cc985d190b5a7ef4878128ebfdc2358f49;origin=https://hal.archives-ouvertes.fr/hal-01243573;visit=swh:1:snp:4fc1e36fca86b2070204bedd51106014a614f321;anchor=swh:1:rev:9c5de20cfb54682370a398fcc733e829903c8cba;path=/moranegg-AffectationRO-df7f68b/" />
+ </swh:reference>
+ </swh:deposit>
+
+ Args:
+ metadata: result of parsing an Atom document with :func:`parse_xml`
Raises:
ValidationError in case the swhid referenced (if any) is invalid
Returns:
Either swhid or origin reference if any. None otherwise.
""" # noqa
    swh_deposit = metadata.get("swh:deposit")
    if not swh_deposit:
        return None
    swh_reference = swh_deposit.get("swh:reference")
    if not swh_reference:
        return None
    swh_origin = swh_reference.get("swh:origin")
    if swh_origin:
        url = swh_origin.get("@url")
        if url:
            return url
    swh_object = swh_reference.get("swh:object")
    if not swh_object:
        return None
    swhid = swh_object.get("@swhid")
    if not swhid:
        return None
    swhid_reference = QualifiedSWHID.from_string(swhid)
    if swhid_reference.qualifiers():
        anchor = swhid_reference.anchor
        if anchor:
            if anchor.object_type not in ALLOWED_QUALIFIERS_NODE_TYPE:
                error_msg = (
                    "anchor qualifier should be a core SWHID with type one of "
                    f"{', '.join(t.name.lower() for t in ALLOWED_QUALIFIERS_NODE_TYPE)}"
                )
                raise ValidationError(error_msg)
        visit = swhid_reference.visit
        if visit:
            if visit.object_type != ObjectType.SNAPSHOT:
                raise ValidationError(
                    f"visit qualifier should be a core SWHID with type snp, "
                    f"not {visit.object_type.value}"
                )
        if (
            visit
            and anchor
            and visit.object_type == ObjectType.SNAPSHOT
            and anchor.object_type == ObjectType.SNAPSHOT
        ):
            logger.warn(
                "SWHID use of both anchor and visit targeting "
                f"a snapshot: {swhid_reference}"
            )
            raise ValidationError(
                "'anchor=swh:1:snp:' is not supported when 'visit' is also provided."
            )
    return swhid_reference


def extended_swhid_from_qualified(swhid: QualifiedSWHID) -> ExtendedSWHID:
    """Used to get the target of a metadata object from a <swh:reference>,
    as the latter uses a QualifiedSWHID."""
    return ExtendedSWHID.from_string(str(swhid).split(";")[0])
diff --git a/tox.ini b/tox.ini
index 12c11f50..c27673b6 100644
--- a/tox.ini
+++ b/tox.ini
@@ -1,43 +1,81 @@
[tox]
envlist=flake8,mypy,py3-django2

[testenv]
extras =
  testing
deps =
  # the dependency below is needed for now as a workaround for
  # https://github.com/pypa/pip/issues/6239
  swh.core[http] >= 0.3
  swh.scheduler[testing] >= 0.5.0
  dev: pdbpp
  pytest-cov
  django2: Django>=2,<3
commands =
  pytest \
  !dev: --cov {envsitepackagesdir}/swh/deposit --cov-branch \
         {envsitepackagesdir}/swh/deposit \
         {posargs}

[testenv:black]
skip_install = true
deps =
  black==19.10b0
commands =
  {envpython} -m black --check swh

[testenv:flake8]
skip_install = true
deps =
  flake8
commands =
  {envpython} -m flake8 \
    --exclude=.tox,.git,__pycache__,.tox,.eggs,*.egg,swh/deposit/migrations

[testenv:mypy]
setenv = DJANGO_SETTINGS_MODULE=swh.deposit.settings.testing
extras =
  testing
deps =
  mypy
commands =
  mypy swh
+
+# build documentation outside swh-environment using the current
+# git HEAD of swh-docs, is executed on CI for each diff to prevent
+# breaking doc build
+[testenv:sphinx]
+whitelist_externals = make
+usedevelop = true
+extras =
+  testing
+deps =
+  # fetch and install swh-docs in develop mode
+  -e git+https://forge.softwareheritage.org/source/swh-docs#egg=swh.docs
+
+setenv =
+  SWH_PACKAGE_DOC_TOX_BUILD = 1
+  # turn warnings into errors
+  SPHINXOPTS = -W
+commands =
+  make -I ../.tox/sphinx/src/swh-docs/swh/ -C docs
+
+
+# build documentation only inside swh-environment using local state
+# of swh-docs package
+[testenv:sphinx-dev]
+whitelist_externals = make
+usedevelop = true
+extras =
+  testing
+deps =
+  # install swh-docs in develop mode
+  -e ../swh-docs
+
+setenv =
+  SWH_PACKAGE_DOC_TOX_BUILD = 1
+  # turn warnings into errors
+  SPHINXOPTS = -W
+commands =
+  make -I ../.tox/sphinx-dev/src/swh-docs/swh/ -C docs
\ No newline at end of file