diff --git a/docs/endpoints/status.rst b/docs/endpoints/status.rst index ed17ac87..c6252279 100644 --- a/docs/endpoints/status.rst +++ b/docs/endpoints/status.rst @@ -1,76 +1,76 @@ Retrieve status ^^^^^^^^^^^^^^^^ .. http:get:: /1/// Returns the deposit's status. The different statuses: - **partial**: multipart deposit is still ongoing - **deposited**: deposit completed, ready for checks - **rejected**: deposit failed the checks - **verified**: content and metadata verified, ready for loading - **loading**: loading in-progress - **done**: loading completed successfully - **failed**: the deposit loading has failed Also known as STATE-IRI :param text : the client's credentials :statuscode 201: with the deposit's status :statuscode 401: Unauthorized :statuscode 404: access to an unknown deposit Rejected deposit ~~~~~~~~~~~~~~~~ A deposit can be rejected. In that case, the `deposit_status_detail` entry explains which checks failed. There are many possible reasons; here are some: - Deposit without software archive (the main goal of a deposit is to deposit software source code) - Deposit with malformed software archive (e.g. an archive within an archive) - Deposit with invalid software archive (corrupted archive; this should normally be detected during upload rather than during checks) - Deposit with unsupported archive format - Deposit with missing metadata Sample response ~~~~~~~~~~~~~~~ Successful deposit: .. 
code:: xml 160 done The deposit has been successfully loaded into the Software Heritage archive swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9 - swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;origin=https://forge.softwareheritage.org/source/jesuisgpl/ + swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;origin=https://forge.softwareheritage.org/source/jesuisgpl/;visit=swh:1:snp:68c0d26104d47e278dd6be07ed61fafb561d0d20;anchor=swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb;path=/ swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb;origin=https://forge.softwareheritage.org/source/jesuisgpl/ Rejected deposit: .. code:: xml 148 rejected - At least one url field must be compatible with the client's domain name (codemeta:url) diff --git a/docs/getting-started.rst b/docs/getting-started.rst index 28345119..864285f0 100644 --- a/docs/getting-started.rst +++ b/docs/getting-started.rst @@ -1,287 +1,287 @@ Getting Started =============== This is a guide for how to prepare and push a software deposit with the `swh deposit` commands. The API is rooted at https://deposit.softwareheritage.org/1. For more details, see the `main documentation <./index.html>`__. Requirements ------------ You need to be referenced on SWH's client list to have: * credentials (needed for the basic authentication step) - in this document we reference ```` as the client's name and ```` as its associated authentication password. * an associated collection_. .. _collection: https://bitworking.org/projects/atom/rfc5023#rfc.section.8.3.3 `Contact us for more information. `__ Prepare a deposit ----------------- * compress the files in a supported archive format: - zip: common zip archive (no multi-disk zip files). 
- tar: tar archive without compression or optionally with any of the following compression algorithms: gzip (`.tar.gz`, `.tgz`), bzip2 (`.tar.bz2`), or lzma (`.tar.lzma`) * (Optional) prepare a metadata file (more details in :ref:`deposit-metadata`): Push deposit ------------ You can push a deposit with: * a single deposit (archive + metadata): The user posts a software source code archive and its associated metadata in one query. The deposit is directly marked with status ``deposited``. * a multi-step deposit: 1. Create an incomplete deposit (marked with status ``partial``) 2. Add data to a deposit (in multiple requests if needed) 3. Finalize deposit (the status becomes ``deposited``) Single deposit ^^^^^^^^^^^^^^ Once the files are ready for deposit, we want to do the actual deposit in one shot, sending exactly one POST query: * 1 archive (content-type ``application/zip`` or ``application/x-tar``) * 1 metadata file in atom xml format (``content-type: application/atom+xml;type=entry``) For this, we need to provide: * arguments: ``--username 'name' --password 'pass'`` as credentials * archive's path (example: ``--archive path/to/archive-name.tgz``) * software's name (optional if a metadata filepath is specified and the artifact's name is included in the metadata file). * author's name (optional if a metadata filepath is specified and the authors are included in the metadata file). This can be specified multiple times in case of multiple authors. * (optionally) metadata file's path ``--metadata path/to/file.metadata.xml``. * (optionally) ``--slug 'your-id'`` argument, a reference to a unique identifier the client uses for the software object. If not provided, a UUID will be generated by SWH. You can do this with the following commands: minimal deposit .. code:: shell $ swh deposit upload --username name --password secret \ --author "Jane Doe" \ --author "John Doe" \ --name 'je-suis-gpl' \ --archive je-suis-gpl.tgz with client's external identifier (``slug``) .. 
code:: shell $ swh deposit upload --username name --password secret \ --author "Jane Doe" \ --name 'je-suis-gpl' \ --archive je-suis-gpl.tgz \ --slug je-suis-gpl to a specific client's collection .. code:: shell $ swh deposit upload --username name --password secret \ --author "Jane Doe" \ --name 'je-suis-gpl' \ --archive je-suis-gpl.tgz \ --collection 'second-collection' You just posted a deposit to your collection on Software Heritage. If everything went well, the successful response will contain the elements below: .. code:: shell { 'deposit_status': 'deposited', 'deposit_id': '7', 'deposit_date': 'Jan. 29, 2018, 12:29 p.m.' } Note: as the deposit is in ``deposited`` status, you can no longer update it after this query; any further update request will be answered with a 403 Forbidden response. If something went wrong, an equivalent response will be given with the `error` and `detail` keys explaining the issue, e.g.: .. code:: shell { 'error': 'Unknown collection name xyz', 'detail': None, 'deposit_status': None, 'deposit_status_detail': None, 'deposit_swh_id': None, 'status': 404 } Multi-step deposit ^^^^^^^^^^^^^^^^^^^^^^^^^ The steps to create a multi-step deposit: 1. Create an incomplete deposit ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ First use the ``--partial`` argument to declare there is more to come. .. code:: shell $ swh deposit upload --username name --password secret \ --archive foo.tar.gz \ --partial 2. Add content or metadata to the deposit ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Continue the deposit by passing the deposit id returned by the first step to the ``--deposit-id`` argument. You can keep adding content or metadata as long as you use the ``--partial`` argument. To only add one new archive to the deposit: .. code:: shell $ swh deposit upload --username name --password secret \ --archive add-foo.tar.gz \ --deposit-id 42 \ --partial To only add metadata to the deposit: .. 
code:: shell $ swh deposit upload --username name --password secret \ --metadata add-foo.tar.gz.metadata.xml \ --deposit-id 42 \ --partial or: .. code:: shell $ swh deposit upload --username name --password secret \ --name 'add-foo' --author 'someone' \ --deposit-id 42 \ --partial 3. Finalize deposit ~~~~~~~~~~~~~~~~~~~ On your last addition (same command as before), omit the ``--partial`` argument: the deposit will then be considered complete and its status will change to ``deposited``. Update deposit ---------------- * replace deposit: - only possible if the deposit status is ``partial`` and ``--deposit-id `` is provided - by using the ``--replace`` flag - ``--metadata-deposit`` replaces associated existing metadata - ``--archive-deposit`` replaces associated archive(s) - by default (with neither flag, or with both), both the associated metadata and archive(s) are replaced: .. code:: shell $ swh deposit upload --username name --password secret \ --deposit-id 11 \ --archive updated-je-suis-gpl.tgz \ --replace * update a loaded deposit with a new version: - by using the external-id with the ``--slug`` argument, you will link the new deposit with its parent deposit: .. code:: shell $ swh deposit upload --username name --password secret \ --archive je-suis-gpl-v2.tgz \ --slug 'je-suis-gpl' Check the deposit's status -------------------------- You can check the status of the deposit by using the ``--deposit-id`` argument: .. code:: shell $ swh deposit status --username name --password secret \ --deposit-id 11 .. 
code:: json { 'deposit_id': '11', 'deposit_status': 'deposited', 'deposit_swh_id': None, 'deposit_status_detail': 'Deposit is ready for additional checks \ (tarball ok, metadata, etc...)' } The different statuses: - **partial**: multipart deposit is still ongoing - **deposited**: deposit completed - **rejected**: deposit failed the checks - **verified**: content and metadata verified - **loading**: loading in-progress - **done**: loading completed successfully - **failed**: the deposit loading has failed Once the deposit has been loaded into the archive, its status is ``done`` and the response also includes the , , , . For example: .. code:: json { 'deposit_id': '11', 'deposit_status': 'done', 'deposit_swh_id': 'swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9', - 'deposit_swh_id_context': 'swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;origin=https://forge.softwareheritage.org/source/jesuisgpl/', + 'deposit_swh_id_context': 'swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;origin=https://forge.softwareheritage.org/source/jesuisgpl/;visit=swh:1:snp:68c0d26104d47e278dd6be07ed61fafb561d0d20;anchor=swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb;path=/', 'deposit_swh_anchor_id': 'swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb', 'deposit_swh_anchor_id_context': 'swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb;origin=https://forge.softwareheritage.org/source/jesuisgpl/', 'deposit_status_detail': 'The deposit has been successfully \ loaded into the Software Heritage archive' } diff --git a/requirements-swh-server.txt b/requirements-swh-server.txt index b12106e5..86a85993 100644 --- a/requirements-swh-server.txt +++ b/requirements-swh-server.txt @@ -1,4 +1,4 @@ swh.core[http] swh.loader.core >= 0.0.71 swh.scheduler >= 0.0.39 -swh.model >= 0.0.26 +swh.model >= 0.1.0 diff --git a/swh/deposit/api/private/deposit_update_status.py b/swh/deposit/api/private/deposit_update_status.py index 87d94f70..b4e2b898 100644 --- 
a/swh/deposit/api/private/deposit_update_status.py +++ b/swh/deposit/api/private/deposit_update_status.py @@ -1,82 +1,111 @@ -# Copyright (C) 2017-2019 The Software Heritage developers +# Copyright (C) 2017-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information from rest_framework.parsers import JSONParser -from swh.model.identifiers import persistent_identifier, REVISION, DIRECTORY +from swh.model.identifiers import DIRECTORY, persistent_identifier, REVISION, SNAPSHOT from . import SWHPrivateAPIView from ..common import SWHPutDepositAPI from ...errors import make_error_dict, BAD_REQUEST from ...models import Deposit, DEPOSIT_STATUS_DETAIL from ...models import DEPOSIT_STATUS_LOAD_SUCCESS +MANDATORY_KEYS = ["origin_url", "revision_id", "directory_id", "snapshot_id"] + + class SWHUpdateStatusDeposit(SWHPrivateAPIView, SWHPutDepositAPI): """Deposit request class to update the deposit's status. HTTP verbs supported: PUT """ parser_classes = (JSONParser,) def additional_checks(self, request, headers, collection_name, deposit_id=None): """Enrich existing checks to the default ones. 
New checks: - Ensure the status is provided - Ensure it exists + - no missing information on load success update """ data = request.data status = data.get("status") if not status: msg = "The status key is mandatory with possible values %s" % list( DEPOSIT_STATUS_DETAIL.keys() ) return make_error_dict(BAD_REQUEST, msg) if status not in DEPOSIT_STATUS_DETAIL: msg = "Possible status in %s" % list(DEPOSIT_STATUS_DETAIL.keys()) return make_error_dict(BAD_REQUEST, msg) if status == DEPOSIT_STATUS_LOAD_SUCCESS: - swh_id = data.get("revision_id") - if not swh_id: - msg = "Updating status to %s requires a revision_id key" % (status,) + missing_keys = [] + for key in MANDATORY_KEYS: + value = data.get(key) + if value is None: + missing_keys.append(key) + + if missing_keys: + msg = ( + f"Updating deposit status to {status}" + f" requires information {','.join(missing_keys)}" + ) return make_error_dict(BAD_REQUEST, msg) return {} def process_put(self, request, headers, collection_name, deposit_id): - """Update the deposit's status + """Update the deposit with status and SWHIDs Returns: 204 No content + 400 Bad request if checks fail """ - deposit = Deposit.objects.get(pk=deposit_id) - deposit.status = request.data["status"] # checks already done before + data = request.data - origin_url = request.data.get("origin_url") + deposit = Deposit.objects.get(pk=deposit_id) - dir_id = request.data.get("directory_id") - if dir_id: - deposit.swh_id = persistent_identifier(DIRECTORY, dir_id) + status = data["status"] + deposit.status = status + if status == DEPOSIT_STATUS_LOAD_SUCCESS: + origin_url = data["origin_url"] + directory_id = data["directory_id"] + revision_id = data["revision_id"] + dir_id = persistent_identifier(DIRECTORY, directory_id) + snp_id = persistent_identifier(SNAPSHOT, data["snapshot_id"]) + rev_id = persistent_identifier(REVISION, revision_id) + + deposit.swh_id = dir_id + # new id with contextual information deposit.swh_id_context = persistent_identifier( - 
DIRECTORY, dir_id, metadata={"origin": origin_url} + DIRECTORY, + directory_id, + metadata={ + "origin": origin_url, + "visit": snp_id, + "anchor": rev_id, + "path": "/", + }, ) - rev_id = request.data.get("revision_id") - if rev_id: - deposit.swh_anchor_id = persistent_identifier(REVISION, rev_id) + # backward compatibility for now + deposit.swh_anchor_id = rev_id deposit.swh_anchor_id_context = persistent_identifier( - REVISION, rev_id, metadata={"origin": origin_url} + REVISION, revision_id, metadata={"origin": origin_url} ) + else: # rejected + deposit.status = status deposit.save() return {} diff --git a/swh/deposit/tests/api/test_deposit_private_update_status.py b/swh/deposit/tests/api/test_deposit_private_update_status.py index c9bc27a5..fa05bf48 100644 --- a/swh/deposit/tests/api/test_deposit_private_update_status.py +++ b/swh/deposit/tests/api/test_deposit_private_update_status.py @@ -1,154 +1,200 @@ -# Copyright (C) 2017-2019 The Software Heritage developers +# Copyright (C) 2017-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information +import copy import json from django.urls import reverse from rest_framework import status -from swh.deposit.models import Deposit, DEPOSIT_STATUS_DETAIL +from swh.model.identifiers import DIRECTORY, persistent_identifier, REVISION, SNAPSHOT + +from swh.deposit.api.private.deposit_update_status import MANDATORY_KEYS + +from swh.deposit.models import Deposit from swh.deposit.config import ( PRIVATE_PUT_DEPOSIT, - DEPOSIT_STATUS_VERIFIED, DEPOSIT_STATUS_LOAD_SUCCESS, + DEPOSIT_STATUS_LOAD_FAILURE, ) PRIVATE_PUT_DEPOSIT_NC = PRIVATE_PUT_DEPOSIT + "-nc" def private_check_url_endpoints(collection, deposit): """There are 2 endpoints to check (one with collection, one without)""" return [ reverse(PRIVATE_PUT_DEPOSIT, args=[collection.name, deposit.id]), 
reverse(PRIVATE_PUT_DEPOSIT_NC, args=[deposit.id]), ] -def test_update_deposit_status( +def test_update_deposit_status_success_with_info( authenticated_client, deposit_collection, ready_deposit_verified ): - """Existing status for update should return a 204 response + """Update deposit with load success should require all information to succeed """ deposit = ready_deposit_verified + expected_status = DEPOSIT_STATUS_LOAD_SUCCESS + origin_url = "something" + directory_id = "42a13fc721c8716ff695d0d62fc851d641f3a12b" + revision_id = "47dc6b4636c7f6cba0df83e3d5490bf4334d987e" + snapshot_id = "68c0d26104d47e278dd6be07ed61fafb561d0d20" + + full_body_info = { + "status": DEPOSIT_STATUS_LOAD_SUCCESS, + "revision_id": revision_id, + "directory_id": directory_id, + "snapshot_id": snapshot_id, + "origin_url": origin_url, + } for url in private_check_url_endpoints(deposit_collection, deposit): - possible_status = set(DEPOSIT_STATUS_DETAIL.keys()) - set( - [DEPOSIT_STATUS_LOAD_SUCCESS] + dir_id = persistent_identifier(DIRECTORY, directory_id) + rev_id = persistent_identifier(REVISION, revision_id) + snp_id = persistent_identifier(SNAPSHOT, snapshot_id) + + expected_swh_id = "swh:1:dir:%s" % directory_id + expected_swh_id_context = ( + f"{dir_id};origin={origin_url};" + f"visit={snp_id};anchor={rev_id};path=/" ) + expected_swh_anchor_id = rev_id + expected_swh_anchor_id_context = f"{rev_id};origin={origin_url}" - for _status in possible_status: - response = authenticated_client.put( - url, - content_type="application/json", - data=json.dumps({"status": _status}), - ) + response = authenticated_client.put( + url, content_type="application/json", data=json.dumps(full_body_info), + ) - assert response.status_code == status.HTTP_204_NO_CONTENT + assert response.status_code == status.HTTP_204_NO_CONTENT - deposit = Deposit.objects.get(pk=deposit.id) - assert deposit.status == _status + deposit = Deposit.objects.get(pk=deposit.id) + assert deposit.status == expected_status + assert 
deposit.swh_id == expected_swh_id + assert deposit.swh_id_context == expected_swh_id_context + assert deposit.swh_anchor_id == expected_swh_anchor_id + assert deposit.swh_anchor_id_context == expected_swh_anchor_id_context - deposit.status = DEPOSIT_STATUS_VERIFIED - deposit.save() # hack the same deposit + # Reset deposit + deposit = ready_deposit_verified + deposit.save() -def test_update_deposit_status_with_info( +def test_update_deposit_status_rejected_with_info( authenticated_client, deposit_collection, ready_deposit_verified ): - """Existing status for update with info should return a 204 response + """Update deposit with rejected status needs few information to succeed """ deposit = ready_deposit_verified - for url in private_check_url_endpoints(deposit_collection, deposit): - expected_status = DEPOSIT_STATUS_LOAD_SUCCESS - origin_url = "something" - directory_id = "42a13fc721c8716ff695d0d62fc851d641f3a12b" - revision_id = "47dc6b4636c7f6cba0df83e3d5490bf4334d987e" - expected_swh_id = "swh:1:dir:%s" % directory_id - expected_swh_id_context = "swh:1:dir:%s;origin=%s" % (directory_id, origin_url) - expected_swh_anchor_id = "swh:1:rev:%s" % revision_id - expected_swh_anchor_id_context = "swh:1:rev:%s;origin=%s" % ( - revision_id, - origin_url, - ) + for url in private_check_url_endpoints(deposit_collection, deposit): response = authenticated_client.put( url, content_type="application/json", - data=json.dumps( - { - "status": expected_status, - "revision_id": revision_id, - "directory_id": directory_id, - "origin_url": origin_url, - } - ), + data=json.dumps({"status": DEPOSIT_STATUS_LOAD_FAILURE}), ) assert response.status_code == status.HTTP_204_NO_CONTENT deposit = Deposit.objects.get(pk=deposit.id) - assert deposit.status == expected_status - assert deposit.swh_id == expected_swh_id - assert deposit.swh_id_context == expected_swh_id_context - assert deposit.swh_anchor_id == expected_swh_anchor_id - assert deposit.swh_anchor_id_context == 
expected_swh_anchor_id_context + assert deposit.status == DEPOSIT_STATUS_LOAD_FAILURE + + assert deposit.swh_id is None + assert deposit.swh_id_context is None + assert deposit.swh_anchor_id is None + assert deposit.swh_anchor_id_context is None - deposit.swh_id = None - deposit.swh_id_context = None - deposit.swh_anchor_id = None - deposit.swh_anchor_id_context = None - deposit.status = DEPOSIT_STATUS_VERIFIED + # Reset status + deposit = ready_deposit_verified deposit.save() +def test_update_deposit_status_success_with_incomplete_data( + authenticated_client, deposit_collection, ready_deposit_verified +): + """Update deposit status with status success and incomplete information should fail + + """ + deposit = ready_deposit_verified + + origin_url = "something" + directory_id = "42a13fc721c8716ff695d0d62fc851d641f3a12b" + revision_id = "47dc6b4636c7f6cba0df83e3d5490bf4334d987e" + snapshot_id = "68c0d26104d47e278dd6be07ed61fafb561d0d20" + + new_status = DEPOSIT_STATUS_LOAD_SUCCESS + full_body_info = { + "status": new_status, + "revision_id": revision_id, + "directory_id": directory_id, + "snapshot_id": snapshot_id, + "origin_url": origin_url, + } + + for url in private_check_url_endpoints(deposit_collection, deposit): + for key in MANDATORY_KEYS: + # Crafting body with missing information so that it raises + body = copy.deepcopy(full_body_info) + body.pop(key) # make the body incomplete + + response = authenticated_client.put( + url, content_type="application/json", data=json.dumps(body), + ) + + assert response.status_code == status.HTTP_400_BAD_REQUEST + assert ( + f"deposit status to {new_status} requires information {key}" + in response.content.decode("utf-8") + ) + + def test_update_deposit_status_will_fail_with_unknown_status( authenticated_client, deposit_collection, ready_deposit_verified ): """Unknown status for update should return a 400 response """ deposit = ready_deposit_verified for url in private_check_url_endpoints(deposit_collection, deposit): 
response = authenticated_client.put( url, content_type="application/json", data=json.dumps({"status": "unknown"}) ) assert response.status_code == status.HTTP_400_BAD_REQUEST def test_update_deposit_status_will_fail_with_no_status_key( authenticated_client, deposit_collection, ready_deposit_verified ): """No status provided for update should return a 400 response """ deposit = ready_deposit_verified for url in private_check_url_endpoints(deposit_collection, deposit): response = authenticated_client.put( url, content_type="application/json", data=json.dumps({"something": "something"}), ) assert response.status_code == status.HTTP_400_BAD_REQUEST def test_update_deposit_status_success_without_swh_id_fail( authenticated_client, deposit_collection, ready_deposit_verified ): """Providing successful status without swh_id should return a 400 """ deposit = ready_deposit_verified for url in private_check_url_endpoints(deposit_collection, deposit): response = authenticated_client.put( url, content_type="application/json", data=json.dumps({"status": DEPOSIT_STATUS_LOAD_SUCCESS}), ) assert response.status_code == status.HTTP_400_BAD_REQUEST
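The load-success path added in this change (the `MANDATORY_KEYS` check in `additional_checks` and the contextual SWHID built in `process_put`) can be summarized by the standalone sketch below. It is a minimal illustration under stated assumptions. It deliberately does not import `swh.model`. The helper names `missing_keys`, `load_success_error`, `swhid` and `contextual_dir_swhid` are hypothetical. The qualifier string format is copied from the expected values in the docs and tests above rather than taken from `persistent_identifier` itself, which may validate or escape qualifier values differently. The status value `"done"` is assumed to be the load-success status, following the status list in the docs.

```python
# Standalone sketch of the load-success checks and SWHID construction.
# Helper names are hypothetical; swh.model is intentionally not imported,
# and the ";key=value" qualifier format mirrors the documented examples.
from typing import Dict, List, Optional

MANDATORY_KEYS = ["origin_url", "revision_id", "directory_id", "snapshot_id"]


def missing_keys(data: Dict[str, Optional[str]]) -> List[str]:
    """Mandatory keys that are absent from the body or explicitly null."""
    return [key for key in MANDATORY_KEYS if data.get(key) is None]


def load_success_error(data: Dict[str, Optional[str]]) -> Optional[str]:
    """Error message for an incomplete load-success update, else None."""
    missing = missing_keys(data)
    if not missing:
        return None
    return (
        f"Updating deposit status to {data.get('status')}"
        f" requires information {','.join(missing)}"
    )


def swhid(object_type: str, object_id: str) -> str:
    """Core identifier, e.g. swh:1:dir:<40 hex digits>."""
    return f"swh:1:{object_type}:{object_id}"


def contextual_dir_swhid(data: Dict[str, str]) -> str:
    """Directory SWHID qualified with origin, visit, anchor and path."""
    qualifiers = [
        ("origin", data["origin_url"]),
        ("visit", swhid("snp", data["snapshot_id"])),
        ("anchor", swhid("rev", data["revision_id"])),
        ("path", "/"),
    ]
    core = swhid("dir", data["directory_id"])
    return core + "".join(f";{key}={value}" for key, value in qualifiers)


# Body shaped like the tests' full_body_info, using the docs' sample ids
body = {
    "status": "done",  # assumed load-success status value
    "origin_url": "https://forge.softwareheritage.org/source/jesuisgpl/",
    "directory_id": "d83b7dda887dc790f7207608474650d4344b8df9",
    "revision_id": "e76ea49c9ffbb7f73611087ba6e999b19e5d71eb",
    "snapshot_id": "68c0d26104d47e278dd6be07ed61fafb561d0d20",
}
assert load_success_error(body) is None
print(contextual_dir_swhid(body))
```

Unlike the endpoint, `load_success_error` does not check the status value itself. In this change, that check runs first in `additional_checks`, and the mandatory keys are required only when the status is the load-success one.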