diff --git a/AUTHORS b/AUTHORS --- a/AUTHORS +++ b/AUTHORS @@ -1,3 +1,3 @@ -Copyright (C) 2015-2016 The Software Heritage developers +Copyright (C) 2017-2018 The Software Heritage developers See http://www.softwareheritage.org/ for more information. diff --git a/docs/getting-started.rst b/docs/getting-started.rst --- a/docs/getting-started.rst +++ b/docs/getting-started.rst @@ -175,7 +175,7 @@ The steps to create a multisteps deposit: 1. Create an incomplete deposit -~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ First use the ``--partial`` argument to declare there is more to come .. code:: shell @@ -186,7 +186,7 @@ 2. Add content or metadata to the deposit -~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Continue the deposit by using the ``--deposit-id`` argument given as a response for the first step. You can continue adding content or metadata while you use the ``--partial`` argument. @@ -235,7 +235,7 @@ ``--deposit-id `` is provided - by using the ``--replace`` flag - + - ``--metadata-deposit`` replaces associated existing metadata - ``--archive-deposit`` replaces associated archive(s) - by default, with no flag or both, you'll replace associated @@ -268,7 +268,7 @@ .. code:: shell -$ swh-deposit --username name --password secret --deposit-id '11' --status + $ swh-deposit --username name --password secret --deposit-id '11' --status .. code:: json diff --git a/docs/index.rst b/docs/index.rst --- a/docs/index.rst +++ b/docs/index.rst @@ -12,6 +12,7 @@ metadata.rst dev-info.rst sys-info.rst + specs/specs.rst Indices and tables ================== diff --git a/docs/blueprint.rst b/docs/specs/blueprint.rst rename from docs/blueprint.rst rename to docs/specs/blueprint.rst --- a/docs/blueprint.rst +++ b/docs/specs/blueprint.rst @@ -8,13 +8,13 @@ From client's deposit repository server to SWH's repository server: 1. 
The client requests for the server's abilities and its associated collection - (GET query to the *SD/service document uri*) + (GET query to the *SD/service document uri*) 2. The server answers the client with the service document which gives the - *collection uri* (also known as *COL/collection IRI*). + *collection uri* (also known as *COL/collection IRI*). 3. The client sends a deposit (optionally a zip archive, some metadata or both) - through the *collection uri*. + through the *collection uri*. This can be done in: @@ -22,16 +22,16 @@ * one POST request (metadata or archive) + other PUT or POST request to the *update uris* (*edit-media iri* or *edit iri*) - 1. Server validates the client's input or returns detailed error if any + a. Server validates the client's input or returns detailed error if any - 2. Server stores information received (metadata or software archive source + b. Server stores information received (metadata or software archive source code or both) 4. The server notifies the client it acknowledged the client's request. An - ``http 201 Created`` response with a deposit receipt in the body response is - sent back. That deposit receipt will hold the necessary information to - eventually complete the deposit later on if it was incomplete (also known as - status ``partial``). + ``http 201 Created`` response with a deposit receipt in the body response is + sent back. That deposit receipt will hold the necessary information to + eventually complete the deposit later on if it was incomplete (also known as + status ``partial``). 
 Schema representation
 ^^^^^^^^^^^^^^^^^^^^^
diff --git a/docs/specs/metadata_example.xml b/docs/specs/metadata_example.xml
new file mode 100644
--- /dev/null
+++ b/docs/specs/metadata_example.xml
@@ -0,0 +1,38 @@
+
+
+ "{http://www.w3.org/2005/Atom}author": {
+ "{http://www.w3.org/2005/Atom}email": "hal@ccsd.cnrs.fr",
+ "{http://www.w3.org/2005/Atom}name": "HAL"
+ },
+
+ HAL
+ hal@ccsd.cnrs.fr
+
+ hal
+ hal-01243573
+ The assignment problem
+ https://hal.archives-ouvertes.fr/hal-01243573
+ other identifier, DOI, ARK
+ Domain
+ description
+
+ author1
+ Inria
+ UPMC
+
+
+ author2
+ Inria
+ UPMC
+
+
+
+
+ ./path/to/file.txt
+ aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+
+
+
diff --git a/docs/spec-loading.rst b/docs/specs/spec-loading.rst
rename from docs/spec-loading.rst
rename to docs/specs/spec-loading.rst
diff --git a/docs/specs/spec-meta-deposit.rst b/docs/specs/spec-meta-deposit.rst
new file mode 100644
--- /dev/null
+++ b/docs/specs/spec-meta-deposit.rst
@@ -0,0 +1,31 @@
+The meta-deposit
+================
+
+Goal
+----
+A client wishes to deposit only metadata about an object in the Software
+Heritage archive.
+
+The meta-deposit is a special deposit where no content is
+deposited and the data transferred to Software Heritage is only
+the metadata about an object or several objects in the archive.
+
+The scope of the meta-deposit is larger than the sparse-deposit, because
+with a meta-deposit all types of objects in the archive can be described
+with the deposited metadata:
+
+- origin
+- snapshot
+- revision
+- release
+- directory
+- content
+
+
+Loading procedure
+------------------
+
+In this case, the meta-deposit will be injected as a metadata entry at the
+appropriate level (origin_metadata, revision_metadata, etc.) and won't result
+in the creation of a new object like with the complete deposit and the
+sparse-deposit.
diff --git a/docs/specs/spec-sparse-deposit.rst b/docs/specs/spec-sparse-deposit.rst
new file mode 100644
--- /dev/null
+++ b/docs/specs/spec-sparse-deposit.rst
@@ -0,0 +1,109 @@
+The sparse-deposit
+==================
+
+Goal
+----
+A client wishes to transfer a tarball for which part of the content is
+already in the SWH archive.
+
+Requirements
+------------
+To do so, the paths to the missing directories/content must be provided as
+empty paths in the tarball and the list linking each path to the object in the
+archive will be provided as part of the metadata. The list will be referred to
+as the manifest list.
+
++----------------------+-------------------------------------+
+| path                 | swh-id                              |
++======================+=====================================+
+| ./path/to/file.txt   | swh:1:cnt:aaaaaaaaaaaaaaaaaaaaa...  |
++----------------------+-------------------------------------+
+| ./path/to/dir/       | swh:1:dir:aaaaaaaaaaaaaaaaaaaaa...  |
++----------------------+-------------------------------------+
+
+Note: the *name* of the file or the directory is given by the path and is not
+part of the identified object.
+
+A concrete example
+------------------
+The manifest list is included in the metadata xml atomEntry under the
+swh namespace:
+
+.. code:: xml
+
+
+
+
+ HAL
+ hal@ccsd.cnrs.fr
+
+ hal
+ hal-01243573
+ The assignment problem
+ https://hal.archives-ouvertes.fr/hal-01243573
+ other identifier, DOI, ARK
+ Domain
+ description
+
+ author1
+ Inria
+ UPMC
+
+
+ author2
+ Inria
+ UPMC
+
+
+
+
+ ./path/to/file.txt
+ swh:1:cnt:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+
+ ./path/to/second_file.txt
+ swh:1:cnt:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
+
+
+ ./path/to/dir/
+ swh:1:dir:ddddddddddddddddddddddddddddddddd
+
+
+
+
+
+The tarball sent with the deposit will contain the following empty paths:
+- path/to/file.txt
+- path/to/second_file.txt
+- path/to/dir/
+
+Deposit verification
+--------------------
+
+After checking the integrity of the deposit content and
+metadata, the following checks should be added:
+
+1. validate the manifest list structure with a swh-id for each path
+2. verify that the paths in the manifest list are explicit and empty in the tarball
+3. verify that the path name corresponds to the object type
+4. locate the identifiers in the SWH archive
+
+Each one of the verifications should return a different error with the deposit
+and result in a 'rejected' deposit.
+
+Loading procedure
+------------------
+The injection procedure should include:
+
+- load the tarball data
+- create new objects using the path name and create links from the path to the
+  SWH object using the identifier
+- calculate identifier of the new objects at each level
+- return final swh-id of the new revision
+
+Invariant: the same content should yield the same swh-id, which is why a complete
+deposit with all the content and a sparse-deposit with the correct links will
+result in the same root directory swh-id and, if the metadata are identical,
+also in the same revision swh-id.
diff --git a/docs/specs/specs.rst b/docs/specs/specs.rst
new file mode 100644
--- /dev/null
+++ b/docs/specs/specs.rst
@@ -0,0 +1,13 @@
+.. _swh-deposit-specs:
+
+Software Heritage Deposit Specifications
+========================================
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Contents:
+
+   blueprint.rst
+   spec-loading.rst
+   spec-sparse-deposit.rst
+   spec-meta-deposit.rst
diff --git a/swh/deposit/api/all_deposit_history.py b/swh/deposit/api/all_deposit_history.py
new file mode 100644
--- /dev/null
+++ b/swh/deposit/api/all_deposit_history.py
@@ -0,0 +1,27 @@
+# Copyright (C) 2017 The Software Heritage developers
+# See the AUTHORS file at the top-level directory of this distribution
+# License: GNU General Public License version 3, or any later version
+# See top-level LICENSE file for more information
+
+from django.shortcuts import render
+from rest_framework import status
+from rest_framework.views import APIView
+from ..models import Deposit
+
+
+class SWHAllDepositHistory(APIView):
+    """List all deposit history.
+
+    HTTP verbs supported: GET
+
+    """
+    def get(self, req):
+        # limiting to the 100 last deposits, order by last received
+        results = Deposit.objects.all().order_by('-reception_date')[:100]
+        context = {
+            'deposits': results
+        }
+        return render(req, 'history.html',
+                      context=context,
+                      content_type='html',
+                      status=status.HTTP_200_OK)
diff --git a/swh/deposit/api/urls.py b/swh/deposit/api/urls.py
--- a/swh/deposit/api/urls.py
+++ b/swh/deposit/api/urls.py
@@ -29,9 +29,9 @@
 from .deposit_content import SWHDepositContent
 from .service_document import SWHServiceDocument
+
+
 urlpatterns = [
     # PUBLIC API
-
     # SD IRI - Service Document IRI
     # -> GET
     url(r'^servicedocument/', SWHServiceDocument.as_view(),
diff --git a/swh/deposit/settings/development.py b/swh/deposit/settings/development.py
--- a/swh/deposit/settings/development.py
+++ b/swh/deposit/settings/development.py
@@ -51,6 +51,10 @@
     'default': {
         'ENGINE': 'django.db.backends.postgresql',
         'NAME': 'swh-deposit-dev',
+        # 'NAME': 'softwareheritage-deposit',
+        # 'USER': 'guest',
+        # 'PASSWORD': 'guest',
+        # 'HOST': 'db.internal.softwareheritage.org',
     }
 }
diff --git a/swh/deposit/templates/history.html b/swh/deposit/templates/history.html
new file mode 100644
--- /dev/null
+++ b/swh/deposit/templates/history.html
@@ -0,0 +1,63 @@
+{% extends "layout.html" %}
+{% load static %}
+{% block title %}The Software Heritage Deposit{% endblock %}
+
+{% block content %}
+
+
+

Monitor deposits made to the Software Heritage archive

+
+
+

+ This interface lets you track the status of the different deposits
+ made to the Software Heritage archive via the SWORD protocol.
+

+ + +
+
+ + + + + + + + + + + + + {% for deposit in deposits %} + + + + + + + {% if deposit.swh_id %} + + {% else %} + + {% endif %} + + {% empty %} + + {% endfor %} + +
deposit-ID | external-ID | date | status | error message | SWH-ID
{{ deposit.id }}{{ deposit.external_id }} + {{ deposit.reception_date | date:"Y-m-d" }} + {{ deposit.status }}{{ deposit.error_message}} + + {{deposit.swh_id}} +
Sorry, no deposits were found
+
+

*The deposit history is limited for public view to the last 100
+ deposits made to the Software Heritage archive. For more information,
+ contact us at the following email address:
+ deposit@softwareheritage.org
+

+
+
+
+{% endblock %}
diff --git a/swh/deposit/urls.py b/swh/deposit/urls.py
--- a/swh/deposit/urls.py
+++ b/swh/deposit/urls.py
@@ -31,7 +31,6 @@
 from django.shortcuts import render
 from django.views.generic.base import RedirectView
 from rest_framework.urlpatterns import format_suffix_patterns
 
-
 favicon_view = RedirectView.as_view(url='/static/img/icons/swh-logo-32x32.png',
                                     permanent=True)