docs/specs/spec-technical.rst
- This file was added.
Technical specifications
========================

Requirements
~~~~~~~~~~~~

* one dedicated database to store the deposit's state - swh-deposit
* one dedicated temporary objstorage to store archives before loading

ardumont: In the end, we are not using an objstorage.

moranegg: ack. Where are the archives stored?

ardumont: on disk (but not an objstorage) with references in the db.

* one client to test the communication with SWORD protocol

ardumont: It's no longer a test. Real customers are using it now.

moranegg: Yes sure, it is still specs :-D I will change that to "A client to test the communication…" because we still need and have a client (swh client) to test the communication.

Deposit reception schema | |||||
~~~~~~~~~~~~~~~~~~~~~~~~ | |||||
* SWORD imposes the use of basic authentication, so we need a way to
  authenticate clients. Also, a client can access collections:

  **deposit\_client** table:

  - id (bigint): Client's identifier
  - username (str): Client's username
  - password (pass): Client's encrypted password
  - collections ([id]): List of collections the client can access

ardumont: The format is badly formatted (as initially). Still, can you please add missing end of line in… Also s/crypted/encrypted.

moranegg: sure. Can you verify that the content is still accurate?

ardumont: "Can you verify that the content is still accurate?" I don't understand. What must I check?

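As an illustration of the basic authentication step, here is a minimal sketch of how a server could extract the username and password from an HTTP ``Authorization`` header before looking the client up in ``deposit_client``. The function name ``parse_basic_auth`` is hypothetical and not taken from swh-deposit; checking the password against the stored encrypted value is deliberately left out.

```python
import base64
from typing import Optional, Tuple


def parse_basic_auth(header: str) -> Optional[Tuple[str, str]]:
    """Return (username, password) from a Basic Authorization header value.

    Returns None when the header is not a well-formed Basic credential.
    Hypothetical helper for illustration only.
    """
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        # validate=True rejects characters outside the base64 alphabet
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except ValueError:
        # binascii.Error and UnicodeDecodeError are both ValueError subclasses
        return None
    username, sep, password = decoded.partition(":")
    if not sep:
        return None
    return username, password


if __name__ == "__main__":
    header = "Basic " + base64.b64encode(b"hal:secret").decode("ascii")
    print(parse_basic_auth(header))
```

The returned username would then be used to find the matching ``deposit_client`` row and its list of accessible collections.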
* Collections group deposits together:

  **deposit\_collection** table:

  - id (bigint): Collection's identifier
  - name (str): Collection's human-readable name

ardumont: Same regarding format.

moranegg: :-)

* A deposit is the main object the repository is all about:

  **deposit** table:

  * id (bigint): deposit's identifier
  * reception\_date (date): First deposit's reception date
  * complete\_date (date): Date when the deposit is deemed complete and ready
    for loading
  * collection (id): The collection the deposit belongs to
  * external\_id (text): client's internal identifier (e.g. hal's id)
  * client\_id (id): Client which made the deposit
  * swh\_id (str): swh identifier, available once the loading is complete
  * status (enum): The deposit's current status

ardumont: Use `-` as the previous chapter.

moranegg: ack

* As mentioned, a deposit can have a status, whose possible values are:

  .. code:: text

     'partial',   -- the deposit is new or partially received since it
                  -- can be done in multiple requests
     'expired',   -- deposit has been there too long and is now deemed
                  -- ready to be garbage collected
     'deposited', -- deposit complete, it is ready to be checked to ensure
                  -- data consistency
     'verified',  -- deposit is fully received, checked, and ready for loading
     'loading',   -- loading is ongoing on swh's side
     'done',      -- loading is successful
     'failed'     -- loading is a failure

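The status values listed above could be modelled as an enumeration. The following is only a sketch: the class name ``DepositStatus`` is an assumption for illustration, and only the seven values listed above are included.

```python
from enum import Enum


class DepositStatus(Enum):
    """Possible deposit statuses, mirroring the list in this spec.

    Hypothetical class name; the values are the ones listed above.
    """

    PARTIAL = "partial"      # new or partially received
    EXPIRED = "expired"      # too old, ready to be garbage collected
    DEPOSITED = "deposited"  # complete, ready to be checked
    VERIFIED = "verified"    # checked, ready for loading
    LOADING = "loading"      # loading ongoing
    DONE = "done"            # loading succeeded
    FAILED = "failed"        # loading failed


if __name__ == "__main__":
    # Look a status up from its stored string value
    print(DepositStatus("loading"))
```

Looking up an unknown value, for example ``DepositStatus("unknown")``, raises ``ValueError``, which gives a cheap validation of values read from the database.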
* A deposit is stateful and can be made in multiple requests:

  **deposit\_request** table:

  * id (bigint): identifier
  * type (id): deposit request's type (possible values: 'archive', 'metadata')
  * deposit\_id (id): deposit the request belongs to
  * metadata: metadata associated with the request
  * date (date): date of the request

Information sent along with a request is stored in a ``deposit_request`` row.
It is either of type ``metadata`` (atom entry, multipart's atom entry
part) or of type ``archive`` (binary upload, multipart's binary upload part).

When the deposit is complete (status ``deposited``), those ``metadata`` and
``archive`` deposit requests will be read and aggregated. They will then be
sent as parameters to the loading routine.

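A minimal sketch of that aggregation step follows. Several points are assumptions made for illustration and are not stated in this spec: the ``archive_path`` field (standing in for the on-disk reference to an uploaded archive), the merge rule (later metadata requests override earlier ones for the same key), and the names ``DepositRequest`` and ``aggregate_requests``.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Iterable, List, Optional, Tuple


@dataclass
class DepositRequest:
    """One row of the deposit_request table (simplified)."""

    id: int
    type: str  # 'archive' or 'metadata'
    deposit_id: int
    date: datetime
    metadata: Dict[str, Any] = field(default_factory=dict)
    archive_path: Optional[str] = None  # hypothetical on-disk reference


def aggregate_requests(
    requests: Iterable[DepositRequest],
) -> Tuple[Dict[str, Any], List[str]]:
    """Merge metadata requests and collect archive references.

    Requests are processed in date order; for metadata, later values
    override earlier ones (an assumed merge rule).
    """
    metadata: Dict[str, Any] = {}
    archives: List[str] = []
    for req in sorted(requests, key=lambda r: r.date):
        if req.type == "metadata":
            metadata.update(req.metadata)
        elif req.type == "archive":
            if req.archive_path is not None:
                archives.append(req.archive_path)
        else:
            raise ValueError(f"unknown request type: {req.type!r}")
    return metadata, archives


if __name__ == "__main__":
    reqs = [
        DepositRequest(1, "metadata", 1, datetime(2017, 1, 1), {"title": "a"}),
        DepositRequest(2, "archive", 1, datetime(2017, 1, 2),
                       archive_path="/tmp/a.zip"),
    ]
    print(aggregate_requests(reqs))
```

The two returned values are what would be handed to the loading routine as parameters.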
During loading, some of those metadata are kept in the ``origin_metadata``
table and some others are stored in the ``revision`` table (see `metadata
loading <#metadata-loading>`__).

The only updates made to the deposit table concern:

- status changes:

  - ``partial`` -> {``expired``/``deposited``},
  - ``deposited`` -> {``rejected``/``verified``},
  - ``verified`` -> ``loading``
  - ``loading`` -> {``done``/``failed``}

- ``complete_date``, set when the deposit is
  finalized (when the status is changed to ``deposited``)
- ``swh_id``, populated once we have the loading result

ardumont: Same regarding format.

zack: better "status changes". More generally, this should really be a state diagram, rather than a state transition function serialized as a set of lines. No need to add a separate figure for it, but at least a ASCII art state diagram would be nice here.

moranegg: we already have one but in latex :-/ I'll see what I can do.

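The update rules above can be sketched as a small transition table plus a guard function. This is an illustration under assumptions, not the swh-deposit implementation: the names ``ALLOWED_TRANSITIONS`` and ``change_status`` are hypothetical, the deposit is represented as a plain dict, and ``swh_id`` is assumed to be set when the status moves to ``done``. Note that ``rejected`` appears in the transitions above although it is not in the status list given earlier; it is included here as written.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Transition table copied from the list above (hypothetical constant name)
ALLOWED_TRANSITIONS = {
    "partial": {"expired", "deposited"},
    "deposited": {"rejected", "verified"},
    "verified": {"loading"},
    "loading": {"done", "failed"},
}


def change_status(
    deposit: Dict[str, Any],
    new_status: str,
    swh_id: Optional[str] = None,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Return an updated copy of `deposit`, or raise ValueError.

    Only the three fields named in the spec are ever modified:
    status, complete_date and swh_id.
    """
    current = deposit["status"]
    if new_status not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"invalid transition: {current} -> {new_status}")
    updated = dict(deposit, status=new_status)
    if new_status == "deposited":
        # The deposit is finalized: record the completion date
        updated["complete_date"] = now or datetime.now(timezone.utc)
    if new_status == "done" and swh_id is not None:
        # Assumption: the loading result carries the swh identifier
        updated["swh_id"] = swh_id
    return updated


if __name__ == "__main__":
    deposit = {"status": "partial", "complete_date": None, "swh_id": None}
    print(change_status(deposit, "deposited")["status"])
```

Returning a copy rather than mutating the input keeps the sketch easy to test; a real implementation would more likely update the database row in place.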