docs/specs/spec-technical.rst
- This file was added.
Technical specifications
========================

Requirements
~~~~~~~~~~~~

* one dedicated database to store the deposit's state - swh-deposit
* one dedicated temporary objstorage to store archives before loading
ardumont: In the end, we are not using an objstorage.
moranegg: ack. Where are the archives stored?
ardumont: on disk (but not an objstorage) with references in the db.
* one client to test the communication with SWORD protocol
ardumont: It's no longer a test. Real customers are using it now.
moranegg: Yes sure, it is still specs :-D because we still need and have a client (swh client) to test the communication. I will change that to
> A client to test the communication…
Deposit reception schema
~~~~~~~~~~~~~~~~~~~~~~~~

* SWORD imposes the use of basic authentication, so we need a way to
  authenticate clients. Also, a client can access collections:

  **deposit\_client** table:

  - id (bigint): Client's identifier
  - username (str): Client's username
  - password (pass): Client's encrypted password
  - collections ([id]): List of collections the client can access
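To illustrate the basic authentication requirement above, here is a minimal sketch of decoding an HTTP ``Authorization: Basic`` header into the username/password pair that would then be checked against ``deposit_client``. The function name ``parse_basic_auth`` is hypothetical and this is not the swh-deposit implementation.

```python
import base64
import binascii
from typing import Optional, Tuple


def parse_basic_auth(header: str) -> Optional[Tuple[str, str]]:
    """Return (username, password) from a Basic Authorization header value.

    Returns None when the value is not well-formed Basic credentials.
    """
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        # validate=True rejects characters outside the base64 alphabet
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    # Only the first ':' separates the username from the password
    username, sep, password = decoded.partition(":")
    if not sep:
        return None
    return username, password
```

The returned password would still have to be compared with the stored value in ``deposit_client``; how that comparison is done is outside this sketch.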
ardumont: The format is badly formatted (as initially). also s/crypted/encrypted. Still, can you please add missing end of line in…
moranegg: sure. Can you verify that the content is still accurate?
ardumont: > Can you verify that the content is still accurate?
I don't understand. What must i check?
* Collections group deposits together:

  **deposit\_collection** table:

  - id (bigint): Collection's identifier
  - name (str): Collection's human readable name
ardumont: Same regarding format.
moranegg: :-)
* A deposit is the main object the repository is all about:

  **deposit** table:

  * id (bigint): deposit's identifier
  * reception\_date (date): First deposit's reception date
  * complete\_date (date): Date when the deposit is deemed complete and ready
    for loading
  * collection (id): The collection the deposit belongs to
  * external\_id (text): client's internal identifier (e.g. HAL's id)
  * client\_id (id): Client which made the deposit
  * swh\_id (str): swh identifier result once the loading is complete
  * status (enum): The deposit's current status
- As mentioned, a deposit can have a status, whose possible values are:

.. code:: text
ardumont: Use `-` as the previous chapter.
moranegg: ack
    'partial',   -- the deposit is new or partially received since it
                 -- can be done in multiple requests
    'expired',   -- deposit has been there too long and is now deemed
                 -- ready to be garbage collected
    'deposited', -- deposit complete, it is ready to be checked to ensure
                 -- data consistency
    'verified',  -- deposit is fully received, checked, and ready for loading
    'loading',   -- loading is ongoing on swh's side
    'done',      -- loading is successful
    'failed'     -- loading is a failure
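The status values and the ``deposit`` fields listed above could be modelled as follows. This is only a sketch: the class names ``DepositStatus`` and ``Deposit`` and the Python types are assumptions, not the actual swh-deposit models. Only the seven statuses listed above are included.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class DepositStatus(Enum):
    """Statuses copied from the list above (names are illustrative)."""

    PARTIAL = "partial"
    EXPIRED = "expired"
    DEPOSITED = "deposited"
    VERIFIED = "verified"
    LOADING = "loading"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Deposit:
    """In-memory mirror of the deposit table columns (types are assumed)."""

    id: int
    reception_date: datetime
    collection: int
    external_id: str
    client_id: int
    # A new deposit starts as 'partial' since it can span several requests
    status: DepositStatus = DepositStatus.PARTIAL
    complete_date: Optional[datetime] = None  # set when deemed complete
    swh_id: Optional[str] = None  # set once loading has produced a result
```

Looking a status up by its stored string, for example ``DepositStatus("loading")``, raises ``ValueError`` for any value outside the list.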
* A deposit is stateful and can be made in multiple requests:

  **deposit\_request** table:

  * id (bigint): identifier
  * type (id): deposit request's type (possible values: 'archive', 'metadata')
  * deposit\_id (id): deposit the request belongs to
  * metadata: metadata associated with the request
  * date (date): date of the request

Information sent along with a request is stored in a ``deposit_request`` row.
Each request is either of type ``metadata`` (atom entry, multipart's atom entry
part) or of type ``archive`` (binary upload, multipart's binary upload part).

When the deposit is complete (status ``deposited``), those ``metadata`` and
``archive`` deposit requests will be read and aggregated. They will then be
sent as parameters to the loading routine.
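The aggregation step described above might look like the following sketch. The function name ``aggregate_requests``, the dict-based row representation, the ``archive`` key holding a reference to the stored file, and the "later request overrides earlier keys" merge policy are all assumptions made for illustration; the specification does not define them.

```python
from typing import Any, Dict, List, Tuple


def aggregate_requests(
    requests: List[Dict[str, Any]],
) -> Tuple[Dict[str, Any], List[Any]]:
    """Merge metadata requests and collect archive references, oldest first."""
    metadata: Dict[str, Any] = {}
    archives: List[Any] = []
    for req in sorted(requests, key=lambda r: r["date"]):
        if req["type"] == "metadata":
            # Assumed policy: keys from later requests replace earlier ones
            metadata.update(req["metadata"])
        elif req["type"] == "archive":
            # Hypothetical 'archive' key: a reference to the uploaded file
            archives.append(req["archive"])
        else:
            raise ValueError(f"unknown deposit request type: {req['type']!r}")
    return metadata, archives
```

The two returned values stand for the parameters that would be handed to the loading routine.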

During loading, some of those metadata are kept in the ``origin_metadata``
table and others are stored in the ``revision`` table (see `metadata
loading <#metadata-loading>`__).

The only updates made to the deposit table concern:

- status changes:

  - ``partial`` -> {``expired``/``deposited``},
  - ``deposited`` -> {``rejected``/``verified``},
  - ``verified`` -> ``loading``
  - ``loading`` -> {``done``/``failed``}

- ``complete_date`` when the deposit is finalized (when the status is changed
  to ``deposited``)
- ``swh_id`` is populated once we have the loading result
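The status changes listed above can also be read as a small state machine. The sketch below encodes exactly those transitions as a lookup table, with an ASCII diagram in the comments. The names ``TRANSITIONS`` and ``can_transition`` are hypothetical, and ``rejected`` is taken from the transition list itself.

```python
from typing import Dict, FrozenSet

# partial --> deposited --> verified --> loading --> done
#    |            |                         |
#    v            v                         v
# expired      rejected                   failed

# Allowed status changes, copied from the list above
TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "partial": frozenset({"expired", "deposited"}),
    "deposited": frozenset({"rejected", "verified"}),
    "verified": frozenset({"loading"}),
    "loading": frozenset({"done", "failed"}),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if the change current -> new appears in the table."""
    # Statuses without an entry have no outgoing transitions
    return new in TRANSITIONS.get(current, frozenset())
```

A status update would call ``can_transition`` before writing the new value, so that, for instance, a ``done`` deposit cannot go back to ``partial``.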
ardumont: Same regarding format.
zack: better "status changes" More generally, this should really be a state diagram, rather than a state transition function serialized as a set of lines. No need to add a separate figure for it, but at least a ASCII art state diagram would be nice here.
moranegg: we already have one but in latex :-/ I'll see what I can do.