diff --git a/docs/architecture.rst b/docs/architecture.rst
--- a/docs/architecture.rst
+++ b/docs/architecture.rst
@@ -10,7 +10,7 @@
 Each of these internal APIs have a dedicated (Postgresql) database.
 
-A global view of this architecture looks like:
+A global (and incomplete) view of this architecture looks like:
 
 .. thumbnail:: images/general-architecture.svg
 
@@ -18,7 +18,7 @@
 The front API components are:
 
-- :ref:`Storage API `
+- :ref:`Storage API ` (including the Metadata Storage)
 - :ref:`Deposit API `
 - :ref:`Vault API `
 - :ref:`Indexer API `
@@ -44,24 +44,41 @@
   the content of the :term:`archive` to extract derived information (mimetype,
   etc.)
 
+- :term:`Vault `: this type of celery task is responsible for cooking a
+  compressed archive (zip or tgz) of an archived object (typically a directory
+  or a repository). Since this can be a rather long process, it is delegated to
+  an asynchronous (celery) task.
+
 Tasks
 -----
 
+Listers
++++++++
+
 The following sequence diagram shows the interactions between these components
 when a new forge needs to be archived. This example depicts the case of a
 gitlab_ forge, but any other supported source type would be very similar.
 
 .. thumbnail:: images/tasks-lister.svg
 
-As one might observe in this diagram, it does create two things:
+As one might observe in this diagram, it does two things:
 
-- it adds one :term:`origin` objects in the :term:`storage` database for each
-  source code repository, and
+- it asks the forge (a gitlab_ instance in this case) for the list of known repositories, and
 - it insert one :term:`loader` task for each source code repository that will
   be in charge of importing the content of that repository.
 
+Note that most listers work in incremental mode, meaning they store in a
+dedicated database the current state of the listing of the forge. Then, on a subsequent
+execution of the lister, it will ask only for new repositories.
+
+Also note that if the lister inserts a new loading task for a repository for which a
+loading task already exists, the existing task will be updated (if needed) instead of
+creating a new task.
+
+Loaders
++++++++
 The sequence diagram below describe this second step of importing the content
 of a repository. Once again, we take the example of a git repository, but any