docs/architecture.rst
.. _architecture: | .. _architecture: | ||||
Software Architecture | Software Architecture | ||||
===================== | ===================== | ||||
From an end-user point of view, the |swh| platform consists in the | From an end-user point of view, the |swh| platform consists in the | ||||
:term:`archive`, which can be accessed using the web interface or its REST API. | :term:`archive`, which can be accessed using the web interface or its REST API. | ||||
Behind the scene (and the web app) are several components that expose | Behind the scene (and the web app) are several components that expose | ||||
different aspects of the |swh| :term:`archive` as internal REST APIs. | different aspects of the |swh| :term:`archive` as internal REST APIs. | ||||
Each of these internal APIs has a dedicated (PostgreSQL) database. | Each of these internal APIs has a dedicated (PostgreSQL) database. | ||||
A global view of this architecture looks like: | A global (and incomplete) view of this architecture looks like: | ||||
.. thumbnail:: images/general-architecture.svg | .. thumbnail:: images/general-architecture.svg | ||||
General view of the |swh| architecture. | General view of the |swh| architecture. | ||||
The front API components are: | The front API components are: | ||||
- :ref:`Storage API <swh-storage>` | - :ref:`Storage API <swh-storage>` (including the Metadata Storage) | ||||
- :ref:`Deposit API <swh-deposit>` | - :ref:`Deposit API <swh-deposit>` | ||||
- :ref:`Vault API <swh-vault>` | - :ref:`Vault API <swh-vault>` | ||||
- :ref:`Indexer API <swh-indexer>` | - :ref:`Indexer API <swh-indexer>` | ||||
- :ref:`Scheduler API <swh-scheduler>` | - :ref:`Scheduler API <swh-scheduler>` | ||||
On the back stage of this show, a celery_ based game of tasks and workers | On the back stage of this show, a celery_ based game of tasks and workers | ||||
occurs to perform all the required work to fill, maintain and update the |swh| | occurs to perform all the required work to fill, maintain and update the |swh| | ||||
:term:`archive`. | :term:`archive`. | ||||
Show All 9 Lines | - :term:`Loaders <loader>`: a loader is a type of task aiming at importing or | ||||
updating a source code repository. It is the one that inserts :term:`blob` | updating a source code repository. It is the one that inserts :term:`blob` | ||||
objects in the :term:`object storage`, and inserts nodes and edges in the | objects in the :term:`object storage`, and inserts nodes and edges in the | ||||
:ref:`graph <swh-merkle-dag>`. | :ref:`graph <swh-merkle-dag>`. | ||||
- :term:`Indexers <indexer>`: an indexer is a type of task aiming at crawling | - :term:`Indexers <indexer>`: an indexer is a type of task aiming at crawling | ||||
the content of the :term:`archive` to extract derived information (mimetype, | the content of the :term:`archive` to extract derived information (mimetype, | ||||
etc.) | etc.) | ||||
- :term:`Vault <vault>`: this type of celery task is responsible for cooking a | |||||
compressed archive (zip or tgz) of an archived object (typically a directory | |||||
or a repository). Since this can be a rather long process, it is delegated to | |||||
an asynchronous (celery) task. | |||||
Tasks | Tasks | ||||
----- | ----- | ||||
Listers | |||||
+++++++ | |||||
The following sequence diagram shows the interactions between these components | The following sequence diagram shows the interactions between these components | ||||
when a new forge needs to be archived. This example depicts the case of a | when a new forge needs to be archived. This example depicts the case of a | ||||
gitlab_ forge, but any other supported source type would be very similar. | gitlab_ forge, but any other supported source type would be very similar. | ||||
.. thumbnail:: images/tasks-lister.svg | .. thumbnail:: images/tasks-lister.svg | ||||
As one might observe in this diagram, it does create two things: | As one might observe in this diagram, it does two things: | ||||
- it adds one :term:`origin` objects in the :term:`storage` database for each | - it ask the forge (a gitlab_ instance in this case) the list of known repositories, and | ||||
ardumont: asks | |||||
source code repository, and | |||||
- it inserts one :term:`loader` task for each source code repository that will | - it inserts one :term:`loader` task for each source code repository that will | ||||
be in charge of importing the content of that repository. | be in charge of importing the content of that repository. | ||||
Note that most listers usually work in incremental mode, meaning they store in a | |||||
dedicated database the current state of the listing of the forge. Then, on a subsequent | |||||
executino of the lister, it will ask only for new repositories. | |||||
ardumont: execution | |||||
Also note that if the lister inserts a new loading task for a repository for which a | |||||
loading task already exists, the existing task will be updated (if needed) instead of | |||||
creating a new task. | |||||
Loaders | |||||
+++++++ | |||||
The sequence diagram below describes this second step of importing the content | The sequence diagram below describes this second step of importing the content | ||||
of a repository. Once again, we take the example of a git repository, but any | of a repository. Once again, we take the example of a git repository, but any | ||||
other type of repository would be very similar. | other type of repository would be very similar. | ||||
.. thumbnail:: images/tasks-git-loader.svg | .. thumbnail:: images/tasks-git-loader.svg | ||||
.. _celery: https://www.celeryproject.org | .. _celery: https://www.celeryproject.org | ||||
.. _gitlab: https://gitlab.com | .. _gitlab: https://gitlab.com |
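
The lister behaviour described in the new text of this diff (incremental listing, and updating an existing loading task instead of creating a duplicate) can be illustrated with a small sketch. This is not the |swh| implementation: ``FakeForge``, ``FakeScheduler``, ``IncrementalLister``, the ``load-git`` task type and the ``forge.example`` URLs are all hypothetical names chosen for illustration, and the listing state is kept in memory rather than in a dedicated database.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeForge:
    """Hypothetical stand-in for a forge (for example a GitLab instance)."""

    repositories: list[str]

    def list_repositories(self, after: int = 0) -> list[str]:
        # Return only the repositories beyond the position already listed.
        return self.repositories[after:]


@dataclass
class FakeScheduler:
    """Hypothetical stand-in for the scheduler: one loading task per URL."""

    tasks: dict[str, dict] = field(default_factory=dict)

    def upsert_loading_task(self, url: str, **arguments) -> bool:
        """Create the task if it is new, otherwise update it in place.

        Returns True when a new task was created.
        """
        created = url not in self.tasks
        task = self.tasks.setdefault(url, {"type": "load-git", "url": url})
        task.update(arguments)
        return created


@dataclass
class IncrementalLister:
    forge: FakeForge
    scheduler: FakeScheduler
    # Listing state kept between runs (assumption: a simple integer cursor).
    cursor: int = 0

    def run(self) -> int:
        """List repositories not seen yet and schedule one task for each."""
        new_repos = self.forge.list_repositories(after=self.cursor)
        created = sum(self.scheduler.upsert_loading_task(url) for url in new_repos)
        self.cursor += len(new_repos)
        return created


forge = FakeForge(["https://forge.example/a", "https://forge.example/b"])
scheduler = FakeScheduler()
lister = IncrementalLister(forge, scheduler)

first_run = lister.run()  # both repositories are new
forge.repositories.append("https://forge.example/c")
second_run = lister.run()  # only the repository added since the first run
```

In this sketch the second run only asks for repositories added after the first one, and calling ``upsert_loading_task`` again for an already known URL modifies the existing entry instead of adding a second task.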