.. _architecture:

Software Architecture
=====================

From an end-user point of view, the |swh| platform consists of the
:term:`archive`, which can be accessed using the web interface or its REST API.
Behind the scenes (and behind the web app) are several components that expose
different aspects of the |swh| :term:`archive` as internal REST APIs.
Each of these internal APIs has a dedicated PostgreSQL database.

A global view of this architecture is shown below:

.. figure:: images/general-architecture.*

   General view of the |swh| architecture.

The front API components are:

- :ref:`Storage API <swh-storage>`
- :ref:`Deposit API <swh-deposit>`
- :ref:`Vault API <swh-vault>`
- :ref:`Indexer API <swh-indexer>`
- :ref:`Scheduler API <swh-scheduler>`

Backstage, a celery_-based system of tasks and workers performs all the work
required to fill, maintain and update the |swh| :term:`archive`.

The main components involved in this choreography are:

- :term:`Listers <lister>`: a lister is a type of task aiming at scraping a
  website, a forge, etc. to gather all the source code repositories it can
  find. For each source code repository found, a :term:`loader` task is
  created.
- :term:`Loaders <loader>`: a loader is a type of task aiming at importing or
  updating a source code repository. It inserts :term:`blob` objects into the
  :term:`object storage`, and inserts nodes and edges into the
  :ref:`graph <swh-merkle-dag>`.
- :term:`Indexers <indexer>`: an indexer is a type of task aiming at crawling
  the content of the :term:`archive` to extract derived information (mimetype,
  etc.).
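
To make the indexer role more concrete, here is a minimal Python sketch,
assuming the object storage can be modeled as an in-memory dictionary mapping
blob identifiers to raw bytes. The names ``guess_mimetype`` and
``index_contents`` are hypothetical, and the mimetype heuristic is deliberately
naive; it is not how the |swh| indexers actually detect mimetypes. What it
illustrates is the access pattern: blobs are only read, and the derived
information is kept apart from the archive, keyed by blob identifier.

```python
from typing import Dict

# Standard 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def guess_mimetype(data: bytes) -> str:
    """Hypothetical, naive mimetype guess, for illustration only."""
    if data.startswith(PNG_SIGNATURE):
        return "image/png"
    if b"\x00" in data:
        # NUL bytes almost never appear in text files.
        return "application/octet-stream"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "application/octet-stream"
    return "text/plain"


def index_contents(objstorage: Dict[str, bytes]) -> Dict[str, dict]:
    """Compute derived information for every blob of a (hypothetical)
    in-memory object storage.

    The archived blobs are only read; the results go into a separate index
    keyed by the same blob identifier.
    """
    return {
        blob_id: {"mimetype": guess_mimetype(data), "length": len(data)}
        for blob_id, data in objstorage.items()
    }
```

For example, ``index_contents({"x": b"hello\n"})`` yields
``{"x": {"mimetype": "text/plain", "length": 6}}``.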

Tasks
-----

The following sequence diagram shows the interactions between these components
when a new forge needs to be archived. This example depicts the case of a
gitlab_ forge, but any other supported source type would be very similar.

.. mermaid:: tasks-lister.mmd

As this diagram shows, the lister creates two things:

- one :term:`origin` object in the :term:`storage` database for each source
  code repository, and
- one :term:`loader` task for each source code repository, in charge of
  importing the content of that repository.
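
The lister side of this diagram can be summarized by a minimal Python sketch.
It assumes that the forge listing has already been fetched as pages of
repository URLs (no real GitLab API call is made), and it replaces the storage
and scheduler databases with in-memory stand-ins. ``Storage``, ``Scheduler``,
``LoaderTask``, ``run_lister`` and the ``load-git`` task type are all
hypothetical names, not the actual |swh| APIs.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Set


@dataclass(frozen=True)
class LoaderTask:
    """Hypothetical task record: which loader to run, on which origin."""
    task_type: str
    origin_url: str


@dataclass
class Storage:
    """Hypothetical stand-in for the storage database (origins only)."""
    origins: Set[str] = field(default_factory=set)

    def add_origin(self, url: str) -> None:
        self.origins.add(url)


@dataclass
class Scheduler:
    """Hypothetical stand-in for the scheduler: a list of pending tasks."""
    tasks: List[LoaderTask] = field(default_factory=list)

    def create_task(self, task: LoaderTask) -> None:
        self.tasks.append(task)


def iter_repositories(pages: Iterable[List[str]]) -> Iterator[str]:
    """Walk a paginated listing (here, pre-fetched pages of repository URLs)."""
    for page in pages:
        yield from page


def run_lister(pages: Iterable[List[str]], storage: Storage,
               scheduler: Scheduler, task_type: str = "load-git") -> int:
    """Record one origin and create one loader task per repository found.

    Returns the number of repositories processed.
    """
    count = 0
    for url in iter_repositories(pages):
        storage.add_origin(url)
        scheduler.create_task(LoaderTask(task_type=task_type, origin_url=url))
        count += 1
    return count
```

Deduplication against repositories seen by a previous run, error handling and
rate limiting are deliberately left out of this sketch.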

The sequence diagram below describes this second step: importing the content
of a repository. Once again, we take the example of a git repository, but any
other type of repository would be very similar.

.. mermaid:: tasks-git-loader.mmd
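
The core of the loading step can be sketched in Python as well, for a single
flat directory. The only part taken from git itself is ``git_blob_id``, which
computes the identifier git gives to a file content: a SHA-1 over a
``blob <length>`` header, a NUL byte, then the content. Everything else is an
assumption made for illustration: ``load_directory``, the dictionary used as
object storage, the ``cnt:``/``dir:`` node naming and the directory
identifier, which in particular is *not* git's tree format.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def git_blob_id(data: bytes) -> str:
    """Identifier git assigns to a file content (a "blob")."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def load_directory(files: Dict[str, bytes], objstorage: Dict[str, bytes],
                   nodes: Set[str], edges: Set[Tuple[str, str, str]]) -> str:
    """Hypothetical loader step for one flat directory.

    Stores blobs in the (in-memory) object storage and adds nodes and edges
    to the graph; returns a hypothetical directory identifier.
    """
    entries: List[Tuple[str, str]] = []
    for name, data in sorted(files.items()):
        blob_id = git_blob_id(data)
        # Content-addressed storage: identical contents are stored only once.
        objstorage.setdefault(blob_id, data)
        nodes.add("cnt:" + blob_id)
        entries.append((name, blob_id))
    # Hypothetical directory id: a hash of the sorted (name, blob id) pairs.
    # This is NOT git's tree object format, only a stand-in.
    listing = "\n".join(f"{name} {blob_id}" for name, blob_id in entries)
    dir_id = hashlib.sha1(listing.encode()).hexdigest()
    nodes.add("dir:" + dir_id)
    for name, blob_id in entries:
        # One labelled edge per directory entry, from directory to content.
        edges.add(("dir:" + dir_id, "cnt:" + blob_id, name))
    return dir_id
```

Because blobs are addressed by their hash, loading two files with identical
content stores a single blob while still creating one edge per file name.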

.. _celery: https://www.celeryproject.org
.. _gitlab: https://gitlab.com