diff --git a/docs/db.rst b/docs/db.rst new file mode 100644 --- /dev/null +++ b/docs/db.rst @@ -0,0 +1,176 @@ +.. _swh-core-db: + +Common database utilities +========================= + +The ``swh.core.db`` module offers a set of common (postgresql) database +handling utilities and features for other swh packages implementing a +`datastore`, aka a service responsible for providing a data store via a common +interface which can use a postgresql database as backend. Examples are +:mod:`swh.storage` or :mod:`swh.scheduler`. + +Most of the time, this database-based data storage facility will depend on a data +schema (may be based in :mod:`swh.model` or not) and provide a unified interface +based on an Python class to abstract access to this datastore. + +Some packages may implement only a postgresql backend, some may provide more +backends. + +This :mod:`swh.core.db` only deals with the postgresql part and provides common +features and tooling to manage the database lifecycle in a consistent and +unified way among all the :mod:`swh` packages. + +It comes with a few command line tools to manage the specific :mod:`swh` +package database. + +As such, most of the database management cli commands require a configuration +file holding the database connection information. + +For example, for the :mod:`swh.storage` package, one will be able to create, +initialize and upgrade the postgresql database using simple commands. + +To create the database and perform superuser initialization steps (see below): + +.. code-block:: bash + + $ swh db create storage --dbname=postgresql://superuser:passwd@localhost:5433/test-storage + +If the database already exists but lacks superuser level initialization steps, +you may use: + +.. code-block:: bash + + $ swh db init-admin storage --dbname=postgresql://superuser:passwd@localhost:5433/test-storage + + +Then assuming the ``config.yml`` file existence: + +.. code-block:: yaml + + storage: + cls: postgresql + db: host=localhost, port=5433, dbname=test-storage, username=normal-user, password=pwd + objstorage: + cls: memory + +then you can run: + +.. code-block:: bash + + $ swh db --config-file=config.yml init storage + DONE database for storage initialized (flavor default) at version 182 + +Note: you can define the ``SWH_CONFIG_FILENAME`` environment variable instead +of using the ``--config-name`` command line option. + +or check the actual data model version of this database: + +.. code-block:: bash + + $ swh db --config-file=config.yml version storage + module: storage + flavor: default + version: 182 + +as well as the migration history for the database: + +.. code-block:: bash + + $ swh db --config-file=config.yml version --all storage + module: storage + flavor: default + 182 [2022-02-11 15:08:31.806070+01:00] Work In Progress + 181 [2022-02-11 14:06:27.435010+01:00] Work In Progress + + +The database migration is done using the ``swh db upgrade`` command. + + + +Implementation of a swh.core.db datastore +----------------------------------------- + +To use this database management tooling, in a :mod:`swh` package, the following +conditions are expected: + +- the package should provide an ``sql`` directory in its root namespace + providing initialization sql scripts. Scripts should be named like + ``nn-xxx.sql`` and are executed in order according to the ``nn`` integer + value. Scripts having ``-superuser-`` in their name will be executed by the + ``init-admin`` tool and are expected to require superuser access level, + whereas scripts without ``-superuser-`` in their name will be executed by the + ``swh db init`` command and are expected to require write access + level (with no need for superuser access level). + +- the package should provide a ``sql/upgrade`` directory with SQL migration + scripts in its root namespace. Script names are expected to be of the form + ``nnn.sql`` where `nnn` is the version to which this script does the + migration from a database at version `nnn - 1`. + +- the initialization and migration scripts should not create nor fill the + metadata related tables (``dbversion`` and ``dbmodule``). + +- the package should provide a ``get_datastore`` function in its root namespace + returning an instance of the datastore object. Normally, this datastore + object uses ``swh.core.db.BaseDb`` to interact with the actual database. + +- The datastore object should provide a ``get_current_version()`` method + returning the database version expected by the code. + +See existing ``swh`` packages like ``swh.storage`` or ``swh.scheduler`` for +usage examples. + +Writing tests +------------- + +The ``swh.core.db.pytest_plugin`` provides a few helper tools to write unit +tests for postgresql based datastores. + +By default, when using these fixtures, a posgresql server will be started (by +the pytest_postgresql fixture) and a template database will be created using +the ``postgresql_proc`` fixture factory provided by ``pytest_postgresql``. + +Then a dedicated fixture must be declared to use the ``postgresql_proc`` +fixture generated by the fixture factory function. + +This template database will then be used to create a new database for test +using this dedicated fixture. + +In order to help the database initialization process and make it consistent +with the database initialization tools from the ``swh db`` cli, an +``initialize_database_for_module()`` function is provided to be used with the +fixture factory described above. + +Typically, writing tests for a ``swh`` package ``swh.example`` would look like: + +.. code-block:: python + + from functools import partial + + from pytest_postgresql import factories + from swh.core.db.pytest_plugin import postgresql_fact + from swh.core.db.pytest_plugin import initialize_database_for_module + + example_postgresql_proc = factories.postgresql_proc( + dbname="example", + load=[partial(initialize_database_for_module, + modname="example", version=1)] + ) + + postgresql_example = postgresql_fact("example_postgresql_proc") + + def test_example(postgresql_example): + with postgresql_example.cursor() as c: + c.execute("select version from dbversion limit 1") + assert c.fecthone()[0] == 1 + + +Note: most of the time, you will want to put the scaffolding part of the code +above in a ``conftest.py`` file. + + +The ``load`` argument of the ``factories.postgresql_proc`` will be used to +initialize the template database that will be used to create a new database for +each test, while the ``load`` argument of the ``postgresql_fact`` fixture will +be executed before each test (in the database created from the template +database and dedicated to the test being executed). diff --git a/docs/index.rst b/docs/index.rst --- a/docs/index.rst +++ b/docs/index.rst @@ -9,4 +9,5 @@ :maxdepth: 2 cli + db /apidoc/swh.core