diff --git a/docs/db.rst b/docs/db.rst new file mode 100644 --- /dev/null +++ b/docs/db.rst @@ -0,0 +1,181 @@ +.. _swh-core-db: + +Common database utilities +========================= + +The ``swh.core.db`` module offers a set of common (postgresql) database +handling utilisies and features for other swh packages implementing a +`datastore`, aka a service responsible for providing a data store via a common +interface which can use a postgresql database as backend. Examples are +``swh.storage`` or ``swh.scheduler``. + +Most of time, this database based data storage facility will depend on a data +schema (may be based in ``swh.model`` or not) and provide a unified interface +based on an Python class to abstract access this datastore. + +Some packages may implement only a postgresql backend, some may provide more +backends. + +This ``swh.core.db`` only deals with the postgresql part and provides common +features and tooling to manage the database lifecycle in a concistent and +unified way among all the ``swh`` packages. + +It comes with a few command line tools to manage the database for a specific +``swh`` package. + +As such, most of the database management cli commands require a configuration +file is expected to be used to provide database connection information. + +For example, for the ``swh.storage`` package, one will be able to create, +initialize and upgrade the postgresql database using simple commands. + +To create the database and perform superuser initialization steps (see below): + +.. code-block:: bash + + $ swh db create storage --dbname=postgresql://superuser:passwd@localhost:5433/test-storage + +If the database already exists but lack superuser level initialization steps, +you may use: + +.. code-block:: bash + + $ swh db init-admin storage --dbname=postgresql://superuser:passwd@localhost:5433/test-storage + + +Then assuming the existence of a ``config.yml`` file like: + +.. code-block:: yaml + + storage: + cls: postgresql + db: + host: localhost + port: 5433 + dbname: test-storage + username: normal-user + password: some-password + objstorage: + cls: memory + +then you can run: + +.. code-block:: bash + + $ swh db --config-file=config.yml init storage + DONE database for storage initialized (flavor default) at version 182 + +Note: you can define the ``SWH_CONFIG_FILENAME`` environment variable instead +of using the ``--config-name`` command line option. + +or check the actual data model version of this database: + +.. code-block:: bash + + $ swh db --config-file=config.yml version storage + module: storage + flavor: default + version: 182 + +as well as the migration history for the database: + +.. code-block:: bash + + $ swh db --config-file=config.yml version --all storage + module: storage + flavor: default + 182 [2022-02-11 15:08:31.806070+01:00] Work In Progress + 181 [2022-02-11 14:06:27.435010+01:00] Work In Progress + + +The migration of the database can be done using ``swh db upgrade`` command. + + + +Implementation of a swh.core.db datastore +----------------------------------------- + +To use this database management tooling, in a ``swh`` package, the following +conditions are expected: + +- the package should provide an ``sql`` directory in its root namespace + providing initialization sql scripts. Scripts should be named like + ``nn-xxx.sql`` and are executed in order according to the ``nn`` integer + value. Scripts having ``-superuser-`` in their name will be executed by the + ``init-admin`` tool and are expected to requite superuser access level, + whereas scripts without ``-superuser-`` in their name will be executed by the + ``swh db init`` command and are not expected to requires superuser access + level. + +- the package should provide a ``sql/upgrade`` directory with SQL migration + scripts in its root namespace. Script names are expected to be of the form + ``nnn.sql`` where `nnn` is the version to which this script does the + migration from a database at version `nnn-1`. + +- the initialization and migration scripts should not create nor fill the + metadata related tables (``dbversion`` and ``dbmodule``). + +- the package should provide a ``get_datastore`` function in its root namespace + returning an instance of the datastore object. Normally, this datastore + object uses ``swh.core.db.BaseDb`` to interact with the actual database. + +- The datastore object should provide a ``get_current_version()`` method + retuning the version of the database expected by the code. + +See existing ``swh`` packages like ``swh.storage`` or ``swh.scheduler`` for +usage examples. + +Writing tests +------------- + +The ``swh.core.db.pytest_plugin`` provide a few helper tools to write unit +tests for postgresql based datastores. + +By default, when using these fixtures, a posgresql server will be started (by +the pytest_postgresql fixture) and a template database will be created using +the ``postgresql_proc`` fixture factory provided by ``pytest_postgresql``. + +Then a dedicated fixture must be declared to use the ``postgresql_proc`` +fixture generated by the fixture factory function. + +This template database will then be used to create a new database for test +using this dedicated fixture. + +In order to help the database initialization process and make it consistent +with the database initialization tools from the ``swh db`` cli, an +``initialize_database_for_module()`` function is provided to be used with the +fixture factory described above. + +Typically, writing tests for a ``swh`` package ``swh.example`` would look like: + +.. code-block:: python + + from functools import partial + + from pytest_postgresql import factories + from swh.core.db.pytest_plugin import postgresql_fact + from swh.core.db.pytest_plugin import initialize_database_for_module + + example_postgresql_proc = factories.postgresql_proc( + dbname="example", + load=[partial(initialize_database_for_module, + modname="example", version=1)] + ) + + postgresql_example = postgresql_fact("example_postgresql_proc") + + def test_example(postgresql_example): + with postgresql_example.cursor() as c: + c.execute("select version from dbversion limit 1") + assert c.fecthone()[0] == 1 + + +Note: most of the time, you will want to put the scaffolding part of the code +above in a ``conftest.py`` file. + + +The ``load`` argument of the ``factories.postgresql_proc`` will be used to +initialize the template database that will be used to create a new database for +each test, while the ``load`` argument of the ``postgresql_fact`` fixture will +be executed before each test (in the database created from the template +database and dedicated to the test being executed). diff --git a/docs/index.rst b/docs/index.rst --- a/docs/index.rst +++ b/docs/index.rst @@ -9,4 +9,5 @@ :maxdepth: 2 cli + db /apidoc/swh.core