.. _swh-core-db:

Common database utilities
=========================

The ``swh.core.db`` module offers a set of common (postgresql) database
handling utilities and features for other swh packages implementing a
`datastore`, aka a service responsible for providing a data store via a common
interface which can use a postgresql database as backend. Examples are
:mod:`swh.storage` or :mod:`swh.scheduler`.

Most of the time, this database-based data storage facility will depend on a data
schema (may be based in :mod:`swh.model` or not) and provide a unified interface
based on an Python class to abstract access to this datastore.

Some packages may implement only a postgresql backend, some may provide more
backends.

This :mod:`swh.core.db` only deals with the postgresql part and provides common
features and tooling to manage the database lifecycle in a consistent and
unified way among all the :mod:`swh` packages.

It comes with a few command line tools to manage the specific :mod:`swh`
package database.

As such, most of the database management cli commands require a configuration
file holding the database connection information.

For example, for the :mod:`swh.storage` package, one will be able to create,
initialize and upgrade the postgresql database using simple commands.

To create the database and perform superuser initialization steps (see below):

.. code-block:: bash

   $ swh db create storage --dbname=postgresql://superuser:passwd@localhost:5433/test-storage

If the database already exists but lacks superuser level initialization steps,
you may use:

.. code-block:: bash

   $ swh db init-admin storage --dbname=postgresql://superuser:passwd@localhost:5433/test-storage


Then assuming the ``config.yml`` file existence:

.. code-block:: yaml

   storage:
     cls: postgresql
	 db: host=localhost, port=5433, dbname=test-storage, username=normal-user, password=pwd
	 objstorage:
	   cls: memory

then you can run:

.. code-block:: bash

   $ swh db --config-file=config.yml init storage
   DONE database for storage initialized (flavor default) at version 182

Note: you can define the ``SWH_CONFIG_FILENAME`` environment variable instead
of using the ``--config-name`` command line option.

or check the actual data model version of this database:

.. code-block:: bash

   $ swh db --config-file=config.yml version storage
   module: storage
   flavor: default
   version: 182

as well as the migration history for the database:

.. code-block:: bash

   $ swh db --config-file=config.yml version --all storage
   module: storage
   flavor: default
   182 [2022-02-11 15:08:31.806070+01:00] Work In Progress
   181 [2022-02-11 14:06:27.435010+01:00] Work In Progress


The database migration is done using the ``swh db upgrade`` command.



Implementation of a swh.core.db datastore
-----------------------------------------

To use this database management tooling, in a :mod:`swh` package, the following
conditions are expected:

- the package should provide an ``sql`` directory in its root namespace
  providing initialization sql scripts. Scripts should be named like
  ``nn-xxx.sql`` and are executed in order according to the ``nn`` integer
  value. Scripts having ``-superuser-`` in their name will be executed by the
  ``init-admin`` tool and are expected to require superuser access level,
  whereas scripts without ``-superuser-`` in their name will be executed by the
  ``swh db init`` command and are expected to require write access
  level (with no need for superuser access level).

- the package should provide a ``sql/upgrade`` directory with SQL migration
  scripts in its root namespace. Script names are expected to be of the form
  ``nnn.sql`` where `nnn` is the version to which this script does the
  migration from a database at version `nnn - 1`.

- the initialization and migration scripts should not create nor fill the
  metadata related tables (``dbversion`` and ``dbmodule``).

- the package should provide a ``get_datastore`` function in its root namespace
  returning an instance of the datastore object. Normally, this datastore
  object uses ``swh.core.db.BaseDb`` to interact with the actual database.

- The datastore object should provide a ``get_current_version()`` method
  returning the database version expected by the code.

See existing ``swh`` packages like ``swh.storage`` or ``swh.scheduler`` for
usage examples.

Writing tests
-------------

The ``swh.core.db.pytest_plugin`` provides a few helper tools to write unit
tests for postgresql based datastores.

By default, when using these fixtures, a posgresql server will be started (by
the pytest_postgresql fixture) and a template database will be created using
the ``postgresql_proc`` fixture factory provided by ``pytest_postgresql``.

Then a dedicated fixture must be declared to use the ``postgresql_proc``
fixture generated by the fixture factory function.

This template database will then be used to create a new database for test
using this dedicated fixture.

In order to help the database initialization process and make it consistent
with the database initialization tools from the ``swh db`` cli, an
``initialize_database_for_module()`` function is provided to be used with the
fixture factory described above.

Typically, writing tests for a ``swh`` package ``swh.example`` would look like:

.. code-block:: python

   from functools import partial

   from pytest_postgresql import factories
   from swh.core.db.pytest_plugin import postgresql_fact
   from swh.core.db.pytest_plugin import initialize_database_for_module

   example_postgresql_proc = factories.postgresql_proc(
     dbname="example",
     load=[partial(initialize_database_for_module,
                   modname="example", version=1)]
     )

   postgresql_example = postgresql_fact("example_postgresql_proc")

   def test_example(postgresql_example):
       with postgresql_example.cursor() as c:
           c.execute("select version from dbversion limit 1")
           assert c.fecthone()[0] == 1


Note: most of the time, you will want to put the scaffolding part of the code
above in a ``conftest.py`` file.


The ``load`` argument of the ``factories.postgresql_proc`` will be used to
initialize the template database that will be used to create a new database for
each test, while the ``load`` argument of the ``postgresql_fact`` fixture will
be executed before each test (in the database created from the template
database and dedicated to the test being executed).
