Page MenuHomeSoftware Heritage

db.rst
No OneTemporary

.. _swh-core-db:
Common database utilities
=========================
The ``swh.core.db`` module offers a set of common (postgresql) database
handling utilities and features for other swh packages implementing a
`datastore`, aka a service responsible for providing a data store via a common
interface which can use a postgresql database as backend. Examples are
:mod:`swh.storage` or :mod:`swh.scheduler`.
Most of the time, this database-based data storage facility will depend on a data
schema (may be based in :mod:`swh.model` or not) and provide a unified interface
based on an Python class to abstract access to this datastore.
Some packages may implement only a postgresql backend, some may provide more
backends.
This :mod:`swh.core.db` only deals with the postgresql part and provides common
features and tooling to manage the database lifecycle in a consistent and
unified way among all the :mod:`swh` packages.
It comes with a few command line tools to manage the specific :mod:`swh`
package database.
As such, most of the database management cli commands require a configuration
file holding the database connection information.
For example, for the :mod:`swh.storage` package, one will be able to create,
initialize and upgrade the postgresql database using simple commands.
To create the database and perform superuser initialization steps (see below):
.. code-block:: bash
$ swh db create storage --dbname=postgresql://superuser:passwd@localhost:5433/test-storage
If the database already exists but lacks superuser level initialization steps,
you may use:
.. code-block:: bash
$ swh db init-admin storage --dbname=postgresql://superuser:passwd@localhost:5433/test-storage
Then assuming the ``config.yml`` file existence:
.. code-block:: yaml
storage:
cls: postgresql
db: host=localhost, port=5433, dbname=test-storage, username=normal-user, password=pwd
objstorage:
cls: memory
then you can run:
.. code-block:: bash
$ swh db --config-file=config.yml init storage
DONE database for storage initialized (flavor default) at version 182
Note: you can define the ``SWH_CONFIG_FILENAME`` environment variable instead
of using the ``--config-name`` command line option.
or check the actual data model version of this database:
.. code-block:: bash
$ swh db --config-file=config.yml version storage
module: storage
flavor: default
version: 182
as well as the migration history for the database:
.. code-block:: bash
$ swh db --config-file=config.yml version --all storage
module: storage
flavor: default
182 [2022-02-11 15:08:31.806070+01:00] Work In Progress
181 [2022-02-11 14:06:27.435010+01:00] Work In Progress
The database migration is done using the ``swh db upgrade`` command.
Implementation of a swh.core.db datastore
-----------------------------------------
To use this database management tooling, in a :mod:`swh` package, the following
conditions are expected:
- the package should provide an ``sql`` directory in its root namespace
providing initialization sql scripts. Scripts should be named like
``nn-xxx.sql`` and are executed in order according to the ``nn`` integer
value. Scripts having ``-superuser-`` in their name will be executed by the
``init-admin`` tool and are expected to require superuser access level,
whereas scripts without ``-superuser-`` in their name will be executed by the
``swh db init`` command and are expected to require write access
level (with no need for superuser access level).
- the package should provide a ``sql/upgrade`` directory with SQL migration
scripts in its root namespace. Script names are expected to be of the form
``nnn.sql`` where `nnn` is the version to which this script does the
migration from a database at version `nnn - 1`.
- the initialization and migration scripts should not create nor fill the
metadata related tables (``dbversion`` and ``dbmodule``).
- the package should provide a ``get_datastore`` function in its root namespace
returning an instance of the datastore object. Normally, this datastore
object uses ``swh.core.db.BaseDb`` to interact with the actual database.
- The datastore object should provide a ``get_current_version()`` method
returning the database version expected by the code.
See existing ``swh`` packages like ``swh.storage`` or ``swh.scheduler`` for
usage examples.
Writing tests
-------------
The ``swh.core.db.pytest_plugin`` provides a few helper tools to write unit
tests for postgresql based datastores.
By default, when using these fixtures, a posgresql server will be started (by
the pytest_postgresql fixture) and a template database will be created using
the ``postgresql_proc`` fixture factory provided by ``pytest_postgresql``.
Then a dedicated fixture must be declared to use the ``postgresql_proc``
fixture generated by the fixture factory function.
This template database will then be used to create a new database for test
using this dedicated fixture.
In order to help the database initialization process and make it consistent
with the database initialization tools from the ``swh db`` cli, an
``initialize_database_for_module()`` function is provided to be used with the
fixture factory described above.
Typically, writing tests for a ``swh`` package ``swh.example`` would look like:
.. code-block:: python
from functools import partial
from pytest_postgresql import factories
from swh.core.db.pytest_plugin import postgresql_fact
from swh.core.db.pytest_plugin import initialize_database_for_module
example_postgresql_proc = factories.postgresql_proc(
dbname="example",
load=[partial(initialize_database_for_module,
modname="example", version=1)]
)
postgresql_example = postgresql_fact("example_postgresql_proc")
def test_example(postgresql_example):
with postgresql_example.cursor() as c:
c.execute("select version from dbversion limit 1")
assert c.fecthone()[0] == 1
Note: most of the time, you will want to put the scaffolding part of the code
above in a ``conftest.py`` file.
The ``load`` argument of the ``factories.postgresql_proc`` will be used to
initialize the template database that will be used to create a new database for
each test, while the ``load`` argument of the ``postgresql_fact`` fixture will
be executed before each test (in the database created from the template
database and dedicated to the test being executed).

File Metadata

Mime Type
text/x-python
Expires
Jul 4 2025, 9:31 AM (5 w, 5 d ago)
Storage Engine
blob
Storage Format
Raw Data
Storage Handle
3365979

Event Timeline