Page MenuHomeSoftware Heritage

D7159.id26063.diff
No OneTemporary

D7159.id26063.diff

diff --git a/docs/db.rst b/docs/db.rst
new file mode 100644
--- /dev/null
+++ b/docs/db.rst
@@ -0,0 +1,176 @@
+.. _swh-core-db:
+
+Common database utilities
+=========================
+
+The ``swh.core.db`` module offers a set of common (postgresql) database
+handling utilities and features for other swh packages implementing a
+`datastore`, aka a service responsible for providing a data store via a common
+interface which can use a postgresql database as backend. Examples are
+:mod:`swh.storage` or :mod:`swh.scheduler`.
+
+Most of the time, this database-based data storage facility will depend on a data
+schema (may be based in :mod:`swh.model` or not) and provide a unified interface
+based on an Python class to abstract access to this datastore.
+
+Some packages may implement only a postgresql backend, some may provide more
+backends.
+
+This :mod:`swh.core.db` only deals with the postgresql part and provides common
+features and tooling to manage the database lifecycle in a consistent and
+unified way among all the :mod:`swh` packages.
+
+It comes with a few command line tools to manage the specific :mod:`swh`
+package database.
+
+As such, most of the database management cli commands require a configuration
+file holding the database connection information.
+
+For example, for the :mod:`swh.storage` package, one will be able to create,
+initialize and upgrade the postgresql database using simple commands.
+
+To create the database and perform superuser initialization steps (see below):
+
+.. code-block:: bash
+
+ $ swh db create storage --dbname=postgresql://superuser:passwd@localhost:5433/test-storage
+
+If the database already exists but lacks superuser level initialization steps,
+you may use:
+
+.. code-block:: bash
+
+ $ swh db init-admin storage --dbname=postgresql://superuser:passwd@localhost:5433/test-storage
+
+
+Then assuming the ``config.yml`` file existence:
+
+.. code-block:: yaml
+
+ storage:
+ cls: postgresql
+ db: host=localhost, port=5433, dbname=test-storage, username=normal-user, password=pwd
+ objstorage:
+ cls: memory
+
+then you can run:
+
+.. code-block:: bash
+
+ $ swh db --config-file=config.yml init storage
+ DONE database for storage initialized (flavor default) at version 182
+
+Note: you can define the ``SWH_CONFIG_FILENAME`` environment variable instead
+of using the ``--config-name`` command line option.
+
+or check the actual data model version of this database:
+
+.. code-block:: bash
+
+ $ swh db --config-file=config.yml version storage
+ module: storage
+ flavor: default
+ version: 182
+
+as well as the migration history for the database:
+
+.. code-block:: bash
+
+ $ swh db --config-file=config.yml version --all storage
+ module: storage
+ flavor: default
+ 182 [2022-02-11 15:08:31.806070+01:00] Work In Progress
+ 181 [2022-02-11 14:06:27.435010+01:00] Work In Progress
+
+
+The database migration is done using the ``swh db upgrade`` command.
+
+
+
+Implementation of a swh.core.db datastore
+-----------------------------------------
+
+To use this database management tooling, in a :mod:`swh` package, the following
+conditions are expected:
+
+- the package should provide an ``sql`` directory in its root namespace
+ providing initialization sql scripts. Scripts should be named like
+ ``nn-xxx.sql`` and are executed in order according to the ``nn`` integer
+ value. Scripts having ``-superuser-`` in their name will be executed by the
+ ``init-admin`` tool and are expected to require superuser access level,
+ whereas scripts without ``-superuser-`` in their name will be executed by the
+ ``swh db init`` command and are expected to require write access
+ level (with no need for superuser access level).
+
+- the package should provide a ``sql/upgrade`` directory with SQL migration
+ scripts in its root namespace. Script names are expected to be of the form
+ ``nnn.sql`` where `nnn` is the version to which this script does the
+ migration from a database at version `nnn - 1`.
+
+- the initialization and migration scripts should not create nor fill the
+ metadata related tables (``dbversion`` and ``dbmodule``).
+
+- the package should provide a ``get_datastore`` function in its root namespace
+ returning an instance of the datastore object. Normally, this datastore
+ object uses ``swh.core.db.BaseDb`` to interact with the actual database.
+
+- The datastore object should provide a ``get_current_version()`` method
+ returning the database version expected by the code.
+
+See existing ``swh`` packages like ``swh.storage`` or ``swh.scheduler`` for
+usage examples.
+
+Writing tests
+-------------
+
+The ``swh.core.db.pytest_plugin`` provides a few helper tools to write unit
+tests for postgresql based datastores.
+
+By default, when using these fixtures, a posgresql server will be started (by
+the pytest_postgresql fixture) and a template database will be created using
+the ``postgresql_proc`` fixture factory provided by ``pytest_postgresql``.
+
+Then a dedicated fixture must be declared to use the ``postgresql_proc``
+fixture generated by the fixture factory function.
+
+This template database will then be used to create a new database for test
+using this dedicated fixture.
+
+In order to help the database initialization process and make it consistent
+with the database initialization tools from the ``swh db`` cli, an
+``initialize_database_for_module()`` function is provided to be used with the
+fixture factory described above.
+
+Typically, writing tests for a ``swh`` package ``swh.example`` would look like:
+
+.. code-block:: python
+
+ from functools import partial
+
+ from pytest_postgresql import factories
+ from swh.core.db.pytest_plugin import postgresql_fact
+ from swh.core.db.pytest_plugin import initialize_database_for_module
+
+ example_postgresql_proc = factories.postgresql_proc(
+ dbname="example",
+ load=[partial(initialize_database_for_module,
+ modname="example", version=1)]
+ )
+
+ postgresql_example = postgresql_fact("example_postgresql_proc")
+
+ def test_example(postgresql_example):
+ with postgresql_example.cursor() as c:
+ c.execute("select version from dbversion limit 1")
+ assert c.fecthone()[0] == 1
+
+
+Note: most of the time, you will want to put the scaffolding part of the code
+above in a ``conftest.py`` file.
+
+
+The ``load`` argument of the ``factories.postgresql_proc`` will be used to
+initialize the template database that will be used to create a new database for
+each test, while the ``load`` argument of the ``postgresql_fact`` fixture will
+be executed before each test (in the database created from the template
+database and dedicated to the test being executed).
diff --git a/docs/index.rst b/docs/index.rst
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -9,4 +9,5 @@
:maxdepth: 2
cli
+ db
/apidoc/swh.core

File Metadata

Mime Type
text/plain
Expires
Wed, Dec 18, 6:50 AM (1 d, 19 h ago)
Storage Engine
blob
Storage Format
Raw Data
Storage Handle
3220011

Event Timeline