Page MenuHomeSoftware Heritage

Add support for dbversion and dbmodule handling in swh db init
ClosedPublic

Authored by douardda on Feb 2 2022, 1:51 PM.

Details

Summary

for now, managing the dbversion table (storing the db schema upgrade
history) is partially the responsibility of the swh modules implementing
a datastore (swh-storage etc.), and the actual swh module was not stored
in the database.

This adds support for both the dbversion and dbmodule tables management
in swh.core.db.

The swh db init command now creates both these tables and initialise
their values if possible.
For this, it introduces a new common "API" for swh.core.db based datastores:

  • a swh module implementing a datastore is expected to have a get_datastore function in its top namespace; this function is typically the actual get_storage, get_scheduler etc. functions
  • this factory function is expected to use the "postgresql" cls for the datastore based on swh.core.db,
  • the datastore instance (eg. instance of the swh.storage.postgresql.Storage class) is expected to have a get_current_version() method (returning an int).

This revision also provides a new swh db version command displaying
the current (code) version (if avalable), the db version (possibly the whole
version history) and flavor (if any) for the datastore given as argument.

It should be bacward compatible with eisting swh datastore modules (i.e.
not adapter to this new API).

Depends on D7154
Related to T3894

Event Timeline

Build has FAILED

Patch application report for D7062 (id=25612)

Could not rebase; Attempt merge onto aba5c80765...

Updating aba5c80..886c8c5
Fast-forward
 MANIFEST.in                                        |   3 +-
 swh/core/cli/db.py                                 | 222 ++++++----------
 swh/core/db/db_utils.py                            | 293 ++++++++++++++++++++-
 swh/core/db/sql/35-dbmetadata.sql                  |  28 ++
 swh/core/db/tests/conftest.py                      |  40 +++
 .../tests/data/cli/{1-schema.sql => 30-schema.sql} |   0
 .../db/tests/data/cli/{3-func.sql => 40-funcs.sql} |   0
 .../db/tests/data/cli/{4-data.sql => 50-data.sql}  |   0
 swh/core/db/tests/test_cli.py                      |  34 +--
 swh/core/db/tests/test_db_utils.py                 |  62 +++++
 swh/core/sql/log-schema.sql                        |  33 ---
 11 files changed, 503 insertions(+), 212 deletions(-)
 create mode 100644 swh/core/db/sql/35-dbmetadata.sql
 rename swh/core/db/tests/data/cli/{1-schema.sql => 30-schema.sql} (100%)
 rename swh/core/db/tests/data/cli/{3-func.sql => 40-funcs.sql} (100%)
 rename swh/core/db/tests/data/cli/{4-data.sql => 50-data.sql} (100%)
 create mode 100644 swh/core/db/tests/test_db_utils.py
 delete mode 100644 swh/core/sql/log-schema.sql
Changes applied before test
commit 886c8c545fa424db720b4ec3cd1c8ec2de4e3952
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 15:02:51 2022 +0100

    Add support for dbversion and dbmodule handling in swh db init
    
    for now, managing the dbversion table (storing the db schema upgrade
    history) is partially the responsibility of the swh modules implementing
    a datastore (swh-storage etc.), and the actual swh module was not stored
    in the database.
    
    This adds support for both the dbversion and dbmodule tables management
    in swh.core.db.
    
    The `swh db init` command now creates both these tables and initialise
    their values if possible.
    For this, it introduces a new common "API" for swh.core.db based datastores:
    
    - a swh module implementing a datastore is expected to have a
      `get_datastore` function in its top namespace; this function is
      typically the actual `get_storage`, `get_scheduler` etc. functions
    - this factory function is expected to use the "postgresql" cls for the
      datastore based on swh.core.db,
    - the datastore instance (eg. instance of the swh.storage.postgresql.Storage
      class) is expected to have a `get_current_version()` method (returning
      an int).
    
    This revision also provides a new `swh db version` command displaying
    the current (code) version (if avalable), the db version (possibly the whole
    version history) and flavor (if any) for the datastore given as argument.
    
    It should be bacward compatible with eisting swh datastore modules (i.e.
    not adapter to this new API).

commit cbc634d04ee9e57cdc42ebab2e087e1eac0031b5
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 16:00:43 2022 +0100

    Refactor the mock_package_sql fixture to allow multiple sql script set
    
    will be used to implement several test scenarios in following commits.
    Also rename existing sql scripts with a "legit" numbering scheme.

commit 4f41cc905d805953363688316a088bc84100e181
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 14:54:39 2022 +0100

    Make `swh db init` retrieve the db cnx uri from the config file

commit 6c665e38e6579a02839eef6a8a6b60dc332be51c
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 14:49:41 2022 +0100

    Move utility functions from cli.db to db.db_utils

commit e4117fcf51a256c5b708f909c818f452c794c0f8
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 10:07:35 2022 +0100

    Remove the old and unused log-schema.sql file

Link to build: https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/267/
See console output for more information: https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/267/console

Harbormaster returned this revision to the author for changes because remote builds failed.Feb 2 2022, 1:53 PM
Harbormaster failed remote builds in B26442: Diff 25612!

Build has FAILED

Patch application report for D7062 (id=25619)

Could not rebase; Attempt merge onto aba5c80765...

Updating aba5c80..54c53db
Fast-forward
 MANIFEST.in                                        |   3 +-
 swh/core/cli/db.py                                 | 222 ++++++----------
 swh/core/db/db_utils.py                            | 295 ++++++++++++++++++++-
 swh/core/db/sql/35-dbmetadata.sql                  |  28 ++
 swh/core/db/tests/conftest.py                      |  40 +++
 .../tests/data/cli/{1-schema.sql => 30-schema.sql} |   0
 .../db/tests/data/cli/{3-func.sql => 40-funcs.sql} |   0
 .../db/tests/data/cli/{4-data.sql => 50-data.sql}  |   0
 swh/core/db/tests/test_cli.py                      |  34 +--
 swh/core/db/tests/test_db_utils.py                 |  74 ++++++
 swh/core/sql/log-schema.sql                        |  33 ---
 11 files changed, 516 insertions(+), 213 deletions(-)
 create mode 100644 swh/core/db/sql/35-dbmetadata.sql
 rename swh/core/db/tests/data/cli/{1-schema.sql => 30-schema.sql} (100%)
 rename swh/core/db/tests/data/cli/{3-func.sql => 40-funcs.sql} (100%)
 rename swh/core/db/tests/data/cli/{4-data.sql => 50-data.sql} (100%)
 create mode 100644 swh/core/db/tests/test_db_utils.py
 delete mode 100644 swh/core/sql/log-schema.sql
Changes applied before test
commit 54c53dbdc2574e5f8d9f699505a3217708bf8b54
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 15:02:51 2022 +0100

    Add support for dbversion and dbmodule handling in swh db init
    
    for now, managing the dbversion table (storing the db schema upgrade
    history) is partially the responsibility of the swh modules implementing
    a datastore (swh-storage etc.), and the actual swh module was not stored
    in the database.
    
    This adds support for both the dbversion and dbmodule tables management
    in swh.core.db.
    
    The `swh db init` command now creates both these tables and initialise
    their values if possible.
    For this, it introduces a new common "API" for swh.core.db based datastores:
    
    - a swh module implementing a datastore is expected to have a
      `get_datastore` function in its top namespace; this function is
      typically the actual `get_storage`, `get_scheduler` etc. functions
    - this factory function is expected to use the "postgresql" cls for the
      datastore based on swh.core.db,
    - the datastore instance (eg. instance of the swh.storage.postgresql.Storage
      class) is expected to have a `get_current_version()` method (returning
      an int).
    
    This revision also provides a new `swh db version` command displaying
    the current (code) version (if avalable), the db version (possibly the whole
    version history) and flavor (if any) for the datastore given as argument.
    
    It should be bacward compatible with eisting swh datastore modules (i.e.
    not adapter to this new API).

commit c75374bafd77dfff82904985b935bbd035147cbc
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 16:00:43 2022 +0100

    Refactor the mock_package_sql fixture to allow multiple sql script set
    
    will be used to implement several test scenarios in following commits.
    Also rename existing sql scripts with a "legit" numbering scheme.

commit 5c5b981020c4b3e38f89f112480b7297cceded5c
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 14:54:39 2022 +0100

    Make `swh db init` retrieve the db cnx uri from the config file

commit d3b7b081fd3eda460d11d1d6d70f2b52fa3e0ef3
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 14:49:41 2022 +0100

    Move utility functions from cli.db to db.db_utils

commit e4117fcf51a256c5b708f909c818f452c794c0f8
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 10:07:35 2022 +0100

    Remove the old and unused log-schema.sql file

Link to build: https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/272/
See console output for more information: https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/272/console

Harbormaster returned this revision to the author for changes because remote builds failed.Feb 2 2022, 2:36 PM
Harbormaster failed remote builds in B26449: Diff 25619!

Build is green

Patch application report for D7062 (id=25622)

Could not rebase; Attempt merge onto aba5c80765...

Updating aba5c80..f375b23
Fast-forward
 MANIFEST.in                                        |   3 +-
 swh/core/cli/db.py                                 | 222 ++++++----------
 swh/core/db/db_utils.py                            | 295 ++++++++++++++++++++-
 swh/core/db/sql/35-dbmetadata.sql                  |  28 ++
 swh/core/db/tests/conftest.py                      |  40 +++
 .../tests/data/cli/{1-schema.sql => 30-schema.sql} |   0
 .../db/tests/data/cli/{3-func.sql => 40-funcs.sql} |   0
 .../db/tests/data/cli/{4-data.sql => 50-data.sql}  |   0
 swh/core/db/tests/test_cli.py                      |  34 +--
 swh/core/db/tests/test_db_utils.py                 |  77 ++++++
 swh/core/sql/log-schema.sql                        |  33 ---
 11 files changed, 519 insertions(+), 213 deletions(-)
 create mode 100644 swh/core/db/sql/35-dbmetadata.sql
 rename swh/core/db/tests/data/cli/{1-schema.sql => 30-schema.sql} (100%)
 rename swh/core/db/tests/data/cli/{3-func.sql => 40-funcs.sql} (100%)
 rename swh/core/db/tests/data/cli/{4-data.sql => 50-data.sql} (100%)
 create mode 100644 swh/core/db/tests/test_db_utils.py
 delete mode 100644 swh/core/sql/log-schema.sql
Changes applied before test
commit f375b231aa73d934fb44c7d1071c20d4121a6186
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 15:02:51 2022 +0100

    Add support for dbversion and dbmodule handling in swh db init
    
    for now, managing the dbversion table (storing the db schema upgrade
    history) is partially the responsibility of the swh modules implementing
    a datastore (swh-storage etc.), and the actual swh module was not stored
    in the database.
    
    This adds support for both the dbversion and dbmodule tables management
    in swh.core.db.
    
    The `swh db init` command now creates both these tables and initialise
    their values if possible.
    For this, it introduces a new common "API" for swh.core.db based datastores:
    
    - a swh module implementing a datastore is expected to have a
      `get_datastore` function in its top namespace; this function is
      typically the actual `get_storage`, `get_scheduler` etc. functions
    - this factory function is expected to use the "postgresql" cls for the
      datastore based on swh.core.db,
    - the datastore instance (eg. instance of the swh.storage.postgresql.Storage
      class) is expected to have a `get_current_version()` method (returning
      an int).
    
    This revision also provides a new `swh db version` command displaying
    the current (code) version (if avalable), the db version (possibly the whole
    version history) and flavor (if any) for the datastore given as argument.
    
    It should be bacward compatible with eisting swh datastore modules (i.e.
    not adapter to this new API).

commit c75374bafd77dfff82904985b935bbd035147cbc
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 16:00:43 2022 +0100

    Refactor the mock_package_sql fixture to allow multiple sql script set
    
    will be used to implement several test scenarios in following commits.
    Also rename existing sql scripts with a "legit" numbering scheme.

commit 5c5b981020c4b3e38f89f112480b7297cceded5c
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 14:54:39 2022 +0100

    Make `swh db init` retrieve the db cnx uri from the config file

commit d3b7b081fd3eda460d11d1d6d70f2b52fa3e0ef3
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 14:49:41 2022 +0100

    Move utility functions from cli.db to db.db_utils

commit e4117fcf51a256c5b708f909c818f452c794c0f8
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 10:07:35 2022 +0100

    Remove the old and unused log-schema.sql file

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/275/ for more details.

Build is green

Patch application report for D7062 (id=25634)

Could not rebase; Attempt merge onto aba5c80765...

Updating aba5c80..2f2b047
Fast-forward
 MANIFEST.in                                        |   3 +-
 swh/core/cli/db.py                                 | 222 ++++++----------
 swh/core/db/db_utils.py                            | 292 ++++++++++++++++++++-
 swh/core/db/sql/35-dbmetadata.sql                  |  28 ++
 swh/core/db/tests/conftest.py                      |  40 +++
 .../tests/data/cli/{1-schema.sql => 30-schema.sql} |   0
 .../db/tests/data/cli/{3-func.sql => 40-funcs.sql} |   0
 .../db/tests/data/cli/{4-data.sql => 50-data.sql}  |   0
 swh/core/db/tests/test_cli.py                      |  34 +--
 swh/core/db/tests/test_db_utils.py                 |  77 ++++++
 swh/core/sql/log-schema.sql                        |  33 ---
 11 files changed, 516 insertions(+), 213 deletions(-)
 create mode 100644 swh/core/db/sql/35-dbmetadata.sql
 rename swh/core/db/tests/data/cli/{1-schema.sql => 30-schema.sql} (100%)
 rename swh/core/db/tests/data/cli/{3-func.sql => 40-funcs.sql} (100%)
 rename swh/core/db/tests/data/cli/{4-data.sql => 50-data.sql} (100%)
 create mode 100644 swh/core/db/tests/test_db_utils.py
 delete mode 100644 swh/core/sql/log-schema.sql
Changes applied before test
commit 2f2b047ffd906acf0d64d8f0d94e192e49759480
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 15:02:51 2022 +0100

    Add support for dbversion and dbmodule handling in swh db init
    
    for now, managing the dbversion table (storing the db schema upgrade
    history) is partially the responsibility of the swh modules implementing
    a datastore (swh-storage etc.), and the actual swh module was not stored
    in the database.
    
    This adds support for both the dbversion and dbmodule tables management
    in swh.core.db.
    
    The `swh db init` command now creates both these tables and initialise
    their values if possible.
    For this, it introduces a new common "API" for swh.core.db based datastores:
    
    - a swh module implementing a datastore is expected to have a
      `get_datastore` function in its top namespace; this function is
      typically the actual `get_storage`, `get_scheduler` etc. functions
    - this factory function is expected to use the "postgresql" cls for the
      datastore based on swh.core.db,
    - the datastore instance (eg. instance of the swh.storage.postgresql.Storage
      class) is expected to have a `get_current_version()` method (returning
      an int).
    
    This revision also provides a new `swh db version` command displaying
    the current (code) version (if avalable), the db version (possibly the whole
    version history) and flavor (if any) for the datastore given as argument.
    
    It should be bacward compatible with eisting swh datastore modules (i.e.
    not adapter to this new API).

commit 7774e8e504afe02b46dc075e3ce9cc82bcc59ab8
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 16:00:43 2022 +0100

    Refactor the mock_package_sql fixture to allow multiple sql script set
    
    will be used to implement several test scenarios in following commits.
    Also rename existing sql scripts with a "legit" numbering scheme.

commit 36bc18267c911abb85b6a1cfc3321942e0359369
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 14:54:39 2022 +0100

    Make `swh db init` retrieve the db cnx uri from the config file

commit de1c894e3fa88b3e1e2ba1e7c7dbfc77f7ea4b51
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 14:49:41 2022 +0100

    Move utility functions from cli.db to db.db_utils

commit e4117fcf51a256c5b708f909c818f452c794c0f8
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 10:07:35 2022 +0100

    Remove the old and unused log-schema.sql file

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/287/ for more details.

Build is green

Patch application report for D7062 (id=25638)

Could not rebase; Attempt merge onto aba5c80765...

Updating aba5c80..7036f2d
Fast-forward
 MANIFEST.in                                        |   3 +-
 swh/core/cli/db.py                                 | 247 +++++++----------
 swh/core/db/db_utils.py                            | 292 ++++++++++++++++++++-
 swh/core/db/sql/35-dbmetadata.sql                  |  28 ++
 swh/core/db/tests/conftest.py                      |  40 +++
 .../tests/data/cli/{1-schema.sql => 30-schema.sql} |   0
 .../db/tests/data/cli/{3-func.sql => 40-funcs.sql} |   0
 .../db/tests/data/cli/{4-data.sql => 50-data.sql}  |   0
 swh/core/db/tests/test_cli.py                      |  34 +--
 swh/core/db/tests/test_db_utils.py                 |  77 ++++++
 swh/core/sql/log-schema.sql                        |  33 ---
 11 files changed, 531 insertions(+), 223 deletions(-)
 create mode 100644 swh/core/db/sql/35-dbmetadata.sql
 rename swh/core/db/tests/data/cli/{1-schema.sql => 30-schema.sql} (100%)
 rename swh/core/db/tests/data/cli/{3-func.sql => 40-funcs.sql} (100%)
 rename swh/core/db/tests/data/cli/{4-data.sql => 50-data.sql} (100%)
 create mode 100644 swh/core/db/tests/test_db_utils.py
 delete mode 100644 swh/core/sql/log-schema.sql
Changes applied before test
commit 7036f2d8cf38f8d5b5cb9a2df02235012e12ddcd
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 15:02:51 2022 +0100

    Add support for dbversion and dbmodule handling in swh db init
    
    for now, managing the dbversion table (storing the db schema upgrade
    history) is partially the responsibility of the swh modules implementing
    a datastore (swh-storage etc.), and the actual swh module was not stored
    in the database.
    
    This adds support for both the dbversion and dbmodule tables management
    in swh.core.db.
    
    The `swh db init` command now creates both these tables and initialise
    their values if possible.
    For this, it introduces a new common "API" for swh.core.db based datastores:
    
    - a swh module implementing a datastore is expected to have a
      `get_datastore` function in its top namespace; this function is
      typically the actual `get_storage`, `get_scheduler` etc. functions
    - this factory function is expected to use the "postgresql" cls for the
      datastore based on swh.core.db,
    - the datastore instance (eg. instance of the swh.storage.postgresql.Storage
      class) is expected to have a `get_current_version()` method (returning
      an int).
    
    This revision also provides a new `swh db version` command displaying
    the current (code) version (if avalable), the db version (possibly the whole
    version history) and flavor (if any) for the datastore given as argument.
    
    It should be bacward compatible with eisting swh datastore modules (i.e.
    not adapter to this new API).

commit d1ab8a3ab5cfef97918b5168c0fa15f5c26c0e18
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 16:00:43 2022 +0100

    Refactor the mock_package_sql fixture to allow multiple sql script set
    
    will be used to implement several test scenarios in following commits.
    Also rename existing sql scripts with a "legit" numbering scheme.

commit b11933f997e32db1b59be095a4dfe5d0c3be912b
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 14:54:39 2022 +0100

    Make `swh db init` retrieve the db cnx uri from the config file

commit de1c894e3fa88b3e1e2ba1e7c7dbfc77f7ea4b51
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 14:49:41 2022 +0100

    Move utility functions from cli.db to db.db_utils

commit e4117fcf51a256c5b708f909c818f452c794c0f8
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 10:07:35 2022 +0100

    Remove the old and unused log-schema.sql file

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/291/ for more details.

Thanks, this looks like a good step forward!

These "Also" changes would have been much easier to review (and land) as separate diffs; they're probably uncontroversial, but I'm not sure what they really are by looking at this diff :-)

The commit message needs some copy-editing, there's a bunch of typos ("avalable", "bacward", etc.). I'm not sure about the "datastore" terminology. I think "PostgreSQL databases" would be more accurate and descriptive?

You've chosen to interweave the common SQL files in other modules's database initialization in the populate_database_for_package function. Could you check whether other packages are using the get_sql_for_package function directly, and would be broken by this change? I think swh.storage may be using that function directly in its test fixtures. (although, I guess the change could also be done when the individual dbversion table is removed from individual modules...)

In D7062#183909, @olasd wrote:

Thanks, this looks like a good step forward!

These "Also" changes would have been much easier to review (and land) as separate diffs; they're probably uncontroversial, but I'm not sure what they really are by looking at this diff :-)

yeah I wasn't sure of the ratio number of diff/complexity...

The commit message needs some copy-editing, there's a bunch of typos ("avalable", "bacward", etc.). I'm not sure about the "datastore" terminology. I think "PostgreSQL databases" would be more accurate and descriptive?

I am not found of just "Postgresql db", it does not carry the idea of a swh 'datastore' backed by Postgresql (using a common db handling tooling). This thing here is essentially the need for non-postgresql related unification / API of this "SWH datastore" concept to help managing the postresql backend (and most probably other backends in the future).

You've chosen to interweave the common SQL files in other modules's database initialization in the populate_database_for_package function.

Yes that's something I am completely sure is a good idea. On the one hand, it's somewhat required to make it practically usable, but on the the other hand it makes the db init process hard to apprehend. It can be OK as long as we do not add complexity or subtle dependency trickery...

Could you check whether other packages are using the get_sql_for_package function directly,

I did and I found none.

and would be broken by this change?

Since I also have a bunch of WIP diffs implementing this new "swh datastore" API in sw packages using shw.core.db, I am testing those against this stack.

I think swh.storage may be using that function directly in its test fixtures. (although, I guess the change could also be done when the individual dbversion table is removed from individual modules...)

That's the idea yes.

Build is green

Patch application report for D7062 (id=25933)

Could not rebase; Attempt merge onto 5046a95f05...

Updating 5046a95..93c9c70
Fast-forward
 MANIFEST.in                                        |   3 +-
 swh/core/cli/db.py                                 | 247 +++++++----------
 swh/core/db/db_utils.py                            | 292 ++++++++++++++++++++-
 swh/core/db/sql/35-dbmetadata.sql                  |  28 ++
 swh/core/db/tests/conftest.py                      |  40 +++
 .../tests/data/cli/{1-schema.sql => 30-schema.sql} |   0
 .../db/tests/data/cli/{3-func.sql => 40-funcs.sql} |   0
 .../db/tests/data/cli/{4-data.sql => 50-data.sql}  |   0
 swh/core/db/tests/test_cli.py                      |  34 +--
 swh/core/db/tests/test_db_utils.py                 |  77 ++++++
 swh/core/sql/log-schema.sql                        |  33 ---
 swh/core/tests/test_utils.py                       | 193 +++++++-------
 swh/core/utils.py                                  |  16 +-
 13 files changed, 640 insertions(+), 323 deletions(-)
 create mode 100644 swh/core/db/sql/35-dbmetadata.sql
 rename swh/core/db/tests/data/cli/{1-schema.sql => 30-schema.sql} (100%)
 rename swh/core/db/tests/data/cli/{3-func.sql => 40-funcs.sql} (100%)
 rename swh/core/db/tests/data/cli/{4-data.sql => 50-data.sql} (100%)
 create mode 100644 swh/core/db/tests/test_db_utils.py
 delete mode 100644 swh/core/sql/log-schema.sql
Changes applied before test
commit 93c9c701d28bf38b37bcf8ba5607fad3116c0087
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 15:02:51 2022 +0100

    Add support for dbversion and dbmodule handling in swh db init
    
    for now, managing the dbversion table (storing the db schema upgrade
    history) is partially the responsibility of the swh modules implementing
    a datastore (swh-storage etc.), and the actual swh module was not stored
    in the database.
    
    This adds support for both the dbversion and dbmodule tables management
    in swh.core.db.
    
    The `swh db init` command now creates both these tables and initialise
    their values if possible.
    For this, it introduces a new common "API" for swh.core.db based datastores:
    
    - a swh module implementing a datastore is expected to have a
      `get_datastore` function in its top namespace; this function is
      typically the actual `get_storage`, `get_scheduler` etc. functions
    - this factory function is expected to use the "postgresql" cls for the
      datastore based on swh.core.db,
    - the datastore instance (eg. instance of the swh.storage.postgresql.Storage
      class) is expected to have a `get_current_version()` method (returning
      an int).
    
    This revision also provides a new `swh db version` command displaying
    the current (code) version (if avalable), the db version (possibly the whole
    version history) and flavor (if any) for the datastore given as argument.
    
    It should be bacward compatible with eisting swh datastore modules (i.e.
    not adapter to this new API).

commit b1e449fd91ebcc403966a79fd62d449921fcf550
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 16:00:43 2022 +0100

    Refactor the mock_package_sql fixture to allow multiple sql script set
    
    will be used to implement several test scenarios in following commits.
    Also rename existing sql scripts with a "legit" numbering scheme.

commit 3139d3344e232db83edacc95884604e47f0461cb
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Feb 11 12:10:43 2022 +0100

    Fix a bug in numfile_sortkey and add tests for it

commit 68dc6811967ff0f6fb1296f9687d22d7f0fb684b
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Feb 11 11:59:56 2022 +0100

    Convert test_utils to pytest

commit 3e43730df6150168ca99cb083bd2e60feb4a2771
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 14:54:39 2022 +0100

    Make `swh db init` retrieve the db cnx uri from the config file

commit 459da96c1e6b6044deb874e0e668606c3da3e43c
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 14:49:41 2022 +0100

    Move utility functions from cli.db to db.db_utils

commit 932eed4863baedf011b4a59d0227331d37db1dfd
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 10:07:35 2022 +0100

    Remove the old and unused log-schema.sql file

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/300/ for more details.

In D7062#183909, @olasd wrote:

Thanks, this looks like a good step forward!

These "Also" changes would have been much easier to review (and land) as separate diffs; they're probably uncontroversial, but I'm not sure what they really are by looking at this diff :-)

now in D7154

Build is green

Patch application report for D7062 (id=25945)

Could not rebase; Attempt merge onto 3139d3344e...

Updating 3139d33..943a813
Fast-forward
 MANIFEST.in                                        |   2 +
 swh/core/cli/db.py                                 |  67 +++++++++++
 swh/core/db/db_utils.py                            | 130 +++++++++++++++++++--
 swh/core/db/sql/35-dbmetadata.sql                  |  28 +++++
 swh/core/db/tests/conftest.py                      |  40 +++++++
 .../tests/data/cli/{1-schema.sql => 30-schema.sql} |   0
 .../db/tests/data/cli/{3-func.sql => 40-funcs.sql} |   0
 .../db/tests/data/cli/{4-data.sql => 50-data.sql}  |   0
 swh/core/db/tests/test_cli.py                      |  34 ++----
 swh/core/db/tests/test_db_utils.py                 |  77 ++++++++++++
 10 files changed, 346 insertions(+), 32 deletions(-)
 create mode 100644 swh/core/db/sql/35-dbmetadata.sql
 rename swh/core/db/tests/data/cli/{1-schema.sql => 30-schema.sql} (100%)
 rename swh/core/db/tests/data/cli/{3-func.sql => 40-funcs.sql} (100%)
 rename swh/core/db/tests/data/cli/{4-data.sql => 50-data.sql} (100%)
 create mode 100644 swh/core/db/tests/test_db_utils.py
Changes applied before test
commit 943a813e318248e0c51a35d7866dbb145290c82a
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 15:02:51 2022 +0100

    Add support for dbversion and dbmodule handling in swh db init
    
    for now, managing the dbversion table (storing the db schema upgrade
    history) is partially the responsibility of the swh modules implementing
    a datastore (swh-storage etc.), and the actual swh module was not stored
    in the database.
    
    This adds support for both the dbversion and dbmodule tables management
    in swh.core.db.
    
    The `swh db init` command now creates both these tables and initialize
    their values if possible.
    For this, it introduces a new common "API" for swh.core.db based datastores:
    
    - a swh module implementing a datastore is expected to have a
      `get_datastore` function in its top namespace; this function is
      typically the actual `get_storage`, `get_scheduler` etc. functions
    - this factory function is expected to use the "postgresql" cls for the
      datastore based on swh.core.db,
    - the datastore instance (eg. instance of the swh.storage.postgresql.Storage
      class) is expected to have a `get_current_version()` method (returning
      an int).
    
    This revision also provides a new `swh db version` command displaying
    the current (code) version (if available), the db version (possibly the whole
    version history) and flavor (if any) for the datastore given as argument.
    
    It should be backward compatible with existing swh datastore modules (i.e.
    not yet adapted to this new API).

commit b1e449fd91ebcc403966a79fd62d449921fcf550
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 16:00:43 2022 +0100

    Refactor the mock_package_sql fixture to allow multiple sql script set
    
    will be used to implement several test scenarios in following commits.
    Also rename existing sql scripts with a "legit" numbering scheme.

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/304/ for more details.

ardumont added a subscriber: ardumont.

lgtm

couple of questions/suggestions inline.

swh/core/db/db_utils.py
184
swh/core/db/sql/35-dbmetadata.sql
28

is the bogus column necessary? Can't you simply drop the inserted value in case the value already exists [1]?

[1]

create table if not exists dbmodule (
  dbmodule    text primary key,
):

insert into dbmodule(dbmodule) value (%s) on conflict dbmodule(dbmodule) do nothing;
swh/core/db/tests/conftest.py
51

smells like a missing rebase (you splitted that part in another diff iiuc).

swh/core/db/tests/test_db_utils.py
16
This revision is now accepted and ready to land.Feb 14 2022, 1:26 PM
douardda added inline comments.
swh/core/db/sql/35-dbmetadata.sql
28

I don't know, tbh I just replicated what's done for the dbflavor table...
And I believe we really do want to crash if a now row is inserted, it's a good clue that someone is doing something wrong (eg. wrong migration script, wrong swh package selected in an swh db command, etc.)

swh/core/db/tests/conftest.py
51

let me check that...

douardda added inline comments.
swh/core/db/tests/conftest.py
51

actually no, it's part of this diff, because I use the cli_runner fixture in the added test_db_utils.py test file, so I need the fixture to be moved to conftest.py. I could however split the git revision in 2 parts (not sure it worth the effort but if you think it does, I will)

swh/core/db/tests/conftest.py
51

fine, keep it as is, you did enough work on this already.
Thanks for checking.

Build has FAILED

Patch application report for D7062 (id=25982)

Could not rebase; Attempt merge onto 3139d3344e...

Updating 3139d33..d50695c
Fast-forward
 MANIFEST.in                                        |   2 +
 swh/core/cli/db.py                                 |  67 +++++++++++
 swh/core/db/db_utils.py                            | 130 +++++++++++++++++++--
 swh/core/db/sql/35-dbmetadata.sql                  |  28 +++++
 swh/core/db/tests/conftest.py                      |  40 +++++++
 .../tests/data/cli/{1-schema.sql => 30-schema.sql} |   0
 .../db/tests/data/cli/{3-func.sql => 40-funcs.sql} |   0
 .../db/tests/data/cli/{4-data.sql => 50-data.sql}  |   0
 swh/core/db/tests/test_cli.py                      |  34 ++----
 swh/core/db/tests/test_db_utils.py                 |  78 +++++++++++++
 10 files changed, 347 insertions(+), 32 deletions(-)
 create mode 100644 swh/core/db/sql/35-dbmetadata.sql
 rename swh/core/db/tests/data/cli/{1-schema.sql => 30-schema.sql} (100%)
 rename swh/core/db/tests/data/cli/{3-func.sql => 40-funcs.sql} (100%)
 rename swh/core/db/tests/data/cli/{4-data.sql => 50-data.sql} (100%)
 create mode 100644 swh/core/db/tests/test_db_utils.py
Changes applied before test
commit d50695c3feb57b1e21d93e78364a943232fe91ff
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 15:02:51 2022 +0100

    Add support for dbversion and dbmodule handling in swh db init
    
    for now, managing the dbversion table (storing the db schema upgrade
    history) is partially the responsibility of the swh modules implementing
    a datastore (swh-storage etc.), and the actual swh module was not stored
    in the database.
    
    This adds support for both the dbversion and dbmodule tables management
    in swh.core.db.
    
    The `swh db init` command now creates both these tables and initialize
    their values if possible.
    For this, it introduces a new common "API" for swh.core.db based datastores:
    
    - a swh module implementing a datastore is expected to have a
      `get_datastore` function in its top namespace; this function is
      typically the actual `get_storage`, `get_scheduler` etc. functions
    - this factory function is expected to use the "postgresql" cls for the
      datastore based on swh.core.db,
    - the datastore instance (eg. instance of the swh.storage.postgresql.Storage
      class) is expected to have a `get_current_version()` method (returning
      an int).
    
    This revision also provides a new `swh db version` command displaying
    the current (code) version (if available), the db version (possibly the whole
    version history) and flavor (if any) for the datastore given as argument.
    
    It should be backward compatible with existing swh datastore modules (i.e.
    not yet adapted to this new API).

commit b1e449fd91ebcc403966a79fd62d449921fcf550
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 16:00:43 2022 +0100

    Refactor the mock_package_sql fixture to allow multiple sql script set
    
    will be used to implement several test scenarios in following commits.
    Also rename existing sql scripts with a "legit" numbering scheme.

Link to build: https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/314/
See console output for more information: https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/314/console

Build is green

Patch application report for D7062 (id=25982)

Could not rebase; Attempt merge onto 3139d3344e...

Updating 3139d33..d50695c
Fast-forward
 MANIFEST.in                                        |   2 +
 swh/core/cli/db.py                                 |  67 +++++++++++
 swh/core/db/db_utils.py                            | 130 +++++++++++++++++++--
 swh/core/db/sql/35-dbmetadata.sql                  |  28 +++++
 swh/core/db/tests/conftest.py                      |  40 +++++++
 .../tests/data/cli/{1-schema.sql => 30-schema.sql} |   0
 .../db/tests/data/cli/{3-func.sql => 40-funcs.sql} |   0
 .../db/tests/data/cli/{4-data.sql => 50-data.sql}  |   0
 swh/core/db/tests/test_cli.py                      |  34 ++----
 swh/core/db/tests/test_db_utils.py                 |  78 +++++++++++++
 10 files changed, 347 insertions(+), 32 deletions(-)
 create mode 100644 swh/core/db/sql/35-dbmetadata.sql
 rename swh/core/db/tests/data/cli/{1-schema.sql => 30-schema.sql} (100%)
 rename swh/core/db/tests/data/cli/{3-func.sql => 40-funcs.sql} (100%)
 rename swh/core/db/tests/data/cli/{4-data.sql => 50-data.sql} (100%)
 create mode 100644 swh/core/db/tests/test_db_utils.py
Changes applied before test
commit d50695c3feb57b1e21d93e78364a943232fe91ff
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 15:02:51 2022 +0100

    Add support for dbversion and dbmodule handling in swh db init
    
    for now, managing the dbversion table (storing the db schema upgrade
    history) is partially the responsibility of the swh modules implementing
    a datastore (swh-storage etc.), and the actual swh module was not stored
    in the database.
    
    This adds support for both the dbversion and dbmodule tables management
    in swh.core.db.
    
    The `swh db init` command now creates both these tables and initialize
    their values if possible.
    For this, it introduces a new common "API" for swh.core.db based datastores:
    
    - a swh module implementing a datastore is expected to have a
      `get_datastore` function in its top namespace; this function is
      typically the actual `get_storage`, `get_scheduler` etc. functions
    - this factory function is expected to use the "postgresql" cls for the
      datastore based on swh.core.db,
    - the datastore instance (eg. instance of the swh.storage.postgresql.Storage
      class) is expected to have a `get_current_version()` method (returning
      an int).
    
    This revision also provides a new `swh db version` command displaying
    the current (code) version (if available), the db version (possibly the whole
    version history) and flavor (if any) for the datastore given as argument.
    
    It should be backward compatible with existing swh datastore modules (i.e.
    not yet adapted to this new API).

commit b1e449fd91ebcc403966a79fd62d449921fcf550
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 16:00:43 2022 +0100

    Refactor the mock_package_sql fixture to allow multiple sql script set
    
    will be used to implement several test scenarios in following commits.
    Also rename existing sql scripts with a "legit" numbering scheme.

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/322/ for more details.

Build is green

Patch application report for D7062 (id=26049)

Could not rebase; Attempt merge onto 3139d3344e...

Updating 3139d33..c4bd270
Fast-forward
 MANIFEST.in                                        |   2 +
 swh/core/cli/db.py                                 |  69 ++++++++++-
 swh/core/db/db_utils.py                            | 132 +++++++++++++++++++--
 swh/core/db/sql/35-dbmetadata.sql                  |  28 +++++
 swh/core/db/tests/conftest.py                      |  45 +++++++
 .../tests/data/cli/{1-schema.sql => 30-schema.sql} |   0
 .../db/tests/data/cli/{3-func.sql => 40-funcs.sql} |   0
 .../db/tests/data/cli/{4-data.sql => 50-data.sql}  |   0
 swh/core/db/tests/test_cli.py                      |  36 ++----
 swh/core/db/tests/test_db_utils.py                 |  83 +++++++++++++
 10 files changed, 360 insertions(+), 35 deletions(-)
 create mode 100644 swh/core/db/sql/35-dbmetadata.sql
 rename swh/core/db/tests/data/cli/{1-schema.sql => 30-schema.sql} (100%)
 rename swh/core/db/tests/data/cli/{3-func.sql => 40-funcs.sql} (100%)
 rename swh/core/db/tests/data/cli/{4-data.sql => 50-data.sql} (100%)
 create mode 100644 swh/core/db/tests/test_db_utils.py
Changes applied before test
commit c4bd270c3000df52176dafbf5c7f16e283e918c8
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 15:02:51 2022 +0100

    Add support for dbversion and dbmodule handling in swh db init
    
    for now, managing the dbversion table (storing the db schema upgrade
    history) is partially the responsibility of the swh modules implementing
    a datastore (swh-storage etc.), and the actual swh module was not stored
    in the database.
    
    This adds support for both the dbversion and dbmodule tables management
    in swh.core.db.
    
    The `swh db init` command now creates both these tables and initialize
    their values if possible.
    For this, it introduces a new common "API" for swh.core.db based datastores:
    
    - a swh module implementing a datastore is expected to have a
      `get_datastore` function in its top namespace; this function is
      typically the actual `get_storage`, `get_scheduler` etc. functions
    - this factory function is expected to use the "postgresql" cls for the
      datastore based on swh.core.db,
    - the datastore instance (eg. instance of the swh.storage.postgresql.Storage
      class) is expected to have a `get_current_version()` method (returning
      an int).
    
    This revision also provides a new `swh db version` command displaying
    the current (code) version (if available), the db version (possibly the whole
    version history) and flavor (if any) for the datastore given as argument.
    
    It should be backward compatible with existing swh datastore modules (i.e.
    not yet adapted to this new API).

commit b1e449fd91ebcc403966a79fd62d449921fcf550
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jan 28 16:00:43 2022 +0100

    Refactor the mock_package_sql fixture to allow multiple sql script set
    
    will be used to implement several test scenarios in following commits.
    Also rename existing sql scripts with a "legit" numbering scheme.

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/340/ for more details.