Page MenuHomeSoftware Heritage

Refactor the db scaffolding to use swh.core.db
ClosedPublic

Authored by douardda on Feb 12 2021, 4:31 PM.

Details

Summary

There are several parts in this diff (should be in reasnably correctly splittes git revsions):

  • split the 2 provenance.sql scripts in parts and move them to swh/provenance/sql so the database creation and initialization are handled by the swh db init and swh db create tools from swh.core.db.
  • with/witout path choice is implemented using 2 flavors: with-path and without-path. So the initialization command should be something like:

    swh db init -d db-name --flavor with-path provenance
  • refactor the database creation code and a test for the cli command
  • this later evolution comes with a few changes in the configuration file format to "normalize" it a bit and ease testing:
    • 'ps' config cls for the archive has been renamed as 'direct'
    • 'ps' confif cls for the provenance db has been renamed as 'local' (similar to other swh packages),
    • 'ps_np' has been suppressed in favor of a 'with_path' config flag in the 'provenance' section.
  • also add a provenance fixture in a conftest.py file.

Diff Detail

Repository
rDPROV Provenance database
Branch
pre-commit
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 19252
Build 29859: arc lint + arc unit

Event Timeline

douardda created this revision.
douardda retitled this revision from Refactor the db scaffolding to use swh.core.db to WIP Refactor the db scaffolding to use swh.core.db.Feb 12 2021, 5:33 PM

Fix tests and properly initialize the db using swh.core.db in the pytest fixture

douardda retitled this revision from WIP Refactor the db scaffolding to use swh.core.db to Refactor the db scaffolding to use swh.core.db.Feb 16 2021, 12:14 PM
douardda added reviewers: aeviso, Reviewers.
vlorentz added inline comments.
README.md
29

outdated cls?

swh/provenance/__init__.py
21

let's call it "postgres" (we should also rename it that way in swh-storage)

I'm not convinced by the flavor thing. Why not a single schema, but with empty values in loc when it's disabled?

I agree with this revision. I only have those few comments/questions above.

swh/provenance/__init__.py
21

I agree "postgres" is more declarative as we can have several "local" db types.

23–26

Does this mean that the default is without path? I think we want it the other way around.

swh/provenance/sql/30-schema.sql
9

Why are the indexes created with the table here and separately in the case of the flavors? Do we have an standard for that?

I'm not convinced by the flavor thing. Why not a single schema, but with empty values in loc when it's disabled?

I've kept the logic as it was before... just made it use swh.core

Ir's not even sure the 'without-path' version will stay, so...

swh/provenance/__init__.py
21

then I'll change it here when it's done in "core components"

23–26

oh you're right, let me fix this

swh/provenance/sql/30-schema.sql
9

I've mainly been lazy there. This is common to both flavors, so it can stay here, but yeas, it would make sense to put it in a dedicated 60-indexes.sql file, since this is generally what's done in other packages.

rebase + default to with_path=True as asked by @aeviso

I've not extracted all the index creations to a dedicated sql file, too tired for that right now...

This revision is now accepted and ready to land.Feb 18 2021, 7:28 PM