Page MenuHomeSoftware Heritage

Factor-out common DB wrappers in swh.core
Closed, MigratedEdits Locked

Description

In a lot of swh services, there's a common reuse pattern for db interactions: initialize a postgresql connection, define a custom cursor object that reconnects (with retries) to the database, some commit/rollback wrappers, an autocommit() decorator and a cursor= parameter to pass transactions through subfunctions.

All this behavior should be normalized, and then factored out in swh.core. This could also be the occasion to find genericity patterns and adapt the base class for multiple types of usages, and to add context manager semantics to have a better unified way to do graceful teardown, for instance in database unit tests.

These are the places where I saw these common patterns (non exhaustive, You Can Help By Expanding It™):

  • swh.scheduler.backend.SchedulerBackend
  • swh.storage.db.BaseDb
  • swh.vault.backend
  • swh.indexer.storage.db (-> it does inherit from swh.storage.db.BaseDb though)

Event Timeline

You can add swh.indexer.storage.db.Db to your list.

I'll just quote the requirements.txt file from swh.core:

msgpack-python
psycopg2
python-dateutil
vcversioner
PyYAML
requests
Flask
systemd-python

Most of that isn't really *core* functionality, just stuff that got factored out from other modules. Also note that aiohttp is missing from the list although formally it should be there.

I'm not sure pushing more things down in swh.core is the right approach; I don't know if splitting it up in more modules is right either: I don't fancy a million modules containing two lines of code with the current status of our packaging/deployment automation ("inexistent").

Maybe a middle ground would be moving more towards setuptools "extra" dependencies? The main problem with that approach is it doesn't integrate well with the Debian packaging and you get to split everything by hand...

I agree we must be careful with not bloating swh.core, but the current subject, it really makes sense to me to put this basic db access wrapper as a core functionality.

The discussion about mandatory vs. optional dependencies is another topic IMHO.