Page MenuHomeSoftware Heritage

Allow simple read-only connections to db from swh nodes
Open, NormalPublic

Description

Our current recommended pg_service setup to connect to the DB replica is:

[swh-replica]
dbname=softwareheritage
host=somerset.internal.softwareheritage.org
user=guest
port=5433

But that needs one swh user to actually setup manually such file (and also be aware of what's what).
We should puppetize this and install it by default on our swh nodes so that user can simply execute:

swhuser@swhnode $ psql service=swh-replica

for example to connect to the db replica.
Related T1234#33001

Note:
As a first incremental step, i'll only reference the guest user (read-only access).
Shout out if we need also write access.

Event Timeline

zack created this task.Oct 3 2018, 3:40 PM
zack triaged this task as Normal priority.
zack updated the task description. (Show Details)
zack added a comment.May 25 2019, 5:25 PM

can we haz this, please? :)

olasd added a subscriber: olasd.May 28 2019, 4:07 PM

As mentioned in D1516 I don't think DNS provides the proper granularity for this; somerset is a "database replica" server only coincidentally; all its databases that aren't the replica of the main database are actually primaries.

I think providing a pgbouncer instance which would allow access to all database instances (with aliases) from anywhere would be a more sensible move.

zack added a comment.May 28 2019, 4:59 PM

[ moving here my feedback from D1516 ]

In D1516#34010, @olasd wrote:

I'm not convinced this is such a good idea; this machine is way more than a "db replica" server (it only has one replica, most its databases are actually primary) and I don't think DNS provides the appropriate granularity level to record this information.

Right, the machine does multiple things, but CNAMEs are not mutually exclusive.

I think configuring pgbouncer to have transparent access to read-only/read-write databases would be more appropriate and flexible than doing dns hacks.

Can we have both?

I think providing a pgbouncer instance which would allow access to all database instances (with aliases) from anywhere would be a more sensible move.

For information, a pgbouncer got installed manually on the new belvedere machine (reflecting as far as i understood @ftigeot the one we had on prado).
I'm pushing towards puppetizing it (i will work on it today).

Can we have both?

I agree with @olasd that somerset is more than a replica [1] (got primary db indexer, occurrences cluster for retro-compatibility work for @grouss, testdedup cluster from @seirl's work, ...).

I'm waiting on olasd's ack/nack to merge or not the diff.

[1] https://intranet.softwareheritage.org/wiki/Databases_inventory#somerset

Cheers,

ardumont added a comment.EditedJun 6 2019, 3:45 PM

Discussing a bit more with @zack, the need is actually to be able to transparently connect from any machine to the db services (without the hassle of configuring) [1]

Something like the following would be enough (pg_service.conf, .pgpass):

psql service=swh
psql service=swh-replica
...

That means, puppetize the ~/.pg_service.conf mentioned in doc.
The requisite is that it's available for all users connecting to swh nodes.
Implementation wise, we need to look on how to install it in a centralized manner per node (/etc/pg_service.conf or some such).

Probably:

  • we only want that for guest user (read only user). Or at least, we want that as a first incremental step.
  • ~/.pgpass might be not necessary here if we have pgbouncer setuped (done).

Personally, i've dealt with this declaratively on my machines (with home-manager) and my use is simple:

  • service="<db>": guest access to db (for example: swh, swh-deposit, swh-scheduler, etc...)
  • service="admin-<db>": main user which has write access (for example: admin-swh, admin-swh-deposit, etc...)

[1] https://www.postgresql.org/docs/11/libpq-pgservice.html

We already have something that tries to bootstrap that [1]

It needs to be completed and actually deployed.

[1] https://forge.softwareheritage.org/source/puppet-swh-site/browse/production/site-modules/profile/manifests/devel/postgres.pp$8

ardumont renamed this task from CNAME for dbreplica to Allow simple read-only connections to db from swh nodes.Jun 8 2019, 10:44 AM
ardumont updated the task description. (Show Details)
ardumont updated the task description. (Show Details)