As per title.
We want a fast DB access, read-only, for workers running on azure.
Use cases are: vault workers and provenance cache population.
As per title.
We want a fast DB access, read-only, for workers running on azure.
Use cases are: vault workers and provenance cache population.
Status | Assigned | Task | ||
---|---|---|---|---|
Migrated | gitlab-migration | T888 Deploy the Vault and a DB replica on Azure | ||
Migrated | gitlab-migration | T883 set up a replica of the main DB on azure | ||
Migrated | gitlab-migration | T969 Azure database replica doesn't sustain writes |
Current database size: ~= 3880 GB
Maximum Azure drive size: 4095 GB
The sweet spot for Azure drives seems to be 2TB : 7500 IOPS and 250MB/s provisioned.
Smaller ones have less provisioned IOPS and bandwidth per second.
Bigger ones don't get faster.
Azure performance appears to be very far from reliable. Linux keeps complaining about issues similar to this one:
[Tue Jan 9 15:47:04 2018] INFO: task kworker/1:3:148 blocked for more than 120 seconds. [Tue Jan 9 15:47:04 2018] Not tainted 4.9.0-5-amd64 #1 Debian 4.9.65-3+deb9u2 [Tue Jan 9 15:47:04 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Pglogical replication requirements:
Use a .pgpass file in the postgres Unix account directory for that.
Provider setup (was already done, instructions may be unreliable):
createuser replicator --replication su - postgres psql template1 \c softwareheritage postgres localhost 5433 create extension pglogical; select pglogical.create_node( node_name := 'prado', dsn := 'host=prado.internal.softwareheritage.org port=5433 dbname=softwareheritage' ); select pglogical.replication_set_add_table('default', 'content', true); ALTER DEFAULT PRIVILEGES FOR ROLE swhstorage grant select on tables to replicator; GRANT USAGE ON SCHEMA pglogical TO replicator;
Subscriber setup:
su - postgres /usr/lib/postgresql/10/bin/initdb -D /srv/softwareheritage/postgres/10/replica
[Start up Postgres here]
createuser swhstorage psql template1 create user replicator SUPERUSER REPLICATION PASSWORD 'toto90'; create database softwareheritage with owner swhstorage encoding utf8 lc_collate 'C.UTF-8' lc_ctype 'C.UTF-8'; \q git clone https://forge.softwareheritage.org/source/swh-storage.git cd swh-storage psql softwareheritage create extension pglogical; ALTER DEFAULT PRIVILEGES FOR ROLE swhstorage grant all on tables to replicator; GRANT USAGE ON SCHEMA pglogical TO replicator; \i sql/swh-init.sql \i sql/swh-enums.sql \i sql/swh-schema.sql select pglogical.create_node( node_name := 'dbreplica0', dsn := 'host=dbreplica0.euwest.azure.internal.softwareheritage.org port=5433 dbname=softwareheritage user=replicator' ); select pglogical.create_subscription( subscription_name := 'dbreplica0', provider_dsn := 'host=prado.internal.softwareheritage.org port=5433 dbname=softwareheritage user=replicator' );
The replica database has to be properly initialized with the .sql scripts.
Do not try to use the synchronize_structure feature of pglogical, it will fail.
The existing replica using pglogical is unable to stay in sync with its master database
Replication technology changed to streaming replication (wal shipping).
Replication has been running fine since yesterday.
PostgreSQL master server is somerset.internal.softwareheritage.org:5433 .
Rough setup steps:
su - postgres createuser -p 5433 replicator --replication psql -p 5433 template1 alter user replicator with password 'my.precious.secret';
# pg_hba.conf # TYPE DATABASE USER CIDR-ADDRESS METHOD host replication replicator 192.168.200.0/21 md5 # Azure
kill -HUP $pid_of_postgres_master_process
# postgresql.conf wal_level = replica # or higher, logical is fine wal_keep_segments = 500 # in logfile segments, 16MB each; 0 disables wal_compression = on # enable compression of full-page writes
su - postgres touch .pgpass echo "somerset.internal.softwareheritage.org:5433:replication:replicator:my.precious.secret" >> .pgpass chmod 0600 .pgpass
# postgresql.conf max_connections = 128 # must be the same value as on the master
pg_basebackup --progress --write-recovery-conf \ -h somerset.internal.softwareheritage.org -p 5433 -U replicator \ -D /srv/softwareheritage/postgres/10/somerset.backup
cd /postgres/10 mv somerset.backup replica service postgresql start
great, thanks!
can you consolidate the bits of docs somewhere on the intranet? they'll be easier to find than on a task in the future
echo "somerset.internal.softwareheritage.org:5433:replication:replicator:my.precious.secret" >> .pgpass
There must be a typo here.
Testing on somerset with the postgres user, it does not work (as in, does not connect and falls back to ask for a password).
As far as i understood, replication is not a db, it's a right (or something).
cli used:
psql -h somerset.internal.softwareheritage.org -p 5433 -U replicator softwareheritage
There should be the name of the db instead of replication, as in:
echo "somerset.internal.softwareheritage.org:5433:softwareheritage:replicator:my.precious.secret" >> .pgpass
Now, the previous cli works, as in, i can connect.
can you consolidate the bits of docs somewhere on the intranet? they'll be easier to find than on a task in the future
@zack As i'm working on T1094, i'm consolidating notes based on those plus what i'm currently doing.
I'll edit the current wiki page [1] for that as soon as i'm successfully done with my own replica.
[1] https://intranet.softwareheritage.org/index.php?title=Pglogical_replication
Cheers,
There must be a typo here.
Well, yes and no.
TL;DR
The way to check if the replication connection is ok is wrong.
echo "somerset.internal.softwareheritage.org:5433:replication:replicator:my.precious.secret" >> .pgpass
So indeed, actually, this is the right line for the replication ;)
But, turns out that it's the way to check if the connection is working that's not right.
Actually triggering the last step (pg_basebackup) will say if it's ok or not so no need for an extra check ;)
Cheers,
I'll edit the current wiki page [1] for that...
I finally updated the [1] link but to add a note about us moving away from pg_logical replication (and targets the page about the new one [2]).
I thus added another page [2] instead since it's not the same replication technology.
[1] https://intranet.softwareheritage.org/index.php?title=Pglogical_replication
[2] https://intranet.softwareheritage.org/index.php?title=Streaming_Replication
about us moving away from pg_logical replication (and targets the page about the new one [2]).
i was wrong, we still use this so i removed the update (prado -> somerset).