Page MenuHomeSoftware Heritage
Paste P276

replication note (somerset - swh-indexer -> dbreplica1 - swh-indexer)
ActivePublic

Authored by ardumont on Jun 20 2018, 11:31 AM.
# master (somerset)
## bootstrap db data replication from master to slave:
```
rsync -e 'ssh -i ~/.ssh/id_rsa.basebackup' -av --progress --exclude pg_wal --exclude postmaster.pid --exclude pg_replslot --delete --delete-excluded /srv/softwareheritage/postgres/10/indexer/ dbreplica1.euwest.azure.internal.softwareheritage.org:indexer/
```
Note: This can take a while (depending on the db size)
## connect to db to start replication
Do not close session between queries
```
postgres@somerset $ psql -p 5434
# starting replication with label 'replication'
postgres=# SELECT pg_start_backup('replication', true, false);
pg_start_backup
-----------------
1620/5D000028
(1 row)
postgres=# select * from pg_stop_backup(false, false);
lsn | labelfile | spcmapfile
---------------+-------------------------------------------------------------------+------------
1625/772A8B40 | START WAL LOCATION: 1620/5D000028 (file 00000001000016200000005D)+|
| CHECKPOINT LOCATION: 1620/5D000060 +|
| BACKUP METHOD: streamed +|
| BACKUP FROM: master +|
| START TIME: 2018-06-19 09:53:36 UTC +|
| LABEL: replication +|
| |
(1 row)
```
source: https://www.postgresql.org/docs/10/static/functions-admin.html#FUNCTIONS-ADMIN-BACKUP-TABLE
Data directory whereabouts (master, connected to db to replicate):
```
postgres=# show data_directory;
data_directory
-------------------------------------------
/srv/softwareheritage/postgres/10/indexer
(1 row)
```
# slave (dbreplica1)
Edit backup_label in data directory:
```
postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ cat backup_label
START WAL LOCATION: 1620/5D000028 (file 00000001000016200000005D)
CHECKPOINT LOCATION: 1620/5D000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2018-06-19 09:53:36 UTC
LABEL: replication
```
Add missing empty directories in data directory:
```
postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ mkdir pg_wal pg_replslot
```
Edit tablespace_map in data directory:
```
postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ touch tablespace_map
postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ cat tablespace_map
postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$
```
Edit recovery.conf in data directory:
```
postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ cat recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=replicator passfile=''/var/lib/postgresql/.pgpass'' host=somerset.internal.softwareheritage.org port=5434 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'
restore_command = 'cp -f /srv/softwareheritage/postgres/wal-archive/%f %p'
#archive_cleanup_command = 'pg_archivecleanup /srv/softwareheritage/postgres/wal-archive %r'
```
# starting db
Now that everything should be good to go, we should be able to start the db.
```
root@dbreplica1 $ systemctl start postgres@10-indexer.service
```
# Check logs
At first, we should see that wal replication started:
```
ardumont@dbreplica1:~% sudo tail -f /var/log/postgresql/postgresql-10-indexer.log
...
2018-06-20 09:59:02.390 UTC [123439] postgres@postgres FATAL: the database system is starting up
2018-06-20 09:59:02.395 UTC [123440] postgres@postgres FATAL: the database system is starting up
2018-06-20 09:59:06.979 UTC [123052] LOG: restored log file "00000001000016200000009C" from archive
2018-06-20 09:59:11.398 UTC [123052] LOG: restored log file "00000001000016200000009D" from archive
2018-06-20 09:59:14.442 UTC [123052] LOG: restored log file "00000001000016200000009E" from archive
2018-06-20 09:59:16.637 UTC [123052] LOG: restored log file "00000001000016200000009F" from archive
2018-06-20 09:59:22.801 UTC [123052] LOG: restored log file "0000000100001620000000A0" from archive
2018-06-20 09:59:28.131 UTC [123052] LOG: restored log file "0000000100001620000000A1" from archive
2018-06-20 09:59:31.347 UTC [123052] LOG: restored log file "0000000100001620000000A2" from archive
2018-06-20 09:59:33.990 UTC [123052] LOG: restored log file "0000000100001620000000A3" from archive
2018-06-20 09:59:38.896 UTC [123052] LOG: restored log file "0000000100001620000000A4" from archive
...
```
source: https://www.postgresql.org/docs/10/static/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP
Then at some point, we should see consistent recovery state entries.
```
ardumont@dbreplica1:~% sudo grep 'consistent recovery state' /var/log/postgresql/postgresql-10-indexer.log
...
2018-06-20 17:59:11.012 UTC [82376] LOG: consistent recovery state reached at 1627/9DD6C3F0
```
So, now, to avoid too much wal files aggregation on the slave, we can flip the archive_cleanup_command on.
In recovery.conf we edited earlier, remove the comment in the archive_cleanup_command line:
```
postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ tail -1 recovery.conf
archive_cleanup_command = 'pg_archivecleanup /srv/softwareheritage/postgres/wal-archive %r'
```
Note: This will be in charge of the slave cleaning up wal file, when it has applied it.
Reload the database's configuration:
```
sudo systemctl reload postgresql@10-indexer.service
```

Event Timeline

ardumont added a subscriber: olasd.
ardumont edited the content of this paste. (Show Details)
ardumont changed the title of this paste from replicatio note (somerset - swh-indexer -> dbreplica1 - swh-indexer) to replication note (somerset - swh-indexer -> dbreplica1 - swh-indexer).Jun 20 2018, 12:02 PM
ardumont edited the content of this paste. (Show Details)
ardumont edited the content of this paste. (Show Details)
ardumont edited the content of this paste. (Show Details)