# master (somerset) ## bootstrap db data replication from master to slave: ``` rsync -e 'ssh -i ~/.ssh/id_rsa.basebackup' -av --progress --exclude pg_wal --exclude postmaster.pid --exclude pg_replslot --delete --delete-excluded /srv/softwareheritage/postgres/10/indexer/ dbreplica1.euwest.azure.internal.softwareheritage.org:indexer/ ``` Note: This can take a while (depending on the db size) ## connect to db to start replication Do not close session between queries ``` postgres@somerset $ psql -p 5434 # starting replication with label 'replication' postgres=# SELECT pg_start_backup('replication', true, false); pg_start_backup ----------------- 1620/5D000028 (1 row) postgres=# select * from pg_stop_backup(false, false); lsn | labelfile | spcmapfile ---------------+-------------------------------------------------------------------+------------ 1625/772A8B40 | START WAL LOCATION: 1620/5D000028 (file 00000001000016200000005D)+| | CHECKPOINT LOCATION: 1620/5D000060 +| | BACKUP METHOD: streamed +| | BACKUP FROM: master +| | START TIME: 2018-06-19 09:53:36 UTC +| | LABEL: replication +| | | (1 row) ``` source: https://www.postgresql.org/docs/10/static/functions-admin.html#FUNCTIONS-ADMIN-BACKUP-TABLE Data directory whereabouts (master, connected to db to replicate): ``` postgres=# show data_directory; data_directory ------------------------------------------- /srv/softwareheritage/postgres/10/indexer (1 row) ``` # slave (dbreplica1) Edit backup_label in data directory: ``` postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ cat backup_label START WAL LOCATION: 1620/5D000028 (file 00000001000016200000005D) CHECKPOINT LOCATION: 1620/5D000060 BACKUP METHOD: streamed BACKUP FROM: master START TIME: 2018-06-19 09:53:36 UTC LABEL: replication ``` Add missing empty directories in data directory: ``` postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ mkdir pg_wal pg_replslot ``` Edit tablespace_map in data directory: ``` postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ touch tablespace_map postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ cat tablespace_map postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ ``` Edit recovery.conf in data directory: ``` postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ cat recovery.conf standby_mode = 'on' primary_conninfo = 'user=replicator passfile=''/var/lib/postgresql/.pgpass'' host=somerset.internal.softwareheritage.org port=5434 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any' restore_command = 'cp -f /srv/softwareheritage/postgres/wal-archive/%f %p' #archive_cleanup_command = 'pg_archivecleanup /srv/softwareheritage/postgres/wal-archive %r' ``` # starting db Now that everything should be good to go, we should be able to start the db. ``` root@dbreplica1 $ systemctl start postgres@10-indexer.service ``` # Check logs At first, we should see that wal replication started: ``` ardumont@dbreplica1:~% sudo tail -f /var/log/postgresql/postgresql-10-indexer.log ... 2018-06-20 09:59:02.390 UTC [123439] postgres@postgres FATAL: the database system is starting up 2018-06-20 09:59:02.395 UTC [123440] postgres@postgres FATAL: the database system is starting up 2018-06-20 09:59:06.979 UTC [123052] LOG: restored log file "00000001000016200000009C" from archive 2018-06-20 09:59:11.398 UTC [123052] LOG: restored log file "00000001000016200000009D" from archive 2018-06-20 09:59:14.442 UTC [123052] LOG: restored log file "00000001000016200000009E" from archive 2018-06-20 09:59:16.637 UTC [123052] LOG: restored log file "00000001000016200000009F" from archive 2018-06-20 09:59:22.801 UTC [123052] LOG: restored log file "0000000100001620000000A0" from archive 2018-06-20 09:59:28.131 UTC [123052] LOG: restored log file "0000000100001620000000A1" from archive 2018-06-20 09:59:31.347 UTC [123052] LOG: restored log file "0000000100001620000000A2" from archive 2018-06-20 09:59:33.990 UTC [123052] LOG: restored log file "0000000100001620000000A3" from archive 2018-06-20 09:59:38.896 UTC [123052] LOG: restored log file "0000000100001620000000A4" from archive ... ``` source: https://www.postgresql.org/docs/10/static/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP Then at some point, we should see consistent recovery state entries. ``` ardumont@dbreplica1:~% sudo grep 'consistent recovery state' /var/log/postgresql/postgresql-10-indexer.log ... 2018-06-20 17:59:11.012 UTC [82376] LOG: consistent recovery state reached at 1627/9DD6C3F0 ``` So, now, to avoid too much wal files aggregation on the slave, we can flip the archive_cleanup_command on. In recovery.conf we edited earlier, remove the comment in the archive_cleanup_command line: ``` postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ tail -1 recovery.conf archive_cleanup_command = 'pg_archivecleanup /srv/softwareheritage/postgres/wal-archive %r' ``` Note: This will be in charge of the slave cleaning up wal file, when it has applied it. Reload the database's configuration: ``` sudo systemctl reload postgresql@10-indexer.service ```