# master (somerset)

## bootstrap db data replication from master to slave:

```
rsync -e 'ssh -i ~/.ssh/id_rsa.basebackup' -av --progress --exclude pg_wal --exclude postmaster.pid --exclude pg_replslot --delete --delete-excluded /srv/softwareheritage/postgres/10/indexer/ dbreplica1.euwest.azure.internal.softwareheritage.org:indexer/
```

Note: This can take a while (depending on the db size)

## connect to db to start replication

Do not close session between queries

```
postgres@somerset $ psql -p 5434

# starting replication with label 'replication'
postgres=# SELECT pg_start_backup('replication', true, false);
 pg_start_backup
-----------------
 1620/5D000028
(1 row)

postgres=# select * from pg_stop_backup(false, false);
      lsn      |                             labelfile                             | spcmapfile
---------------+-------------------------------------------------------------------+------------
 1625/772A8B40 | START WAL LOCATION: 1620/5D000028 (file 00000001000016200000005D)+|
               | CHECKPOINT LOCATION: 1620/5D000060                               +|
               | BACKUP METHOD: streamed                                          +|
               | BACKUP FROM: master                                              +|
               | START TIME: 2018-06-19 09:53:36 UTC                              +|
               | LABEL: replication                                               +|
               |                                                                   |
(1 row)
```
source: https://www.postgresql.org/docs/10/static/functions-admin.html#FUNCTIONS-ADMIN-BACKUP-TABLE

Data directory whereabouts (master, connected to db to replicate):

```
postgres=# show data_directory;
              data_directory
-------------------------------------------
 /srv/softwareheritage/postgres/10/indexer
(1 row)
```

# slave (dbreplica1)

Edit backup_label in data directory:

```
postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ cat backup_label
START WAL LOCATION: 1620/5D000028 (file 00000001000016200000005D)
CHECKPOINT LOCATION: 1620/5D000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2018-06-19 09:53:36 UTC
LABEL: replication
```

Add missing empty directories in data directory:

```
postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ mkdir pg_wal pg_replslot
```

Edit tablespace_map in data directory:

```
postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ touch tablespace_map
postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ cat tablespace_map
postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$
```

Edit recovery.conf in data directory:

```
postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ cat recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=replicator passfile=''/var/lib/postgresql/.pgpass'' host=somerset.internal.softwareheritage.org port=5434 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'
restore_command = 'cp -f /srv/softwareheritage/postgres/wal-archive/%f %p'
#archive_cleanup_command = 'pg_archivecleanup /srv/softwareheritage/postgres/wal-archive %r'
```

# starting db

Now that everything should be good to go, we should be able to start the db.

```
root@dbreplica1 $ systemctl start postgres@10-indexer.service
```

# Check logs

At first, we should see that wal replication started:

```
ardumont@dbreplica1:~% sudo tail -f /var/log/postgresql/postgresql-10-indexer.log
...
2018-06-20 09:59:02.390 UTC [123439] postgres@postgres FATAL:  the database system is starting up
2018-06-20 09:59:02.395 UTC [123440] postgres@postgres FATAL:  the database system is starting up
2018-06-20 09:59:06.979 UTC [123052] LOG:  restored log file "00000001000016200000009C" from archive
2018-06-20 09:59:11.398 UTC [123052] LOG:  restored log file "00000001000016200000009D" from archive
2018-06-20 09:59:14.442 UTC [123052] LOG:  restored log file "00000001000016200000009E" from archive
2018-06-20 09:59:16.637 UTC [123052] LOG:  restored log file "00000001000016200000009F" from archive
2018-06-20 09:59:22.801 UTC [123052] LOG:  restored log file "0000000100001620000000A0" from archive
2018-06-20 09:59:28.131 UTC [123052] LOG:  restored log file "0000000100001620000000A1" from archive
2018-06-20 09:59:31.347 UTC [123052] LOG:  restored log file "0000000100001620000000A2" from archive
2018-06-20 09:59:33.990 UTC [123052] LOG:  restored log file "0000000100001620000000A3" from archive
2018-06-20 09:59:38.896 UTC [123052] LOG:  restored log file "0000000100001620000000A4" from archive
...

```

source: https://www.postgresql.org/docs/10/static/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP

Then at some point, we should see consistent recovery state entries.

```
ardumont@dbreplica1:~% sudo grep 'consistent recovery state' /var/log/postgresql/postgresql-10-indexer.log
...
2018-06-20 17:59:11.012 UTC [82376] LOG:  consistent recovery state reached at 1627/9DD6C3F0
```

So, now, to avoid too much wal files aggregation on the slave, we can flip the archive_cleanup_command on.

In recovery.conf we edited earlier, remove the comment in the archive_cleanup_command line:

```
postgres@dbreplica1:/srv/softwareheritage/postgres/10/indexer$ tail -1 recovery.conf
archive_cleanup_command = 'pg_archivecleanup /srv/softwareheritage/postgres/wal-archive %r'
```

Note: This will be in charge of the slave cleaning up wal file, when it has applied it.

Reload the database's configuration:

```
sudo systemctl reload postgresql@10-indexer.service
```