Page MenuHomeSoftware Heritage

set up pg_logical-based master/slave replication for the postgres db
Closed, MigratedEdits Locked

Description

We are currently using a single Postgres DB for all our operations, which is suboptimal because it has to be optimized for many different use cases.
To fix that we want to setup two DBs, in master/slave replication:

  • master optimized "for writes" (i.e., for adding new content to the DB, which is our main write task)
  • slave optimized "for reads" (i.e., queries from both the Web UI and other data crunching tasks)

Reality will be more blurry than this, because writes do require to perform some reads (e.g., to check which objects are already in the archive), but the above should be the gist of it.

Event Timeline

olasd changed the task status from Open to Work in Progress.Jan 9 2017, 3:05 PM

Set up a new virtual machine for the database replica (somerset).

Configured postgresql with a setup equivalent to prado's main database.

Replicated the user credentials from prado.

Initialized an empty softwareheritage database on the replica.

Initialized schema from the swh-storage repository.

Granted access lists from the base softwareheritage database.

Following instructions on https://2ndquadrant.com/en/resources/pglogical/pglogical-docs/, I set up the first replication set on our main database to replicate the "content" table.

Everything connects as the postgres user as pglogical apparently needs superuser privileges.

The replication is ongoing. The replica complains about checkpoints being too frequent, I'm not sure what to do there.

The replica seems to checkpoint less often when allowing for an egregious amount of WALs (200GB instead of 2GB). Unfortunately the initial replication cannot be interrupted.

Using our standard endpoint to add content for example triggers an unhappy exception whose root cause is 'psycopg2.ProgrammingError: permission denied for schema pglogical'.
P125 for a more thorough example.

Using our standard endpoint to add content for example triggers an unhappy exception whose root cause is 'psycopg2.ProgrammingError: permission denied for schema pglogical'.
P125 for a more thorough example.

This is due to the following upstream bug on pglogical: https://github.com/2ndQuadrant/pglogical/issues/15

All users need to be granted usage on the pglogical schema when executing DDL commands such as create temp table [...] like content.

This is due to the following upstream bug on pglogical: https://github.com/2ndQuadrant/pglogical/issues/15

All users need to be granted usage on the pglogical schema when executing DDL commands such as create temp table [...] like content.

Awesome. I concur you "fixed" it (no more error like this is reported anymore).
Thanks again.

Ongoing replication was failing on the softwareheritage database: it turns out that the encoding settings for the original database and the replica didn't match.

-- create the new database
create database softwareheritage with template template0 owner swhstorage encoding utf8 lc_collate 'C.UTF-8' lc_ctype 'C.UTF-8';

-- connect to it as superuser
\c softwareheritage

-- add pglogical
create extension pglogical;
\i sql/swh-init.sql

-- set proper default privileges
ALTER DEFAULT PRIVILEGES FOR ROLE swhstorage grant all on tables to postgres;
ALTER DEFAULT PRIVILEGES FOR ROLE swhstorage grant all on tables to guest;
ALTER DEFAULT PRIVILEGES FOR ROLE swhstorage grant all on tables to zack;
ALTER DEFAULT PRIVILEGES FOR ROLE swhstorage grant all on tables to ardumont;
ALTER DEFAULT PRIVILEGES FOR ROLE swhstorage grant all on tables to olasd;

-- reconnect as swhstorage
\c softwareheritage swhstorage localhost 5433
[input password]

-- initialize schema
\i sql/swh-enums.sql
\i sql/swh-schema.sql

-- start replication
SELECT pglogical.create_node(
  node_name := 'somerset',
  dsn := 'host=somerset.internal.softwareheritage.org port=5433 dbname=softwareheritage user=postgres password=<redacted>'
);
SELECT pglogical.create_subscription(
  subscription_name := 'subscription1',
  provider_dsn := 'host=prado.internal.softwareheritage.org port=5433 dbname=softwareheritage user=postgres password=<redacted>'
);

-- the replicated tables should start filling up.
zack added a project: Restricted Project.Jan 11 2017, 4:45 PM
zack moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.
zack moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.

After a proper setup of the table, the replication of the content table succeeded. Ongoing replication works as well.

I'm adding more tables to the replication set, starting with small tables.

I started with some small tables:

  • directory_entry_rev
  • origin
  • person
  • origin_visit

I stumbled upon an issue with origin_visit: we do updates on this table, and therefore we need a primary key on the replica table (matching the one on the source table), to be able to replicate UPDATEs.

Lack of this index messed up the synchronization, which could recover with:

  • adding the pkey
  • truncating the table
  • restarting the replica

The resync is ongoing.

Replicated more tables.

Current status :

softwareheritage=# select * from pglogical.tables order by relname;                                                                                                                                                                           
 nspname |             relname              | set_name 
---------+----------------------------------+----------
 public  | cache_content_revision           | 
 public  | cache_content_revision_processed | 
 public  | cache_revision_origin            | 
 public  | content                          | default
 public  | content_ctags                    | 
 public  | content_fossology_license        | 
 public  | content_language                 | 
 public  | content_mimetype                 | 
 public  | dbversion                        | 
 public  | directory                        | default
 public  | directory_entry_dir              | default
 public  | directory_entry_file             | default
 public  | directory_entry_rev              | default
 public  | entity                           | 
 public  | entity_equivalence               | 
 public  | entity_history                   | 
 public  | fetch_history                    | 
 public  | fossology_license                | 
 public  | indexer_configuration            | 
 public  | list_history                     | 
 public  | listable_entity                  | 
 public  | occurrence                       | 
 public  | occurrence_history               | 
 public  | origin                           | default
 public  | origin_visit                     | default
 public  | person                           | default
 public  | release                          | default
 public  | revision                         | 
 public  | revision_history                 | 
 public  | skipped_content                  | 
(30 rows)

(replicated tables : those in the default set)

After adding the directory table, the replication crashed repeatedly with an obscure error (ERROR: epoll_ctl() failed with Invalid Argument).

No amounts of hitting the replication slots on the source database helped restore the replication in working order. I therefore reinitialized the replication by wiping the destination and starting from scratch.

While doing so, I noticed that the initial data copy for pglogical uses a bare COPY without any indication of which columns are copied. It turns out that the current implementation of pglogical is dependent on the order of columns in the source and destination databases. It turns out that the data types were compatible enough that we didn't notice.

After adapting our schema (in swh-storage) to the order of columns on the main database, I restarted the replication, table by table.

The replication has been keeping up with changes for days and we're currently replicating the last table (revision), without seeing an issue.

softwareheritage=# select * from pg_replication_slots join lateral (select pg_current_xlog_location()) as xlog on true;
                slot_name                 |      plugin      | slot_type | datoid |     database     | active | active_pid | xmin | catalog_xmin |  restart_lsn  | confirmed_flush_lsn | pg_current_xlog_location 
------------------------------------------+------------------+-----------+--------+------------------+--------+------------+------+--------------+---------------+---------------------+--------------------------
 pgl_softwareheritage_prado_subscription1 | pglogical_output | logical   |  16400 | softwareheritage | t      |      13292 |      |    683272044 | 835E/889C0EB0 | 835E/970200D8       | 835E/970200D8
(1 row)

softwareheritage=# 
softwareheritage=# select * from pglogical.tables order by relname;                                                                                                                                                                           
 nspname |             relname              |      set_name       
---------+----------------------------------+---------------------
 public  | cache_content_revision           | 
 public  | cache_content_revision_processed | 
 public  | cache_revision_origin            | 
 public  | content                          | default
 public  | content_ctags                    | default_insert_only
 public  | content_fossology_license        | default
 public  | content_language                 | default
 public  | content_mimetype                 | default
 public  | dbversion                        | 
 public  | directory                        | default
 public  | directory_entry_dir              | default
 public  | directory_entry_file             | default
 public  | directory_entry_rev              | default
 public  | entity                           | default
 public  | entity_equivalence               | default
 public  | entity_history                   | default
 public  | fetch_history                    | default
 public  | fossology_license                | default
 public  | indexer_configuration            | default
 public  | list_history                     | default
 public  | listable_entity                  | default
 public  | occurrence                       | default
 public  | occurrence_history               | default
 public  | origin                           | default
 public  | origin_visit                     | default
 public  | person                           | default
 public  | release                          | default
 public  | revision                         | default
 public  | revision_history                 | default
 public  | skipped_content                  | default_insert_only
(30 rows)

all the tables are now being replicated live, as seen by the original query (confirmed_flush_lsn == pg_current_xlog_location)

Indexes have been created on the replica database. This is now done