feat(cdc): Prepare the self hosted environment for the Change Data Capture pipeline (#938)
We will use Change Data Capture to stream WAL updates from postgres into clickhouse so that features like issue search will be able to join event data and metadata (from postgres) through Snuba.
This requires the followings:
A logical replicaiton plugin to be installed in postgres (https://github.com/getsentry/wal2json)
A service to run that streams from the replication log to Kafka (https://github.com/getsentry/cdc)
Datasets in Snuba.
This PR is preparing postgres to stream updates via the replication log.
The idea is to
download the the replication log plugin binary during install.sh
mount a volume with the binary when starting postgres
providing a new entrypoint to postgres that ensures everything is correctly configured.
There is a difference between how this is set up and how we do the same in the development environment.
In the development environment we download the library from the entrypoint itself and store it in a persistent volume, so we do not have to download it every time.
Unfortunately this does not work here as the postgres image is postgres:9.6 while it is postgres:9.6-alpine. This one does not come with either wget or curl. I don't think installing that in the entrypoint would be a good idea, so the download happens in install.sh. I actually think this way is safer so we never depend on connectivity for postgres to start properly.