
Installation of the new provenance server
Started, Work in Progress, Normal, Public

Description

  • [in progress] Provision the server in the inventory (rack position, iDRAC IPs, IPs, ...)
  • [Done] Notify the DSI of the delivery and give them the information needed for the installation
  • define the role in puppet
  • Install the server

List of packages amended along the way:

  • python3-virtualenvwrapper
  • libpq-dev
  • arcanist

Some more configuration:

  • install scripts ./create-db.sh and ./drop-db.sh to ease db maintenance (dropping/creating dbs) for @aeviso
ardumont@met:~% cat drop-db.sh
#!/usr/bin/env bash

DBNAME=$1

sudo -i -u postgres dropdb -p 5433 "$DBNAME"
ardumont@met:~% cat create-db.sh
#!/usr/bin/env bash

DBNAME=$1

sudo -i -u postgres createdb -p 5433 --lc-ctype=C.UTF-8 -T template0 -O swh-provenance "$DBNAME"
  • Provision 10 dbs through puppet (so pgbouncer is configured as well) (D6378)
  • at some point, dedicate some vms to andres so he can experiment from those, passing through the internal network (without the vpn)
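The "provision 10 dbs" step could also be scripted around the create-db.sh helper above; a minimal sketch (the database naming scheme is illustrative, not from this task — the real databases were provisioned through puppet in D6378):

```shell
# Hypothetical loop around ./create-db.sh to provision N databases.
# DRY_RUN=1 only prints the commands instead of running them.
provision_dbs() {
    local count=$1
    local i
    for i in $(seq 1 "$count"); do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "./create-db.sh provenance$i"
        else
            ./create-db.sh "provenance$i"
        fi
    done
}
```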

Event Timeline

vsellier changed the task status from Open to Work in Progress.Aug 18 2021, 9:45 AM
vsellier triaged this task as Normal priority.
vsellier created this task.
vsellier updated the task description. (Show Details)

@jayeshv @aeviso @douardda @olasd do you have an idea of what should be installed on this server, and who will operate what runs on it?

It's not completely clear to me whether this server will be a sandbox/staging or a production server.

@vsellier I am not sure about this.
The idea is to use this machine as the production server. (I guess this will host either postgres or mongodb after we decide on a preferred backend. But that is going to take some time)
@olasd or @aeviso will know better.

Yes, the idea is to have a beefy enough machine to perform full-size experiments on, which can then become (part of) the production infrastructure dedicated to the provenance index.

As discussed with @aeviso, we will install the following components on the server (the OS will be Debian 11):

  • rabbitmq
  • postgresql:13
    • a default swh-storage database will be managed by puppet
    • 1000 parallel connections allowed
    • shared_buffers: 50GB
  • docker
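The postgres settings above map onto configuration parameters roughly like this (a sketch; the actual file is managed by puppet):

```ini
# Hypothetical postgresql.conf fragment matching the plan above
max_connections = 1000
shared_buffers = 50GB
```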

*- WIP -*
Additional standard packages:

  • zfs; the datasets will be configured by the sysadmins
  • default statsd/prometheus exporters plugged into the main prometheus
  • postgresql:13
    • 1000 parallel connections allowed

wouldn't it be better to use pgbouncer (or similar)?

Yes, pgbouncer will be used; it's configured by default to 2000 parallel connections.
I don't know what kind of load the provenance client will generate, but the default 100 connections allowed by postgres will probably be too low and will need to be increased too.
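For reference, the two numbers discussed here correspond roughly to the following settings (a sketch, not the actual puppet-managed configuration):

```ini
; pgbouncer.ini -- client side: up to 2000 parallel client connections
[pgbouncer]
max_client_conn = 2000

; postgresql.conf -- server side: raise the default limit of 100
; max_connections = 1000
```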

The server is installed. A few tasks remain to be performed manually:

  • configure the zfs datasets (I will configure 2 mirrored pools for ~12 TB available; tell me if that's not what's expected)
  • build a few missing packages for bullseye (related to monitoring: prometheus-rabbitmq-exporter, prometheus-statsd-exporter, journalbeat)
  • configure a rabbitmq admin user
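The rabbitmq admin user step could look like the following sketch (the user name and password handling are assumptions, not from this task; with RUN=echo, the default here, the commands are only printed):

```shell
# Hypothetical sketch for creating a rabbitmq admin user.
# RUN=echo (the default) makes this a dry run; set RUN= to execute.
create_rabbitmq_admin() {
    local RUN=${RUN:-echo}
    local user=$1 password=$2
    $RUN rabbitmqctl add_user "$user" "$password"
    $RUN rabbitmqctl set_user_tags "$user" administrator
    $RUN rabbitmqctl set_permissions -p / "$user" ".*" ".*" ".*"
}
```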

@aeviso you should be able to connect to the server.

The hostname is met.internal.softwareheritage.org

I forgot to mention there is a gift from Dell in the server: an additional 600 GB 10k rpm disk

The zfs pool and dataset are configured:

  • pool configuration
## nvme drives pool
#zpool create data mirror nvme-eui.36315030525005540025384500000003 nvme-eui.36315030525005800025384500000003 mirror nvme-eui.36315030525005620025384500000003 nvme-eui.36315030525005890025384500000003

## bonus pool
# zpool create data-hdd wwn-0x5000c500dea6c533
  • postgresql dataset

move the current postgresql content away, then copy it back into the new directory afterwards

# zfs create -o mountpoint=/srv/softwareheritage/postgres/13/main -o atime=off -o relatime=on data/postgresql
  • status
root@met:~# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
data      11.6T  52.0M  11.6T        -         -     0%     0%  1.00x    ONLINE  -
data-hdd   556G   114K   556G        -         -     0%     0%  1.00x    ONLINE  -
root@met:~# zpool list -v
NAME                                            SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
data                                           11.6T  52.4M  11.6T        -         -     0%     0%  1.00x    ONLINE  -
  mirror                                       5.81T  26.5M  5.81T        -         -     0%  0.00%      -  ONLINE  
    nvme-eui.36315030525005540025384500000003      -      -      -        -         -      -      -      -  ONLINE  
    nvme-eui.36315030525005800025384500000003      -      -      -        -         -      -      -      -  ONLINE  
  mirror                                       5.81T  25.9M  5.81T        -         -     0%  0.00%      -  ONLINE  
    nvme-eui.36315030525005620025384500000003      -      -      -        -         -      -      -      -  ONLINE  
    nvme-eui.36315030525005890025384500000003      -      -      -        -         -      -      -      -  ONLINE  
data-hdd                                        556G   114K   556G        -         -     0%     0%  1.00x    ONLINE  -
  wwn-0x5000c500dea6c533                        556G   114K   556G        -         -     0%  0.00%      -  ONLINE
root@met:~# zfs list
NAME              USED  AVAIL     REFER  MOUNTPOINT
data             51.8M  11.3T       24K  /data
data-hdd          114K   539G       24K  /data-hdd
data/postgresql  51.6M  11.3T     51.6M  /srv/softwareheritage/postgres/13/main
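The "move the postgresql content away" step above could be sketched as follows (the service name and rsync-based copy are assumptions; with RUN=echo, the default here, the commands are only printed):

```shell
# Hedged sketch: swap the postgres data directory onto the new zfs
# dataset. RUN=echo (the default) makes this a dry run.
migrate_pgdata() {
    local RUN=${RUN:-echo}
    local pgdata=/srv/softwareheritage/postgres/13/main
    $RUN systemctl stop postgresql@13-main
    $RUN mv "$pgdata" "$pgdata.orig"
    $RUN zfs create -o mountpoint="$pgdata" -o atime=off -o relatime=on data/postgresql
    $RUN rsync -a "$pgdata.orig/" "$pgdata/"
    $RUN chown -R postgres:postgres "$pgdata"
    $RUN systemctl start postgresql@13-main
}
```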

rSPSITE6a233452cd48 fixed the prometheus node exporter.

I've cheated to pull journalbeat and prometheus-statsd-exporter into the bullseye repo: they're both statically linked go packages, so I just had reprepro copy the buster binaries:

swhdebianrepo@pergamon:~$ reprepro -vb /srv/softwareheritage/repository copy bullseye-swh buster-swh journalbeat prometheus-statsd-exporter
Adding 'prometheus-statsd-exporter' '0.8.1-1~swh1~bpo10+1' to 'bullseye-swh|main|amd64'.
Adding 'journalbeat' '5.5.0+git20170727.1-1~swh+1~bpo10+1' to 'bullseye-swh|main|amd64'.
Adding 'prometheus-statsd-exporter' '0.8.1-1~swh1~bpo10+1' to 'bullseye-swh|main|source'.
Adding 'journalbeat' '5.5.0+git20170727.1-1~swh+1~bpo10+1' to 'bullseye-swh|main|source'.
Exporting indices...

Once we upgrade journalbeat to the upstream version, this can go away. Same for the statsd exporter. But it's GoodEnough™ for now.

*old comment not submitted*

17:19:13     +olasd ╡ the postgresql tuning hasn't happened yet, afaict? effective_cache_size isn't set, and shared_buffers is tiny
17:19:46          ⤷ ╡ I'd bump shared_buffers to 128 GB and effective_cache_size to 256 GB, see where that gets you
17:20:19          ⤷ ╡ and probably maintenance_work_mem to something like 16 or 32 GB
17:20:54          ⤷ ╡ as well as random_page_cost to something lower like 1.5

The log is flooded with

2021-10-14 15:24:54.422 UTC [3951720] LOG:  checkpoints are occurring too frequently (28 seconds apart)
2021-10-14 15:24:54.422 UTC [3951720] HINT:  Consider increasing the configuration parameter "max_wal_size".

max_wal_size should be bumped to something more sensible like 32GB (a pg reload is enough; no restart needed)
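The tuning changes could be applied via ALTER SYSTEM so they land in postgresql.auto.conf; a sketch (with RUN=echo, the default here, the psql invocations are only printed):

```shell
# Hedged sketch of the ALTER SYSTEM tuning discussed above.
# RUN=echo (the default) makes this a dry run.
RUN=${RUN:-echo}
tune_pg() {
    # writes the setting to postgresql.auto.conf; most of these take
    # effect on reload, shared_buffers only after a full restart
    $RUN sudo -i -u postgres psql -c "ALTER SYSTEM SET $1 = '$2'"
}
tune_pg max_wal_size 32GB
tune_pg effective_cache_size 256GB
tune_pg maintenance_work_mem 32GB
tune_pg random_page_cost 1.5
$RUN sudo -i -u postgres psql -c "SELECT pg_reload_config()"
```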

I've run ALTER SYSTEM commands to set these configuration variables in $DATADIR/postgresql.auto.conf, then ran pg_reload_config():

2021-10-14 15:31:53.579 UTC [3951717] LOG:  received SIGHUP, reloading configuration files
2021-10-14 15:31:53.580 UTC [3951717] LOG:  parameter "max_wal_size" changed to "32GB"
2021-10-14 15:31:53.580 UTC [3951717] LOG:  parameter "effective_cache_size" changed to "256GB"
2021-10-14 15:31:53.580 UTC [3951717] LOG:  parameter "maintenance_work_mem" changed to "32GB"
2021-10-14 15:31:53.580 UTC [3951717] LOG:  parameter "shared_buffers" cannot be changed without restarting the server
2021-10-14 15:31:53.580 UTC [3951717] LOG:  parameter "random_page_cost" changed to "1.5"
2021-10-14 15:31:53.580 UTC [3951717] LOG:  configuration file "/srv/softwareheritage/postgres/13/main/postgresql.auto.conf" contains errors; unaffected changes were applied