Feb 18 2022
Feb 17 2022
looks like the server is short on heap:
[2022-02-17T15:26:30,847][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5965188] overhead, spent [408ms] collecting in the last [1s]
[2022-02-17T15:27:08,154][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5965225] overhead, spent [296ms] collecting in the last [1s]
[2022-02-17T15:29:31,383][WARN ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][young][5965368][3283] duration [1s], collections [1]/[1.1s], total [1s]/[5.8m], memory [8.2gb]->[5.4gb]/[16gb], all_pools {[young] [2.8gb]->[0b]/[0b]}{[old] [4.7gb]->[5.3gb]/[16gb]}{[survivor] [652mb]->[184mb]/[0b]}
[2022-02-17T15:29:31,384][WARN ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5965368] overhead, spent [1s] collecting in the last [1.1s]
[2022-02-17T15:31:49,449][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5965506] overhead, spent [260ms] collecting in the last [1s]
[2022-02-17T15:33:46,505][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5965623] overhead, spent [256ms] collecting in the last [1s]
[2022-02-17T15:37:11,728][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5965828] overhead, spent [372ms] collecting in the last [1s]
[2022-02-17T15:47:19,087][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5966435] overhead, spent [289ms] collecting in the last [1s]
[2022-02-17T15:49:56,439][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5966592] overhead, spent [315ms] collecting in the last [1.1s]
[2022-02-17T15:55:40,579][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5966936] overhead, spent [274ms] collecting in the last [1s]
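If needed, the live heap pressure can be confirmed with the _cat API (a quick check, assuming the node answers HTTP on the usual port 9200):

curl -s 'http://search-esnode0:9200/_cat/nodes?v&h=name,heap.percent,heap.current,heap.max'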
Objects replication:
- land D7180
- run puppet on db1 and storage1
- the sync automatically starts:
Feb 17 15:41:22 db1 systemd[1]: Starting ZFS dataset synchronization of...
Feb 17 15:41:23 db1 syncoid[283583]: INFO: Sending oldest full snapshot data/objects@syncoid_db1_2022-02-17:15:41:23 (~ 11811.3 GB) to new target filesystem:
It will take some time to complete.
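A rough way to follow the initial transfer is to compare the space used on each side; the target dataset name on storage1 below is an assumption, adjust to the real one:

# on db1, the source
zfs get -Hp used data/objects
# on storage1, the receiving side (dataset name assumed)
zfs get -Hp used data/sync/db1/objects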
kafka data replication:
- prepare the dataset (ensuring there are no mounts this time; see the check sketched after this list)
root@db1:~# zfs create -o canmount=noauto -o mountpoint=none data/sync
root@db1:~# zfs create -o canmount=noauto -o mountpoint=none data/sync/storage1
root@db1:~# zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
data                         736G  25.7T    96K  /data
data/postgres-indexer-12      96K  25.7T    96K  /srv/softwareheritage/postgres/12/indexer
data/postgres-main-12        733G  25.7T   729G  /srv/softwareheritage/postgres/12/main
data/postgres-misc           112K  25.7T   112K  /srv/softwareheritage/postgres
data/postgres-secondary-12    96K  25.7T    96K  /srv/softwareheritage/postgres/12/secondary
data/sync                    192K  25.7T    96K  none
data/sync/storage1            96K  25.7T    96K  none
- land D7179
- run puppet on db1 and storage1
- initial synchronization started:
Feb 17 13:05:09 db1 syncoid[999999]: INFO: Sending oldest full snapshot data/kafka@syncoid_db1_2022-02-17:13:05:09 (~ 1686.6 GB) to new target filesystem:
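To double check that the new intermediate datasets can never be mounted by accident (the point of canmount=noauto / mountpoint=none above), something like:

root@db1:~# zfs get -o name,property,value canmount,mountpoint data/sync data/sync/storage1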
Yes, my bad, it's due to T3911.
Feb 15 2022
- The initial synchronization took 2h20
- After a stabilization period, the synchronization runs every 5 minutes and takes ~1 minute (the logged sizes are uncompressed and must be divided by ~2.5 to get the real on-disk size)
D7173 landed. It initially focuses on the db1 -> storage1 replication to avoid having several initial replications running at the same time. The storage1 -> db1 replication will be configured once the initial db1 replication is done.
The replication will be done as follows (initiated by storage1):
the db1 dataset data/postgres-main-12 is replicated to storage1 under /data/sync/db1/postgresql-main-12
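Since the job is initiated by storage1, it boils down to a periodic syncoid pull of the db1 dataset over ssh; a minimal sketch only, the actual unit and options are managed by puppet (D7173):

# run on storage1
syncoid --no-sync-snap root@db1:data/postgres-main-12 data/sync/db1/postgresql-main-12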
fix the doc of the key name computation
Feb 14 2022
Feb 11 2022
Feb 10 2022
Feb 9 2022
Feb 8 2022
the first local snapshots worked:
root@dali:~# zfs list -t all
NAME                                                      USED  AVAIL  REFER  MOUNTPOINT
data                                                     66.7G   126G    24K  /data
data/postgresql                                          66.6G   126G  66.6G  /srv/postgresql/14/main
data/postgresql@autosnap_2022-02-08_19:04:44_monthly     1.47M      -  66.6G  -
data/postgresql@autosnap_2022-02-08_19:04:44_daily        194K      -  66.6G  -
data/postgresql/wal                                      31.8M   126G  14.9M  /srv/postgresql/14/main/pg_wal
data/postgresql/wal@autosnap_2022-02-08_19:04:44_monthly 16.3M      -  31.3M  -
data/postgresql/wal@autosnap_2022-02-08_19:04:44_daily     13K      -  15.0M  -
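The snapshots and their retention can also be reviewed at any time, scoped to the postgresql datasets:

root@dali:~# zfs list -t snapshot -o name,used,creation -s creation -r data/postgresql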
rebase
The dali database directory tree was prepared to have a dedicated mount dataset for the wals:
root@dali:~# date
Tue Feb 8 18:48:57 UTC 2022
root@dali:~# systemctl stop postgresql@14-main
● postgresql@14-main.service - PostgreSQL Cluster 14-main
     Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled-runtime; vendor preset: enabled)
     Active: inactive (dead) since Tue 2022-02-08 18:48:58 UTC; 5ms ago
    Process: 2705743 ExecStop=/usr/bin/pg_ctlcluster --skip-systemctl-redirect -m fast 14-main stop (code=exited, status=0/SUCCESS)
   Main PID: 31293 (code=exited, status=0/SUCCESS)
        CPU: 1d 6h 12min 2.894s
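For reference, the preparation roughly amounts to the steps below. This is a hypothetical reconstruction using the paths visible above, not the exact commands that were run; pg_wal must stay postgres-owned with mode 700:

systemctl stop postgresql@14-main
mv /srv/postgresql/14/main/pg_wal /srv/postgresql/14/main/pg_wal.old
zfs create -o mountpoint=/srv/postgresql/14/main/pg_wal data/postgresql/wal
rsync -a /srv/postgresql/14/main/pg_wal.old/ /srv/postgresql/14/main/pg_wal/
chown postgres:postgres /srv/postgresql/14/main/pg_wal
chmod 700 /srv/postgresql/14/main/pg_wal
systemctl start postgresql@14-main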
use a template instead of the stdlib::to_toml function, which is not compatible with puppet 5
thanks, I will fix that.
update commit message
- add the postgresql backup management script
- ensure the snapshot of the wal is done after the postgresql snapshot
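The ordering matters because a WAL snapshot taken after the data snapshot guarantees that recovery from the data snapshot has at least all the WAL it needs; conceptually (a hypothetical manual equivalent, not the deployed script):

TS=$(date -u +%Y-%m-%dT%H:%M:%S)
zfs snapshot data/postgresql@manual_${TS}       # database files first
zfs snapshot data/postgresql/wal@manual_${TS}   # WAL second, so it is at least as recent as the data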
Update to only keep the local snapshot section.
The sync deployment will be implemented in another diff.
Feb 7 2022
the exporter is deployed.
The varnish stats are available on this dashboard: https://grafana.softwareheritage.org/d/pE2xMZank/varnish
Feb 4 2022
Feb 3 2022
use -- for all the options of the exporter configuration
minor update on documentation
- D7068 deployed and applied on the workers:
root@pergamon:/etc/clustershell# clush -b -w @workers -w worker17 -w worker18 "set -e; puppet agent --test"
clush: 0/31
clush: in progress(31): worker[01-18],worker[01-13].euwest.azure
--------------- worker01.euwest.azure ---------------
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for worker01.euwest.azure.internal.softwareheritage.org
Info: Applying configuration version '1643885189'
Notice: Applied catalog in 11.65 seconds
...
--------------- worker18 ---------------
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for worker18.softwareheritage.org
Info: Applying configuration version '1643885204'
Notice: /Stage[main]/Profile::Mountpoints/Mount[/srv/storage/space]/options: options changed 'rw,soft,intr,rsize=8192,wsize=8192,noauto,x-systemd.automount,x-systemd.device-timeout=10' to 'ro,soft,intr,rsize=8192,wsize=8192,noauto,x-systemd.automount,x-systemd.device-timeout=10'
Info: Computing checksum on file /etc/fstab
Info: /Stage[main]/Profile::Mountpoints/Mount[/srv/storage/space]: Scheduling refresh of Mount[/srv/storage/space]
Info: Mount[/srv/storage/space](provider=parsed): Remounting
Notice: /Stage[main]/Profile::Mountpoints/Mount[/srv/storage/space]: Triggered 'refresh' from 1 event
Info: /Stage[main]/Profile::Mountpoints/Mount[/srv/storage/space]: Scheduling refresh of Mount[/srv/storage/space]
Notice: Applied catalog in 19.67 seconds
clush: worker[01-18] (18): exited with exit code 2
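The read-only switch can be double-checked across the fleet, e.g. by looking at the fstab entry puppet manages:

root@pergamon:/etc/clustershell# clush -b -w @workers -w worker17 -w worker18 "grep /srv/storage/space /etc/fstab"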
completely remove the mountpoint to be removed, as the mount class does not do the cleanup when it is declared as absent.
Feb 2 2022
Feb 1 2022
Jan 31 2022
There are also:
- an LB for the postgresql replicas: https://portal.azure.com/#blade/Microsoft_Azure_Network/LoadBalancingHubMenuBlade/loadBalancers (swh-postgres-public)
- 2 Cosmos DBs for provenance (almost empty)
Jan 28 2022
a few minor remarks