Feb 18 2022

vsellier closed D7201: azure: upgrade definitions for last terraform and azurerm versions.
Feb 18 2022, 3:40 PM
vsellier committed rSPREb46aa75dd621: azure: upgrade definitions for last terraform and azurerm versions (authored by vsellier).
azure: upgrade definitions for last terraform and azurerm versions
Feb 18 2022, 3:40 PM
vsellier added a revision to T3903: Clean up unused azure vms or services: D7201: azure: upgrade definitions for last terraform and azurerm versions.
Feb 18 2022, 3:36 PM · System administration
vsellier requested review of D7201: azure: upgrade definitions for last terraform and azurerm versions.
Feb 18 2022, 3:36 PM

Feb 17 2022

vsellier added a comment to T3784: swh-search / staging: transient timeouts on elasticsearch queries.

It looks like the server is short on heap:

[2022-02-17T15:26:30,847][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5965188] overhead, spent [408ms] collecting in the last [1s]
[2022-02-17T15:27:08,154][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5965225] overhead, spent [296ms] collecting in the last [1s]
[2022-02-17T15:29:31,383][WARN ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][young][5965368][3283] duration [1s], collections [1]/[1.1s], total [1s]/[5.8m], memory [8.2gb]->[5.4gb]/[16gb], all_pools {[young] [2.8gb]->[0b]/[0b]}{[old] [4.7gb]->[5.3gb]/[16gb]}{[survivor] [652mb]->[184mb]/[0b]}
[2022-02-17T15:29:31,384][WARN ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5965368] overhead, spent [1s] collecting in the last [1.1s]
[2022-02-17T15:31:49,449][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5965506] overhead, spent [260ms] collecting in the last [1s]
[2022-02-17T15:33:46,505][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5965623] overhead, spent [256ms] collecting in the last [1s]
[2022-02-17T15:37:11,728][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5965828] overhead, spent [372ms] collecting in the last [1s]
[2022-02-17T15:47:19,087][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5966435] overhead, spent [289ms] collecting in the last [1s]
[2022-02-17T15:49:56,439][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5966592] overhead, spent [315ms] collecting in the last [1.1s]
[2022-02-17T15:55:40,579][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5966936] overhead, spent [274ms] collecting in the last [1s]
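
To put numbers on the "short on heap" reading, the overhead lines above can be reduced to a ratio of time spent collecting per monitoring window. This is only a minimal sketch: the regular expression is written against the log format shown in this comment, and the helper names (`to_ms`, `gc_overhead_ratio`) are hypothetical, not part of any Elasticsearch tooling.

```python
import re

# Matches the "overhead" lines logged by JvmGcMonitorService, as pasted above.
OVERHEAD_RE = re.compile(
    r"overhead, spent \[(?P<spent>[\d.]+)(?P<spent_unit>ms|s)\] "
    r"collecting in the last \[(?P<window>[\d.]+)(?P<window_unit>ms|s)\]"
)

# Two lines copied from the log excerpt above.
SAMPLE_LINES = [
    "[2022-02-17T15:26:30,847][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] "
    "[gc][5965188] overhead, spent [408ms] collecting in the last [1s]",
    "[2022-02-17T15:29:31,384][WARN ][o.e.m.j.JvmGcMonitorService] [search-esnode0] "
    "[gc][5965368] overhead, spent [1s] collecting in the last [1.1s]",
]


def to_ms(value, unit):
    """Convert a duration such as ('408', 'ms') or ('1.1', 's') to milliseconds."""
    return float(value) * (1000.0 if unit == "s" else 1.0)


def gc_overhead_ratio(line):
    """Return the fraction of the window spent in GC, or None if the line does not match."""
    match = OVERHEAD_RE.search(line)
    if match is None:
        return None
    spent = to_ms(match["spent"], match["spent_unit"])
    window = to_ms(match["window"], match["window_unit"])
    return spent / window


if __name__ == "__main__":
    for line in SAMPLE_LINES:
        ratio = gc_overhead_ratio(line)
        if ratio is not None:
            print(f"{line[1:24]} gc overhead: {ratio:.0%}")
```

A ratio that stays high across many consecutive windows would support the heap pressure hypothesis; isolated peaks are less conclusive.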
Feb 17 2022, 5:17 PM · Archive search, System administration
vsellier closed D7180: zfs sync: Add the staging objects dataset replication to db1.

closed by rSPSITE16b929369b1967718da97b71f5af5949721b9578

Feb 17 2022, 5:01 PM
vsellier closed D7179: zfs sync: configure the staging kafka replication to db1.

closed by rSPSITEe3d6d0dfc00d339529d68227954229c7e7b6b1aa

Feb 17 2022, 5:01 PM
vsellier added a comment to T3911: Cross replicate the staging storage between db1 and storage1.

Objects replication:

  • land D7180
  • run puppet on db1 and storage1
  • the sync automatically starts:
Feb 17 15:41:22 db1 systemd[1]: Starting ZFS dataset synchronization of...
Feb 17 15:41:23 db1 syncoid[283583]: INFO: Sending oldest full snapshot data/objects@syncoid_db1_2022-02-17:15:41:23 (~ 11811.3 GB) to new target filesystem:

It will take some time to complete.

Feb 17 2022, 4:59 PM · System administration
vsellier committed rSPSITE16b929369b19: zfs sync: Add the staging objects dataset replication to db1 (authored by vsellier).
zfs sync: Add the staging objects dataset replication to db1
Feb 17 2022, 4:34 PM
vsellier committed rSPPRIVC4ab1b5374a4d: add the syncoid::ssh_key::db1 key (authored by vsellier).
add the syncoid::ssh_key::db1 key
Feb 17 2022, 2:01 PM
vsellier committed rSPSITEe3d6d0dfc00d: zfs sync: configure the staging kafka replication to db1 (authored by vsellier).
zfs sync: configure the staging kafka replication to db1
Feb 17 2022, 1:56 PM
vsellier added a comment to T3911: Cross replicate the staging storage between db1 and storage1.

kafka data replication:

  • prepare the dataset (ensure there are no mounts this time)
root@db1:~# zfs create -o canmount=noauto -o mountpoint=none data/sync
root@db1:~# zfs create -o canmount=noauto -o mountpoint=none data/sync/storage1
root@db1:~# zfs list
NAME                         USED  AVAIL     REFER  MOUNTPOINT
data                         736G  25.7T       96K  /data
data/postgres-indexer-12      96K  25.7T       96K  /srv/softwareheritage/postgres/12/indexer
data/postgres-main-12        733G  25.7T      729G  /srv/softwareheritage/postgres/12/main
data/postgres-misc           112K  25.7T      112K  /srv/softwareheritage/postgres
data/postgres-secondary-12    96K  25.7T       96K  /srv/softwareheritage/postgres/12/secondary
data/sync                    192K  25.7T       96K  none
data/sync/storage1            96K  25.7T       96K  none
  • land D7179
  • run puppet on db1 and storage
  • initial synchronization started:
Feb 17 13:05:09 db1 syncoid[999999]: INFO: Sending oldest full snapshot data/kafka@syncoid_db1_2022-02-17:13:05:09 (~ 1686.6 GB) to new target filesystem:
Feb 17 2022, 1:53 PM · System administration
vsellier added a comment to T3942: borg issues on multiple nodes.

Yes, my bad, it's due to T3911.

Feb 17 2022, 1:15 PM · System administration

Feb 15 2022

vsellier added a revision to T3911: Cross replicate the staging storage between db1 and storage1: D7180: zfs sync: Add the staging objects dataset replication to db1.
Feb 15 2022, 4:28 PM · System administration
vsellier requested review of D7180: zfs sync: Add the staging objects dataset replication to db1.
Feb 15 2022, 4:28 PM
vsellier requested review of D7179: zfs sync: configure the staging kafka replication to db1.
Feb 15 2022, 4:18 PM
vsellier added a revision to T3911: Cross replicate the staging storage between db1 and storage1: D7179: zfs sync: configure the staging kafka replication to db1.
Feb 15 2022, 4:18 PM · System administration
vsellier added a comment to T3911: Cross replicate the staging storage between db1 and storage1.
  • The initial synchronization took 2h20
  • After a stabilization period, the synchronization runs every 5 min and takes ~1 min (the sizes are logged uncompressed and must be divided by ~2.5 to get the real size)
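
The size correction mentioned above is a simple division. A minimal sketch, assuming the ~2.5 ratio from this comment; the input size below is a hypothetical value, not a figure taken from the syncoid logs, and the ratio may differ for other datasets.

```python
# Assumption: ~2.5 is the approximate compression ratio quoted in the comment above.
COMPRESSION_RATIO = 2.5


def estimated_real_size_gb(logged_gb, ratio=COMPRESSION_RATIO):
    """Estimate the on-disk size from the uncompressed size logged by syncoid."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return logged_gb / ratio


if __name__ == "__main__":
    logged_gb = 10.0  # hypothetical size reported for one incremental sync
    print(f"logged: {logged_gb} GB, estimated real size: {estimated_real_size_gb(logged_gb)} GB")
```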
Feb 15 2022, 3:08 PM · System administration
vsellier added a comment to T3911: Cross replicate the staging storage between db1 and storage1.

D7173 landed. It initially focuses on the db1 -> storage1 replication to avoid running several initial replications at the same time. The storage1 -> db1 replication will be configured once the initial db1 replication is done.
The replication will be done in this way (initiated by storage1):
db1 dataset data/postgres-main-12 replicated on storage1 /data/sync/db1/postgresql-main-12

Feb 15 2022, 12:07 PM · System administration
vsellier committed rSPPRIVCd1b60989bc54: Add swh::deploy::loader_bzr::sentry_token key (authored by vsellier).
Add swh::deploy::loader_bzr::sentry_token key
Feb 15 2022, 11:38 AM
vsellier committed rSPPRIVC23c4e24ec016: Add syncoid::ssh_key::storage1 (authored by vsellier).
Add syncoid::ssh_key::storage1
Feb 15 2022, 11:38 AM
vsellier committed rSPSITE87171b61695c: sanoid: configure the db1 -> storage1 zfs replication (authored by vsellier).
sanoid: configure the db1 -> storage1 zfs replication
Feb 15 2022, 11:36 AM
vsellier closed D7173: sanoid: configure the db1 -> storage1 zfs replication.
Feb 15 2022, 11:35 AM
vsellier committed rSPSITEbe3c146e8812: sanoid: prepare the server to server zfs replication (authored by vsellier).
sanoid: prepare the server to server zfs replication
Feb 15 2022, 11:35 AM
vsellier updated the diff for D7173: sanoid: configure the db1 -> storage1 zfs replication.

fix the doc of the key name computation

Feb 15 2022, 11:30 AM
vsellier added inline comments to D7173: sanoid: configure the db1 -> storage1 zfs replication.
Feb 15 2022, 10:55 AM

Feb 14 2022

vsellier retitled D7173: sanoid: configure the db1 -> storage1 zfs replication from sanoid: configure the db1 -> storage1 zfs replication 2 commits: - first: sanoid: prepare the server to server zfs replication to sanoid: configure the db1 -> storage1 zfs replication.
Feb 14 2022, 7:57 PM
vsellier requested review of D7173: sanoid: configure the db1 -> storage1 zfs replication.
Feb 14 2022, 7:56 PM
vsellier added a revision to T3911: Cross replicate the staging storage between db1 and storage1: D7173: sanoid: configure the db1 -> storage1 zfs replication.
Feb 14 2022, 7:56 PM · System administration

Feb 11 2022

vsellier committed rSENVf87990fd3fc8: Update the debian version of the migrated vms (authored by vsellier).
Update the debian version of the migrated vms
Feb 11 2022, 6:54 PM
vsellier accepted D7158: keycloak: Remove realm direct grant flow override.
Feb 11 2022, 3:23 PM
vsellier created P1285 keycloak error.
Feb 11 2022, 2:51 PM

Feb 10 2022

vsellier changed the status of T3911: Cross replicate the staging storage between db1 and storage1 from Open to Work in Progress.
Feb 10 2022, 2:35 PM · System administration
vsellier accepted D7141: provenance: Give some permissions to provenance team.
Feb 10 2022, 2:23 PM
vsellier committed rSENV117a1686fe53: Add new servers facts (authored by vsellier).
Add new servers facts
Feb 10 2022, 11:29 AM
vsellier closed D7136: icinga: don't try to monitor directories under the postgresql datadir.
Feb 10 2022, 11:19 AM
vsellier committed rSPSITE27e269717fbd: icinga: don't try to monitor directories under the postgresql datadir (authored by vsellier).
icinga: don't try to monitor directories under the postgresql datadir
Feb 10 2022, 11:19 AM

Feb 9 2022

vsellier requested review of D7136: icinga: don't try to monitor directories under the postgresql datadir.
Feb 9 2022, 4:45 PM
vsellier added a revision to T3889: Admin database backup: D7136: icinga: don't try to monitor directories under the postgresql datadir.
Feb 9 2022, 4:45 PM · System administration

Feb 8 2022

vsellier added a comment to T3889: Admin database backup.

the first local snapshots worked:

root@dali:~# zfs list -t all
NAME                                                       USED  AVAIL     REFER  MOUNTPOINT
data                                                      66.7G   126G       24K  /data
data/postgresql                                           66.6G   126G     66.6G  /srv/postgresql/14/main
data/postgresql@autosnap_2022-02-08_19:04:44_monthly      1.47M      -     66.6G  -
data/postgresql@autosnap_2022-02-08_19:04:44_daily         194K      -     66.6G  -
data/postgresql/wal                                       31.8M   126G     14.9M  /srv/postgresql/14/main/pg_wal
data/postgresql/wal@autosnap_2022-02-08_19:04:44_monthly  16.3M      -     31.3M  -
data/postgresql/wal@autosnap_2022-02-08_19:04:44_daily      13K      -     15.0M  -
Feb 8 2022, 8:11 PM · System administration
vsellier closed D7118: backups: implements a zfs snapshot backup.
Feb 8 2022, 8:01 PM
vsellier committed rSPSITE9300ba9a5783: backups: implements a postgresql backup based on zfs snapshots (authored by vsellier).
backups: implements a postgresql backup based on zfs snapshots
Feb 8 2022, 8:01 PM
vsellier updated the diff for D7118: backups: implements a zfs snapshot backup.

rebase

Feb 8 2022, 8:01 PM
vsellier added a comment to T3889: Admin database backup.

The dali database directory tree was prepared to have a dedicated dataset mounted for the WALs:

root@dali:~# date
Tue Feb  8 18:48:57 UTC 2022
root@dali:~# systemctl stop postgresql@14-main
● postgresql@14-main.service - PostgreSQL Cluster 14-main
     Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled-runtime; vendor preset: enabled)
     Active: inactive (dead) since Tue 2022-02-08 18:48:58 UTC; 5ms ago
    Process: 2705743 ExecStop=/usr/bin/pg_ctlcluster --skip-systemctl-redirect -m fast 14-main stop (code=exited, status=0/SUCCESS)
   Main PID: 31293 (code=exited, status=0/SUCCESS)
        CPU: 1d 6h 12min 2.894s
Feb 8 2022, 7:55 PM · System administration
vsellier updated the test plan for D7118: backups: implements a zfs snapshot backup.
Feb 8 2022, 4:30 PM
vsellier updated the diff for D7118: backups: implements a zfs snapshot backup.

use a template instead of the stdlib::to_toml function, which is not compatible with puppet 5

Feb 8 2022, 4:29 PM
vsellier accepted D7123: Configure vault cookers to send their issue to sentry.
Feb 8 2022, 4:22 PM
vsellier added a comment to D7118: backups: implements a zfs snapshot backup.

thanks, I will fix that.

Feb 8 2022, 3:02 PM
vsellier committed rSENV331e3b73f650: vagrant: declare saam node (authored by vsellier).
vagrant: declare saam node
Feb 8 2022, 2:20 PM
vsellier accepted D7112: Deploy swh-worker@loader_bzr service to staging workers.
Feb 8 2022, 2:16 PM
vsellier updated the diff for D7118: backups: implements a zfs snapshot backup.

update commit message

Feb 8 2022, 2:12 PM
vsellier updated the summary of D7118: backups: implements a zfs snapshot backup.
Feb 8 2022, 2:11 PM
vsellier updated the summary of D7118: backups: implements a zfs snapshot backup.
Feb 8 2022, 2:11 PM
vsellier retitled D7118: backups: implements a zfs snapshot backup from WIP backups: implements a zfs snapshot backup to backups: implements a zfs snapshot backup.
Feb 8 2022, 2:11 PM
vsellier updated the diff for D7118: backups: implements a zfs snapshot backup.
  • add the postgresql backup management script
  • ensure the snapshot of the wal is done after the postgresql snapshot
Feb 8 2022, 2:04 PM
vsellier updated the diff for D7118: backups: implements a zfs snapshot backup.

Update to only keep the local snapshot section.
The sync deployment will be implemented in another diff.

Feb 8 2022, 11:20 AM
vsellier planned changes to D7118: backups: implements a zfs snapshot backup.
Feb 8 2022, 11:14 AM
vsellier requested review of D7118: backups: implements a zfs snapshot backup.
Feb 8 2022, 11:14 AM
vsellier added a revision to T3889: Admin database backup: D7118: backups: implements a zfs snapshot backup.
Feb 8 2022, 11:14 AM · System administration

Feb 7 2022

vsellier closed D7110: sysadm: add a postgresql backup management section.
Feb 7 2022, 5:17 PM
vsellier committed rDDOCee054f4c41db: sysadm: add a postgresql backup management section (authored by vsellier).
sysadm: add a postgresql backup management section
Feb 7 2022, 5:17 PM
vsellier renamed T3911: Cross replicate the staging storage between db1 and storage1 from Replicate the staging storage between db1 and storage1 to Cross replicate the staging storage between db1 and storage1.
Feb 7 2022, 10:29 AM · System administration
vsellier triaged T3911: Cross replicate the staging storage between db1 and storage1 as Normal priority.
Feb 7 2022, 10:29 AM · System administration
vsellier closed T2733: Explore / install a varnish prometheus probe as Resolved.

the exporter is deployed.
The varnish stats are available on this dashboard: https://grafana.softwareheritage.org/d/pE2xMZank/varnish

Feb 7 2022, 9:00 AM · Metrics/monitoring, System administration

Feb 4 2022

vsellier committed rSPSITEc7f2d377d52b: Avoid saam to declare global mountpoints not matching its configuration (authored by vsellier).
Avoid saam to declare global mountpoints not matching its configuration
Feb 4 2022, 9:47 AM
vsellier closed D7081: Avoid saam to declare global mountpoints not matching its configuration.
Feb 4 2022, 9:47 AM
vsellier requested review of D7081: Avoid saam to declare global mountpoints not matching its configuration.
Feb 4 2022, 9:41 AM
vsellier closed D7079: varnish: export the metrics to prometheus.
Feb 4 2022, 9:10 AM
vsellier committed rSPSITE29566b79693e: varnish: export the metrics to prometheus (authored by vsellier).
varnish: export the metrics to prometheus
Feb 4 2022, 9:09 AM

Feb 3 2022

vsellier requested review of D7079: varnish: export the metrics to prometheus.
Feb 3 2022, 7:04 PM
vsellier added a revision to T2733: Explore / install a varnish prometheus probe: D7079: varnish: export the metrics to prometheus.
Feb 3 2022, 7:04 PM · Metrics/monitoring, System administration
vsellier changed the status of T2733: Explore / install a varnish prometheus probe from Open to Work in Progress.
Feb 3 2022, 7:01 PM · Metrics/monitoring, System administration
vsellier closed D7075: nginx: add the configuration to retrieve the metrics in prometheus.
Feb 3 2022, 4:14 PM
vsellier committed rSPSITE109591ae1f3a: nginx: add the configuration to retrieve the metrics in prometheus (authored by vsellier).
nginx: add the configuration to retrieve the metrics in prometheus
Feb 3 2022, 4:14 PM
vsellier updated the diff for D7075: nginx: add the configuration to retrieve the metrics in prometheus.

use -- for all the options of the exporter configuration

Feb 3 2022, 4:10 PM
vsellier updated the diff for D7075: nginx: add the configuration to retrieve the metrics in prometheus.

minor update on documentation

Feb 3 2022, 4:05 PM
vsellier updated the summary of D7075: nginx: add the configuration to retrieve the metrics in prometheus.
Feb 3 2022, 4:02 PM
vsellier requested review of D7075: nginx: add the configuration to retrieve the metrics in prometheus.
Feb 3 2022, 3:55 PM
vsellier closed T3899: Clean nfs mountpoint on workers as Resolved.
Feb 3 2022, 12:02 PM · System administration
vsellier added a comment to T3899: Clean nfs mountpoint on workers.
  • D7068 deployed and applied on the workers:
root@pergamon:/etc/clustershell# clush -b -w @workers -w worker17 -w worker18 "set -e; puppet agent --test"
clush:  0/31
clush: in progress(31): worker[01-18],worker[01-13].euwest.azure
---------------
worker01.euwest.azure
---------------
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for worker01.euwest.azure.internal.softwareheritage.org
Info: Applying configuration version '1643885189'
Notice: Applied catalog in 11.65 seconds
...
------------
worker18
---------------
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for worker18.softwareheritage.org
Info: Applying configuration version '1643885204'
Notice: /Stage[main]/Profile::Mountpoints/Mount[/srv/storage/space]/options: options changed 'rw,soft,intr,rsize=8192,wsize=8192,noauto,x-systemd.automount,x-systemd.device-timeout=10' to 'ro,soft,intr,rsize=8192,wsize=8192,noauto,x-systemd.automount,x-systemd.device-timeout=10'
Info: Computing checksum on file /etc/fstab
Info: /Stage[main]/Profile::Mountpoints/Mount[/srv/storage/space]: Scheduling refresh of Mount[/srv/storage/space]
Info: Mount[/srv/storage/space](provider=parsed): Remounting
Notice: /Stage[main]/Profile::Mountpoints/Mount[/srv/storage/space]: Triggered 'refresh' from 1 event
Info: /Stage[main]/Profile::Mountpoints/Mount[/srv/storage/space]: Scheduling refresh of Mount[/srv/storage/space]
Notice: Applied catalog in 19.67 seconds
clush: worker[01-18] (18): exited with exit code 2
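
The final "exited with exit code 2" line is not a failure: `puppet agent --test` enables detailed exit codes, where 2 means the run succeeded and some resources changed (here, the remount of /srv/storage/space). A small sketch of how such codes could be interpreted in a wrapper script; the function names are hypothetical, and the code table follows the puppet agent documentation for `--detailed-exitcodes`.

```python
# Meaning of "puppet agent" exit codes when --detailed-exitcodes is active
# (which --test implies).
PUPPET_DETAILED_EXIT_CODES = {
    0: "run succeeded, no changes",
    1: "run failed",
    2: "run succeeded, some resources changed",
    4: "run succeeded, some resources failed",
    6: "run succeeded, with both changes and failures",
}


def describe_puppet_exit(code):
    """Return a human-readable description of a puppet agent exit code."""
    return PUPPET_DETAILED_EXIT_CODES.get(code, f"unknown exit code {code}")


def is_failure(code):
    """True when the exit code reports a failed run or failed resources."""
    return code not in (0, 2)


if __name__ == "__main__":
    code = 2  # the value reported by clush for worker[01-18] above
    print(f"exit code {code}: {describe_puppet_exit(code)} (failure: {is_failure(code)})")
```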
Feb 3 2022, 12:01 PM · System administration
vsellier closed D7068: mountpoints: remove useless default mountpoints.
Feb 3 2022, 11:43 AM
vsellier committed rSPSITE37739f1543ea: mountpoints: remove useless default mountpoints (authored by vsellier).
mountpoints: remove useless default mountpoints
Feb 3 2022, 11:43 AM
vsellier updated the diff for D7068: mountpoints: remove useless default mountpoints.

completely remove the declaration of the mountpoints to remove, as the mount class
does not do the cleanup when they are declared as absent.

Feb 3 2022, 11:42 AM
vsellier requested review of D7068: mountpoints: remove useless default mountpoints.
Feb 3 2022, 11:32 AM
vsellier added a revision to T3899: Clean nfs mountpoint on workers: D7068: mountpoints: remove useless default mountpoints.
Feb 3 2022, 11:32 AM · System administration

Feb 2 2022

vsellier created P1271 scn timeouts.
Feb 2 2022, 11:04 AM

Feb 1 2022

vsellier committed rSPSITEe30059ef882c: icinga: Fix hedgedoc and read-only objstorage http checks (authored by vsellier).
icinga: Fix hedgedoc and read-only objstorage http checks
Feb 1 2022, 5:55 PM
vsellier accepted D7058: pergamon: Drop no longer relevant rewrite_rule about sentry.s.o.
Feb 1 2022, 5:45 PM
vsellier committed rSPSITE76ddf4d7db67: icinga: fix the webapp and r/o objstorage check url (authored by vsellier).
icinga: fix the webapp and r/o objstorage check url
Feb 1 2022, 5:07 PM
vsellier accepted D7045: Migrate sentry node to admin vlan.
Feb 1 2022, 3:31 PM
vsellier requested changes to D7045: Migrate sentry node to admin vlan.
Feb 1 2022, 2:37 PM
vsellier accepted D7055: storage01.euwest.azure: Only keep the gunicorn-swh-storage service.
Feb 1 2022, 2:31 PM

Jan 31 2022

vsellier added a comment to T3903: Clean up unused azure vms or services.

There are also:

Jan 31 2022, 6:48 PM · System administration
vsellier triaged T3902: Test linux gardener pro/cons as Normal priority.
Jan 31 2022, 6:04 PM · System administration
vsellier accepted D7049: weekly-planning: Use curl instead of httpie.
Jan 31 2022, 12:37 PM
vsellier moved T3899: Clean nfs mountpoint on workers from Backlog to in-progress on the System administration board.
Jan 31 2022, 12:33 PM · System administration
vsellier added a project to T3899: Clean nfs mountpoint on workers: System administration.
Jan 31 2022, 12:33 PM · System administration
vsellier changed the status of T3899: Clean nfs mountpoint on workers from Open to Work in Progress.
Jan 31 2022, 12:32 PM · System administration

Jan 28 2022

vsellier added a comment to D7021: Add graph dataset reading classes (orc+edges).

a few minor remarks

Jan 28 2022, 6:34 PM
vsellier planned changes to D6926: First iteration of prometheus export of the e2e metrics.
Jan 28 2022, 5:51 PM