Feb 24 2022

vsellier updated the summary of D7246: sanoid: Add the configuration to manage the snapshot retention on backup01.
Feb 24 2022, 4:35 PM
vsellier updated the diff for D7246: sanoid: Add the configuration to manage the snapshot retention on backup01.

update commit message

Feb 24 2022, 3:31 PM
vsellier added a revision to T3889: Admin database backup: D7246: sanoid: Add the configuration to manage the snapshot retention on backup01.
Feb 24 2022, 3:22 PM · System administration
vsellier requested review of D7246: sanoid: Add the configuration to manage the snapshot retention on backup01.
Feb 24 2022, 3:22 PM
vsellier closed D7239: syncoid: Fix wrong timer frequency variable name.
Feb 24 2022, 11:28 AM
vsellier committed rSPSITE2880bf612fdd: syncoid: Fix wrong timer frequency variable name (authored by vsellier).
syncoid: Fix wrong timer frequency variable name
Feb 24 2022, 11:28 AM
vsellier requested review of D7239: syncoid: Fix wrong timer frequency variable name.
Feb 24 2022, 11:27 AM
vsellier added a revision to T3889: Admin database backup: D7239: syncoid: Fix wrong timer frequency variable name.
Feb 24 2022, 11:27 AM · System administration
vsellier committed rSPPRIVC3df2781cc4ae: add the syncoid::ssh_key::backup01-azure key (authored by vsellier).
add the syncoid::ssh_key::backup01-azure key
Feb 24 2022, 10:49 AM
vsellier closed D7235: backup: copy the dali snapshots to the azure's backup vm.
Feb 24 2022, 10:47 AM
vsellier committed rSPSITEd94c24ea30b7: backup: copy the dali snapshots to the azure's backup vm (authored by vsellier).
backup: copy the dali snapshots to the azure's backup vm
Feb 24 2022, 10:47 AM
vsellier updated the diff for D7235: backup: copy the dali snapshots to the azure's backup vm.

avoid unnecessary update if the no-sync-snap is not specified

Feb 24 2022, 10:46 AM
vsellier committed rSENV93d29df72a4c: Declare backu01.euwest.azure node (authored by vsellier).
Declare backu01.euwest.azure node
Feb 24 2022, 9:10 AM
vsellier requested review of D7235: backup: copy the dali snapshots to the azure's backup vm.
Feb 24 2022, 9:08 AM
vsellier added a revision to T3889: Admin database backup: D7235: backup: copy the dali snapshots to the azure's backup vm.
Feb 24 2022, 9:08 AM · System administration

Feb 23 2022

vsellier added a comment to T3889: Admin database backup.
  • backup01 vm created on azure
  • zfs installed (will be reported in puppet):
    • add contrib repository
    • install zfs
# apt install linux-headers-cloud-amd64 zfs-dkms
  • configure zfs pool
root@backup01:~# fdisk /dev/sdc -l
Disk /dev/sdc: 200 GiB, 214748364800 bytes, 419430400 sectors
Disk model: Virtual Disk    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: D0FB08C6-F046-F340-AC8B-D6C9372015D5
Feb 23 2022, 3:50 PM · System administration
vsellier updated the diff for D7220: backups: Add an azure backup vm.
  • Assign a static ip to not use an address in the middle of the workers
  • Ensure the data disk is not deleted in case of accidental removal of the vm
Feb 23 2022, 2:42 PM
vsellier updated the diff for D7220: backups: Add an azure backup vm.
  • Use a supported rsa key
  • fix the ssh-key provisioning
Feb 23 2022, 12:35 PM

Feb 22 2022

vsellier updated the diff for D7220: backups: Add an azure backup vm.

Update facts:

  • Remove the location entry
  • add the deployment variable
  • add the subnet variable
Feb 22 2022, 6:00 PM
vsellier requested review of D7220: backups: Add an azure backup vm.
Feb 22 2022, 2:53 PM
vsellier added a revision to T3889: Admin database backup: D7220: backups: Add an azure backup vm.
Feb 22 2022, 2:53 PM · System administration
vsellier added a comment to T3784: swh-search / staging: transient timeouts on elasticsearch queries.

After the elasticsearch restart, there are no more messages about GC overhead in the logs, but there were a couple of timeouts during the night.
Further investigation is needed.

Feb 22 2022, 11:20 AM · Archive search, System administration
vsellier closed T3968: Race condition when a zfs sync is started in the same second as Resolved.

A workaround is deployed to restart the sync if it is interrupted by the race condition.
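
The restart-on-race idea can be sketched as a small retry loop. This is only an illustration, not the deployed workaround: `run_sync` and `is_race_condition` are hypothetical callables, and the way the race is detected (here, by looking at the error output) is an assumption.

```python
from typing import Callable, Tuple


def run_with_restart(
    run_sync: Callable[[], Tuple[int, str]],
    is_race_condition: Callable[[str], bool],
    max_attempts: int = 3,
) -> int:
    """Run a sync, restarting it when it fails because of the race condition.

    run_sync is a hypothetical callable returning (exit_code, error_output).
    is_race_condition is a hypothetical predicate on the error output.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        returncode, stderr = run_sync()
        if returncode == 0:
            return attempt
        if not is_race_condition(stderr):
            # Any other failure is not retried, so real errors stay visible.
            raise RuntimeError(f"sync failed for another reason: {stderr}")
    raise RuntimeError(f"sync still failing after {max_attempts} attempts")
```

Bounding the number of attempts keeps a persistent failure from looping forever, and only the failures recognized as the race are retried.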

Feb 22 2022, 10:39 AM · System administration
vsellier closed D7216: syncoid: Try to restart the synchronization if a race condition occurred.
Feb 22 2022, 10:32 AM
vsellier committed rSPSITE36a13a6410dd: syncoid: Try to restart the synchronization if a race condition occurred (authored by vsellier).
syncoid: Try to restart the synchronization if a race condition occurred
Feb 22 2022, 10:32 AM

Feb 21 2022

vsellier requested review of D7216: syncoid: Try to restart the synchronization if a race condition occurred.
Feb 21 2022, 6:30 PM
vsellier added a revision to T3968: Race condition when a zfs sync is started in the same second: D7216: syncoid: Try to restart the synchronization if a race condition occurred.
Feb 21 2022, 6:30 PM · System administration
vsellier moved T3968: Race condition when a zfs sync is started in the same second from Backlog to in-progress on the System administration board.
Feb 21 2022, 5:28 PM · System administration
vsellier changed the status of T3968: Race condition when a zfs sync is started in the same second from Open to Work in Progress.
Feb 21 2022, 5:27 PM · System administration
vsellier moved T3784: swh-search / staging: transient timeouts on elasticsearch queries from in-progress to deployed/landed/monitoring on the System administration board.
Feb 21 2022, 4:42 PM · Archive search, System administration
vsellier added a comment to T3784: swh-search / staging: transient timeouts on elasticsearch queries.

Elasticsearch was restarted and the sentry issues closed.
Let's monitor whether the GC warnings come back.

Feb 21 2022, 4:42 PM · Archive search, System administration
vsellier added a comment to T3784: swh-search / staging: transient timeouts on elasticsearch queries.

First, clean up the unused resources, even though it will not free much:

  • aliases cleanup
vsellier@search-esnode0 ~ % export ES_SERVER=192.168.130.80:9200
vsellier@search-esnode0 ~ % curl -XGET http://$ES_SERVER/_cat/aliases
origin-read         origin-v0.11  - - - -
origin-write        origin-v0.11  - - - -
origin-v0.9.0-read  origin-v0.9.0 - - - -
origin-v0.9.0-write origin-v0.9.0 - - - -
vsellier@search-esnode0 ~ % curl -XDELETE http://$ES_SERVER/origin-v0.9.0/_alias/origin-v0.9.0-read
{"acknowledged":true}
vsellier@search-esnode0 ~ % curl -XDELETE -H "Content-Type: application/json" http://$ES_SERVER/origin-v0.9.0/_alias/origin-v0.9.0-write
{"acknowledged":true}
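
The alias cleanup above can be prepared from the `_cat/aliases` listing. The sketch below only parses that text and builds the URLs to send a DELETE to, in the same form as the curl calls in this comment. It sends no request, and the function names are hypothetical.

```python
from typing import List


def aliases_for_index(cat_output: str, index: str) -> List[str]:
    """Return the aliases that point at `index` in a _cat/aliases listing.

    Each line is expected to start with "<alias> <index>", as in the
    output shown in the comment above.
    """
    aliases = []
    for line in cat_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == index:
            aliases.append(fields[0])
    return aliases


def delete_alias_urls(server: str, cat_output: str, index: str) -> List[str]:
    """Build the URLs on which a DELETE removes each alias of `index`."""
    return [
        f"http://{server}/{index}/_alias/{alias}"
        for alias in aliases_for_index(cat_output, index)
    ]


CAT_OUTPUT = """\
origin-read         origin-v0.11  - - - -
origin-write        origin-v0.11  - - - -
origin-v0.9.0-read  origin-v0.9.0 - - - -
origin-v0.9.0-write origin-v0.9.0 - - - -
"""

for url in delete_alias_urls("192.168.130.80:9200", CAT_OUTPUT, "origin-v0.9.0"):
    print(url)
```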
Feb 21 2022, 3:20 PM · Archive search, System administration
vsellier changed the status of T3784: swh-search / staging: transient timeouts on elasticsearch queries from Open to Work in Progress.
Feb 21 2022, 11:59 AM · Archive search, System administration
vsellier moved T3911: Cross replicate the staging storage between db1 and storage1 from in-progress to done on the System administration board.
Feb 21 2022, 8:48 AM · System administration
vsellier closed T3911: Cross replicate the staging storage between db1 and storage1 as Resolved.

The replication of object storage is now running correctly:

-- Journal begins at Thu 2022-02-17 04:52:45 UTC, ends at Mon 2022-02-21 07:44:15 UTC. --
Feb 17 15:41:22 db1 systemd[1]: Starting ZFS dataset synchronization of...
Feb 17 15:41:23 db1 syncoid[283583]: INFO: Sending oldest full snapshot data/objects@syncoid_db1_2022-02-17:15:41:23 (~ 11811.3 GB) to new target filesystem:
Feb 19 13:41:09 db1 systemd[1]: syncoid-storage1-objects.service: Succeeded.
Feb 19 13:41:09 db1 systemd[1]: Finished ZFS dataset synchronization of.
Feb 19 13:41:09 db1 systemd[1]: syncoid-storage1-objects.service: Consumed 1d 10h 59min 6.865s CPU time.
Feb 19 13:41:09 db1 systemd[1]: Starting ZFS dataset synchronization of...
Feb 19 13:41:11 db1 syncoid[3716482]: Sending incremental data/objects@syncoid_db1_2022-02-17:15:41:23 ... syncoid_db1_2022-02-19:13:41:09 (~ 130.3 GB):
Feb 19 14:29:18 db1 systemd[1]: syncoid-storage1-objects.service: Succeeded.
Feb 19 14:29:18 db1 systemd[1]: Finished ZFS dataset synchronization of.
Feb 19 14:29:18 db1 systemd[1]: syncoid-storage1-objects.service: Consumed 25min 43.311s CPU time.
Feb 19 14:29:18 db1 systemd[1]: Starting ZFS dataset synchronization of...
Feb 19 14:29:25 db1 syncoid[1084137]: Sending incremental data/objects@syncoid_db1_2022-02-19:13:41:09 ... syncoid_db1_2022-02-19:14:29:18 (~ 5.3 GB):
Feb 19 14:31:12 db1 systemd[1]: syncoid-storage1-objects.service: Succeeded.
Feb 19 14:31:12 db1 systemd[1]: Finished ZFS dataset synchronization of.
Feb 19 14:31:12 db1 systemd[1]: syncoid-storage1-objects.service: Consumed 1min 7.439s CPU time.
Feb 19 14:35:03 db1 systemd[1]: Starting ZFS dataset synchronization of...
Feb 19 14:35:07 db1 syncoid[1174209]: Sending incremental data/objects@syncoid_db1_2022-02-19:14:29:18 ... syncoid_db1_2022-02-19:14:35:04 (~ 710.1 MB):
Feb 19 14:35:35 db1 systemd[1]: syncoid-storage1-objects.service: Succeeded.
Feb 19 14:35:35 db1 systemd[1]: Finished ZFS dataset synchronization of.
Feb 19 14:35:35 db1 systemd[1]: syncoid-storage1-objects.service: Consumed 10.015s CPU time.
Feb 19 14:40:48 db1 systemd[1]: Starting ZFS dataset synchronization of...
Feb 19 14:40:52 db1 syncoid[1223955]: Sending incremental data/objects@syncoid_db1_2022-02-19:14:35:04 ... syncoid_db1_2022-02-19:14:40:49 (~ 271.6 MB):
Feb 19 14:41:14 db1 systemd[1]: syncoid-storage1-objects.service: Succeeded.
Feb 19 14:41:14 db1 systemd[1]: Finished ZFS dataset synchronization of.
Feb 19 14:41:14 db1 systemd[1]: syncoid-storage1-objects.service: Consumed 5.701s CPU time.
Feb 19 14:46:32 db1 systemd[1]: Starting ZFS dataset synchronization of...
Feb 19 14:46:37 db1 syncoid[1267267]: Sending incremental data/objects@syncoid_db1_2022-02-19:14:40:49 ... syncoid_db1_2022-02-19:14:46:33 (~ 461.8 MB):
Feb 19 14:47:05 db1 systemd[1]: syncoid-storage1-objects.service: Succeeded.
Feb 19 14:47:05 db1 systemd[1]: Finished ZFS dataset synchronization of.
Feb 19 14:47:05 db1 systemd[1]: syncoid-storage1-objects.service: Consumed 8.945s CPU time.
Feb 19 14:52:18 db1 systemd[1]: Starting ZFS dataset synchronization of...
Feb 19 14:52:22 db1 syncoid[1312265]: Sending incremental data/objects@syncoid_db1_2022-02-19:14:46:33 ... syncoid_db1_2022-02-19:14:52:19 (~ 263.2 MB):
Feb 19 14:52:42 db1 systemd[1]: syncoid-storage1-objects.service: Succeeded.
Feb 19 14:52:42 db1 systemd[1]: Finished ZFS dataset synchronization of.
Feb 19 14:52:42 db1 systemd[1]: syncoid-storage1-objects.service: Consumed 6.021s CPU time.
Feb 19 14:58:04 db1 systemd[1]: Starting ZFS dataset synchronization of...
...
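
To follow how the incremental transfers shrink over time, the size announced on each syncoid "Sending" line of a journal excerpt like the one above can be extracted. This is a minimal sketch: the regular expression is derived only from the lines shown here, and treating the units as multiples of 1024 is an assumption.

```python
import re
from typing import Optional

# Matches the "(~ 130.3 GB)" part of a syncoid "Sending ..." log line.
SIZE_RE = re.compile(r"\(~ (?P<value>[0-9.]+) (?P<unit>[KMGT]B)\)")

# Assumption: the units are powers of 1024.
UNIT_TO_MB = {
    "KB": 1.0 / 1024.0,
    "MB": 1.0,
    "GB": 1024.0,
    "TB": 1024.0 * 1024.0,
}


def logged_size_mb(line: str) -> Optional[float]:
    """Return the size announced on a syncoid line in MB, or None if absent."""
    match = SIZE_RE.search(line)
    if match is None:
        return None
    return float(match.group("value")) * UNIT_TO_MB[match.group("unit")]


line = (
    "Feb 19 14:35:07 db1 syncoid[1174209]: Sending incremental "
    "data/objects@syncoid_db1_2022-02-19:14:29:18 ... "
    "syncoid_db1_2022-02-19:14:35:04 (~ 710.1 MB):"
)
print(logged_size_mb(line))
```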
Feb 21 2022, 8:48 AM · System administration

Feb 18 2022

vsellier accepted D7203: azure: Drop storage02 vm and associated resources.
Feb 18 2022, 4:05 PM
vsellier closed D7201: azure: upgrade definitions for last terraform and azurerm versions.
Feb 18 2022, 3:40 PM
vsellier committed rSPREb46aa75dd621: azure: upgrade definitions for last terraform and azurerm versions (authored by vsellier).
azure: upgrade definitions for last terraform and azurerm versions
Feb 18 2022, 3:40 PM
vsellier added a revision to T3903: Clean up unused azure vms or services: D7201: azure: upgrade definitions for last terraform and azurerm versions.
Feb 18 2022, 3:36 PM · System administration
vsellier requested review of D7201: azure: upgrade definitions for last terraform and azurerm versions.
Feb 18 2022, 3:36 PM

Feb 17 2022

vsellier added a comment to T3784: swh-search / staging: transient timeouts on elasticsearch queries.

It looks like the server is short on heap:

[2022-02-17T15:26:30,847][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5965188] overhead, spent [408ms] collecting in the last [1s]
[2022-02-17T15:27:08,154][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5965225] overhead, spent [296ms] collecting in the last [1s]
[2022-02-17T15:29:31,383][WARN ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][young][5965368][3283] duration [1s], collections [1]/[1.1s], total [1s]/[5.8m], memory [8.2gb]->[5.4gb]/[16gb], all_pools {[young] [2.8gb]->[0b]/[0b]}{[old] [4.7gb]->[5.3gb]/[16gb]}{[survivor] [652mb]->[184mb]/[0b]}
[2022-02-17T15:29:31,384][WARN ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5965368] overhead, spent [1s] collecting in the last [1.1s]
[2022-02-17T15:31:49,449][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5965506] overhead, spent [260ms] collecting in the last [1s]
[2022-02-17T15:33:46,505][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5965623] overhead, spent [256ms] collecting in the last [1s]
[2022-02-17T15:37:11,728][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5965828] overhead, spent [372ms] collecting in the last [1s]
[2022-02-17T15:47:19,087][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5966435] overhead, spent [289ms] collecting in the last [1s]
[2022-02-17T15:49:56,439][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5966592] overhead, spent [315ms] collecting in the last [1.1s]
[2022-02-17T15:55:40,579][INFO ][o.e.m.j.JvmGcMonitorService] [search-esnode0] [gc][5966936] overhead, spent [274ms] collecting in the last [1s]
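
The share of time spent in garbage collection can be read off each "overhead" line above. The sketch below is a hypothetical helper whose pattern is derived only from the log lines shown here; it returns the ratio between the time spent collecting and the observation window.

```python
import re
from typing import Optional

# Matches "overhead, spent [408ms] collecting in the last [1s]".
OVERHEAD_RE = re.compile(
    r"overhead, spent \[(?P<spent>[0-9.]+)(?P<spent_unit>ms|s)\] "
    r"collecting in the last \[(?P<window>[0-9.]+)(?P<window_unit>ms|s)\]"
)


def to_ms(value: str, unit: str) -> float:
    """Convert a duration such as ("1.1", "s") to milliseconds."""
    return float(value) * (1000.0 if unit == "s" else 1.0)


def gc_overhead_ratio(line: str) -> Optional[float]:
    """Return the fraction of the window spent in GC, or None for other lines."""
    match = OVERHEAD_RE.search(line)
    if match is None:
        return None
    spent = to_ms(match.group("spent"), match.group("spent_unit"))
    window = to_ms(match.group("window"), match.group("window_unit"))
    return spent / window


line = (
    "[2022-02-17T15:26:30,847][INFO ][o.e.m.j.JvmGcMonitorService] "
    "[search-esnode0] [gc][5965188] overhead, spent [408ms] collecting in the last [1s]"
)
print(gc_overhead_ratio(line))
```

For the WARN line in the excerpt (1s spent in a 1.1s window) the ratio is close to 1, meaning that nearly the whole window was spent collecting.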
Feb 17 2022, 5:17 PM · Archive search, System administration
vsellier closed D7180: zfs sync: Add the staging objects dataset replication to db1.

closed by rSPSITE16b929369b1967718da97b71f5af5949721b9578

Feb 17 2022, 5:01 PM
vsellier closed D7179: zfs sync: configure the staging kafka replication to db1.

closed by rSPSITEe3d6d0dfc00d339529d68227954229c7e7b6b1aa

Feb 17 2022, 5:01 PM
vsellier added a comment to T3911: Cross replicate the staging storage between db1 and storage1.

Objects replication:

  • land D7180
  • run puppet on db1 and storage1
  • the sync automatically starts:
Feb 17 15:41:22 db1 systemd[1]: Starting ZFS dataset synchronization of...
Feb 17 15:41:23 db1 syncoid[283583]: INFO: Sending oldest full snapshot data/objects@syncoid_db1_2022-02-17:15:41:23 (~ 11811.3 GB) to new target filesystem:

It will take some time to complete.

Feb 17 2022, 4:59 PM · System administration
vsellier committed rSPSITE16b929369b19: zfs sync: Add the staging objects dataset replication to db1 (authored by vsellier).
zfs sync: Add the staging objects dataset replication to db1
Feb 17 2022, 4:34 PM
vsellier committed rSPPRIVC4ab1b5374a4d: add the syncoid::ssh_key::db1 key (authored by vsellier).
add the syncoid::ssh_key::db1 key
Feb 17 2022, 2:01 PM
vsellier committed rSPSITEe3d6d0dfc00d: zfs sync: configure the staging kafka replication to db1 (authored by vsellier).
zfs sync: configure the staging kafka replication to db1
Feb 17 2022, 1:56 PM
vsellier added a comment to T3911: Cross replicate the staging storage between db1 and storage1.

kafka data replication:

  • prepare the dataset (ensure there are no mounts this time)
root@db1:~# zfs create -o canmount=noauto -o mountpoint=none data/sync
root@db1:~# zfs create -o canmount=noauto -o mountpoint=none data/sync/storage1
root@db1:~# zfs list
NAME                         USED  AVAIL     REFER  MOUNTPOINT
data                         736G  25.7T       96K  /data
data/postgres-indexer-12      96K  25.7T       96K  /srv/softwareheritage/postgres/12/indexer
data/postgres-main-12        733G  25.7T      729G  /srv/softwareheritage/postgres/12/main
data/postgres-misc           112K  25.7T      112K  /srv/softwareheritage/postgres
data/postgres-secondary-12    96K  25.7T       96K  /srv/softwareheritage/postgres/12/secondary
data/sync                    192K  25.7T       96K  none
data/sync/storage1            96K  25.7T       96K  none
  • land D7179
  • run puppet on db1 and storage1
  • initial synchronization started:
Feb 17 13:05:09 db1 syncoid[999999]: INFO: Sending oldest full snapshot data/kafka@syncoid_db1_2022-02-17:13:05:09 (~ 1686.6 GB) to new target filesystem:
Feb 17 2022, 1:53 PM · System administration
vsellier added a comment to T3942: borg issues on multiple nodes.

Yes, my bad, it's due to T3911.

Feb 17 2022, 1:15 PM · System administration

Feb 15 2022

vsellier added a revision to T3911: Cross replicate the staging storage between db1 and storage1: D7180: zfs sync: Add the staging objects dataset replication to db1.
Feb 15 2022, 4:28 PM · System administration
vsellier requested review of D7180: zfs sync: Add the staging objects dataset replication to db1.
Feb 15 2022, 4:28 PM
vsellier requested review of D7179: zfs sync: configure the staging kafka replication to db1.
Feb 15 2022, 4:18 PM
vsellier added a revision to T3911: Cross replicate the staging storage between db1 and storage1: D7179: zfs sync: configure the staging kafka replication to db1.
Feb 15 2022, 4:18 PM · System administration
vsellier added a comment to T3911: Cross replicate the staging storage between db1 and storage1.
  • The initial synchronization took 2h20
  • After a stabilization period, the synchronization is done every 5 min and takes ~1 min (the sizes are logged uncompressed and must be divided by ~2.5 to get the real size)
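
As a worked example of the size conversion mentioned above: the ~2.5 factor comes from this comment and is only approximate, and the input value below is a made-up figure, not one taken from the logs.

```python
# Assumption: ~2.5 is the approximate ratio quoted in the comment above;
# the actual ratio depends on the dataset.
COMPRESSION_RATIO = 2.5


def estimated_real_size(logged_size: float, ratio: float = COMPRESSION_RATIO) -> float:
    """Estimate the real size from the uncompressed size logged by the sync.

    The result is expressed in the same unit as `logged_size`.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return logged_size / ratio


# Hypothetical logged size of 500 MB, giving an estimate of about 200 MB.
print(estimated_real_size(500.0))
```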
Feb 15 2022, 3:08 PM · System administration
vsellier added a comment to T3911: Cross replicate the staging storage between db1 and storage1.

D7173 landed. It initially focuses on the db1 -> storage1 replication to avoid running several initial replications at the same time. The storage1 -> db1 replication will be configured once the initial db1 replication is done.
The replication is set up as follows (initiated by storage1):
db1 dataset data/postgres-main-12 replicated on storage1 /data/sync/db1/postgresql-main-12

Feb 15 2022, 12:07 PM · System administration
vsellier committed rSPPRIVCd1b60989bc54: Add swh::deploy::loader_bzr::sentry_token key (authored by vsellier).
Add swh::deploy::loader_bzr::sentry_token key
Feb 15 2022, 11:38 AM
vsellier committed rSPPRIVC23c4e24ec016: Add syncoid::ssh_key::storage1 (authored by vsellier).
Add syncoid::ssh_key::storage1
Feb 15 2022, 11:38 AM
vsellier committed rSPSITE87171b61695c: sanoid: configure the db1 -> storage1 zfs replication (authored by vsellier).
sanoid: configure the db1 -> storage1 zfs replication
Feb 15 2022, 11:36 AM
vsellier closed D7173: sanoid: configure the db1 -> storage1 zfs replication.
Feb 15 2022, 11:35 AM
vsellier committed rSPSITEbe3c146e8812: sanoid: prepare the server to server zfs replication (authored by vsellier).
sanoid: prepare the server to server zfs replication
Feb 15 2022, 11:35 AM
vsellier updated the diff for D7173: sanoid: configure the db1 -> storage1 zfs replication.

fix the doc of the key name computation

Feb 15 2022, 11:30 AM
vsellier added inline comments to D7173: sanoid: configure the db1 -> storage1 zfs replication.
Feb 15 2022, 10:55 AM

Feb 14 2022

vsellier retitled D7173: sanoid: configure the db1 -> storage1 zfs replication from sanoid: configure the db1 -> storage1 zfs replication 2 commits: - first: sanoid: prepare the server to server zfs replication to sanoid: configure the db1 -> storage1 zfs replication.
Feb 14 2022, 7:57 PM
vsellier requested review of D7173: sanoid: configure the db1 -> storage1 zfs replication.
Feb 14 2022, 7:56 PM
vsellier added a revision to T3911: Cross replicate the staging storage between db1 and storage1: D7173: sanoid: configure the db1 -> storage1 zfs replication.
Feb 14 2022, 7:56 PM · System administration

Feb 11 2022

vsellier committed rSENVf87990fd3fc8: Update the debian version of the migrated vms (authored by vsellier).
Update the debian version of the migrated vms
Feb 11 2022, 6:54 PM
vsellier accepted D7158: keycloak: Remove realm direct grant flow override.
Feb 11 2022, 3:23 PM
vsellier created P1285 keycloak error.
Feb 11 2022, 2:51 PM

Feb 10 2022

vsellier changed the status of T3911: Cross replicate the staging storage between db1 and storage1 from Open to Work in Progress.
Feb 10 2022, 2:35 PM · System administration
vsellier accepted D7141: provenance: Give some permissions to provenance team.
Feb 10 2022, 2:23 PM
vsellier committed rSENV117a1686fe53: Add new servers facts (authored by vsellier).
Add new servers facts
Feb 10 2022, 11:29 AM
vsellier closed D7136: icinga: don't try to monitor directories under the postgresql datadir.
Feb 10 2022, 11:19 AM
vsellier committed rSPSITE27e269717fbd: icinga: don't try to monitor directories under the postgresql datadir (authored by vsellier).
icinga: don't try to monitor directories under the postgresql datadir
Feb 10 2022, 11:19 AM

Feb 9 2022

vsellier requested review of D7136: icinga: don't try to monitor directories under the postgresql datadir.
Feb 9 2022, 4:45 PM
vsellier added a revision to T3889: Admin database backup: D7136: icinga: don't try to monitor directories under the postgresql datadir.
Feb 9 2022, 4:45 PM · System administration

Feb 8 2022

vsellier added a comment to T3889: Admin database backup.

the first local snapshots worked:

root@dali:~# zfs list -t all
NAME                                                       USED  AVAIL     REFER  MOUNTPOINT
data                                                      66.7G   126G       24K  /data
data/postgresql                                           66.6G   126G     66.6G  /srv/postgresql/14/main
data/postgresql@autosnap_2022-02-08_19:04:44_monthly      1.47M      -     66.6G  -
data/postgresql@autosnap_2022-02-08_19:04:44_daily         194K      -     66.6G  -
data/postgresql/wal                                       31.8M   126G     14.9M  /srv/postgresql/14/main/pg_wal
data/postgresql/wal@autosnap_2022-02-08_19:04:44_monthly  16.3M      -     31.3M  -
data/postgresql/wal@autosnap_2022-02-08_19:04:44_daily      13K      -     15.0M  -
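
The snapshot names listed above carry the dataset, the creation time and the retention period. A minimal sketch of splitting them apart, assuming the `autosnap_<date>_<time>_<period>` layout visible in this listing (the function name is hypothetical):

```python
from datetime import datetime
from typing import Tuple


def parse_autosnap(name: str) -> Tuple[str, datetime, str]:
    """Split "dataset@autosnap_<date>_<time>_<period>" into its parts.

    The layout is inferred from the `zfs list -t all` output above.
    """
    dataset, separator, snapshot = name.partition("@")
    if not separator:
        raise ValueError(f"not a snapshot name: {name}")
    parts = snapshot.split("_")
    if len(parts) != 4 or parts[0] != "autosnap":
        raise ValueError(f"unexpected snapshot name: {snapshot}")
    _, date, time, period = parts
    created = datetime.strptime(f"{date}_{time}", "%Y-%m-%d_%H:%M:%S")
    return dataset, created, period


print(parse_autosnap("data/postgresql@autosnap_2022-02-08_19:04:44_monthly"))
```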
Feb 8 2022, 8:11 PM · System administration
vsellier closed D7118: backups: implements a zfs snapshot backup.
Feb 8 2022, 8:01 PM
vsellier committed rSPSITE9300ba9a5783: backups: implements a postgresql backup based on zfs snapshots (authored by vsellier).
backups: implements a postgresql backup based on zfs snapshots
Feb 8 2022, 8:01 PM
vsellier updated the diff for D7118: backups: implements a zfs snapshot backup.

rebase

Feb 8 2022, 8:01 PM
vsellier added a comment to T3889: Admin database backup.

The dali database directory tree was prepared to have a dedicated mount dataset for the wals:

root@dali:~# date
Tue Feb  8 18:48:57 UTC 2022
root@dali:~# systemctl stop postgresql@14-main
● postgresql@14-main.service - PostgreSQL Cluster 14-main
     Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled-runtime; vendor preset: enabled)
     Active: inactive (dead) since Tue 2022-02-08 18:48:58 UTC; 5ms ago
    Process: 2705743 ExecStop=/usr/bin/pg_ctlcluster --skip-systemctl-redirect -m fast 14-main stop (code=exited, status=0/SUCCESS)
   Main PID: 31293 (code=exited, status=0/SUCCESS)
        CPU: 1d 6h 12min 2.894s
Feb 8 2022, 7:55 PM · System administration
vsellier updated the test plan for D7118: backups: implements a zfs snapshot backup.
Feb 8 2022, 4:30 PM
vsellier updated the diff for D7118: backups: implements a zfs snapshot backup.

use a template instead of the stdlib::to_toml function, which is not compatible with puppet 5

Feb 8 2022, 4:29 PM
vsellier accepted D7123: Configure vault cookers to send their issue to sentry.
Feb 8 2022, 4:22 PM
vsellier added a comment to D7118: backups: implements a zfs snapshot backup.

thanks, I will fix that.

Feb 8 2022, 3:02 PM
vsellier committed rSENV331e3b73f650: vagrant: declare saam node (authored by vsellier).
vagrant: declare saam node
Feb 8 2022, 2:20 PM
vsellier accepted D7112: Deploy swh-worker@loader_bzr service to staging workers.
Feb 8 2022, 2:16 PM
vsellier updated the diff for D7118: backups: implements a zfs snapshot backup.

update commit message

Feb 8 2022, 2:12 PM
vsellier updated the summary of D7118: backups: implements a zfs snapshot backup.
Feb 8 2022, 2:11 PM
vsellier updated the summary of D7118: backups: implements a zfs snapshot backup.
Feb 8 2022, 2:11 PM
vsellier retitled D7118: backups: implements a zfs snapshot backup from WIP backups: implements a zfs snapshot backup to backups: implements a zfs snapshot backup.
Feb 8 2022, 2:11 PM
vsellier updated the diff for D7118: backups: implements a zfs snapshot backup.
  • add the postgresql backup management script
  • ensure the snapshot of the wal is done after the postgresql snapshot
Feb 8 2022, 2:04 PM
vsellier updated the diff for D7118: backups: implements a zfs snapshot backup.

Update to only keep the local snapshot section.
The sync deployment will be implemented in another diff.

Feb 8 2022, 11:20 AM
vsellier planned changes to D7118: backups: implements a zfs snapshot backup.
Feb 8 2022, 11:14 AM
vsellier requested review of D7118: backups: implements a zfs snapshot backup.
Feb 8 2022, 11:14 AM
vsellier added a revision to T3889: Admin database backup: D7118: backups: implements a zfs snapshot backup.
Feb 8 2022, 11:14 AM · System administration

Feb 7 2022

vsellier closed D7110: sysadm: add a postgresql backup management section.
Feb 7 2022, 5:17 PM
vsellier committed rDDOCee054f4c41db: sysadm: add a postgresql backup management section (authored by vsellier).
sysadm: add a postgresql backup management section
Feb 7 2022, 5:17 PM
vsellier renamed T3911: Cross replicate the staging storage between db1 and storage1 from Replicate the staging storage between db1 and storage1 to Cross replicate the staging storage between db1 and storage1.
Feb 7 2022, 10:29 AM · System administration
vsellier triaged T3911: Cross replicate the staging storage between db1 and storage1 as Normal priority.
Feb 7 2022, 10:29 AM · System administration
vsellier closed T2733: Explore / install a varnish prometheus probe as Resolved.

the exporter is deployed.
The varnish stats are available on this dashboard: https://grafana.softwareheritage.org/d/pE2xMZank/varnish

Feb 7 2022, 9:00 AM · Metrics/monitoring, System administration