Closing due to inactivity. Feel free to reopen if needed.
Dec 22 2021
Closing, I guess we can live with this ;)
Feel free to reopen if you disagree
Package upgraded in T3705.
Dec 21 2021
Closing this task as all the possible upgrades are done.
The delayed upgrades will be followed up in a dedicated task, as they will be integrated into a more global task relative to the elastic infrastructure or the pergamon splitting task
root@kelvingrove:~# task=T3807
root@kelvingrove:/etc# puppet agent --disable "$task: dist-upgrade to bullseye"
root@kelvingrove:/etc# sed -i -e 's/buster/bullseye/;s,bullseye/updates,bullseye-security,' /etc/apt/sources.list.d/*
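For reference, the rest of the dist-upgrade would usually look like the commands below; this is a generic sketch of the standard apt workflow, not taken from the log:

apt update
apt full-upgrade
# reboot on the bullseye kernel, then re-enable puppet and let it reconverge
reboot
puppet agent --enable
puppet agent --test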
The tests with vagrant are not showing any issue with puppet / keycloak after the upgrade, so let's proceed with the upgrade.
In D6866#178393, @olasd wrote: This deserves an upstream bug on frozendict 2.1.2, if you've managed to track it down...
fix a typo
A memory alert is logged on the idrac
Correctable memory error logging disabled for a memory device at location DIMM_A9. Fri 17 Dec 2021 16:15:39
We will have to monitor this in the future to check whether this memory DIMM has some weaknesses
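A quick way to keep an eye on it from the host is to query the system event log, assuming ipmitool is installed (a sketch, not from the log):

ipmitool sel elist | grep -i dimm_a9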
on moma:
- puppet disabled
root@moma:/etc/softwareheritage/storage# puppet agent --disable 'T3801 upgrade database servers'
- storage configuration updated to use the belvedere database and the service restarted
Dec 20 2021
It seems the problem is related to the new version 2.1.2 of the frozendict library, released on December 18th.
Pinning the version to the previous 2.1.1 solved the problem
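A minimal way to apply the pin, assuming a pip-based environment (the exact requirements file is illustrative):

pip install 'frozendict==2.1.1'
# or, in the relevant requirements/constraints file:
# frozendict==2.1.1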
For the segfault, I suspect an issue due to the OS difference between the docker container and the host (debian 10 / debian 11)
root@e35f7a024575:/home/jenkins/swh-environment/swh-indexer# gdb python3 core
(gdb) where
#0  raise (sig=11) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  <signal handler called>
#2  0x00007f6548d70d46 in frozendict_new_barebone (type=0x7f6548d800e0 <PyFrozenDict_Type>) at /project/frozendict/src/3_7/frozendictobject.c:2214
#3  _frozendict_new (use_empty_frozendict=1, kwds=0x0, args=<optimized out>, type=0x7f6548d800e0 <PyFrozenDict_Type>) at /project/frozendict/src/3_7/frozendictobject.c:2255
#4  frozendict_new (type=0x7f6548d800e0 <PyFrozenDict_Type>, args=<optimized out>, kwds=0x0) at /project/frozendict/src/3_7/frozendictobject.c:2290
#5  0x00000000005d9bd7 in _PyObject_FastCallKeywords ()
#136 0x000000000065468e in _Py_UnixMain ()
#137 0x00007f654efe109b in __libc_start_main (main=0x4bc560 <main>, argc=9, argv=0x7ffe6f651488, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe6f651478) at ../csu/libc-start.c:308
#138 0x00000000005e0e8a in _start ()
(gdb)
I'm trying to reproduce the problem locally in a vm to check if a workaround can be found.
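A rough reproduction sketch, assuming docker is available locally; the image tag is illustrative and only approximates the buster-based container from the backtrace:

docker run --rm -it python:3.7-buster bash
pip install 'frozendict==2.1.2'
python3 -c 'import frozendict; frozendict.frozendict()'   # expected to segfault per the backtrace above
pip install 'frozendict==2.1.1'
python3 -c 'import frozendict; frozendict.frozendict()'   # expected to work with the pinned version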
Dec 17 2021
fact installed on the staging nodes:
root@pergamon:/etc/clustershell# clush -b -w @staging 'if [ -e /etc/systemd/system/cloud-init.target.wants/cloud-init.service ]; then echo "cloud-init installed"; echo cloudinit_enabled=true > /etc/facter/facts.d/cloud-init.txt; else echo "cloud-init not installed"; fi'
---------------
counters0.internal.staging.swh.network,deposit.internal.staging.swh.network,objstorage0.internal.staging.swh.network,poc-rancher-sw[0-1].internal.staging.swh.network,poc-rancher.internal.staging.swh.network,rp0.internal.staging.swh.network,scheduler0.internal.staging.swh.network,search0.internal.staging.swh.network,vault.internal.staging.swh.network,webapp.internal.staging.swh.network,worker[0-3].internal.staging.swh.network (15)
---------------
cloud-init installed
---------------
db1.internal.staging.swh.network,storage1.internal.staging.swh.network (2)
---------------
cloud-init not installed
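To double-check that the external fact is actually picked up, something like this should work, assuming facter is available on the nodes (not taken from the log):

clush -b -w @staging 'facter -p cloudinit_enabled'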
rebase
During the week, only one request took more than 1s.
As it looks rare, it seems related to the load on the server during the build, so I'm not sure it's worth investigating further.
workers:
Before the migration
root@pergamon:~# clush -b -w @staging-workers 'set -e; puppet agent --disable "T3812"; puppet agent --disable T3771; cd /etc/systemd/system/multi-user.target.wants; for unit in swh-worker@*; do systemctl disable $unit; done; systemctl stop --no-block swh-worker@*; sleep 300; systemctl kill swh-worker@* -s 9'
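After the migration, the reverse path is simpler if the worker units are puppet-managed: re-enabling the agent should restore and restart them on the next run (a sketch, not from the log):

clush -b -w @staging-workers 'puppet agent --enable; puppet agent --test'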
fix a typo on the commit message
Testing with this config file:
#cloud-config-jsonp
[{ "op": "replace", "path": "/manage_etc_hosts", "value": "False"}]
gives this error:
2021-12-16 22:35:11,471 - __init__.py[DEBUG]: Calling handler CloudConfigPartHandler: [['text/cloud-config', 'text/cloud-config-jsonp']] (text/cloud-config-jsonp, part-001, 3) with frequency always
2021-12-16 22:35:11,472 - cloud_config.py[DEBUG]: Merging by applying json patch [{"op": "replace", "path": "/manage_etc_hosts", "value": "False"}]
2021-12-16 22:35:11,472 - util.py[WARNING]: Failed at merging in cloud config part from part-001
2021-12-16 22:35:11,474 - util.py[DEBUG]: Failed at merging in cloud config part from part-001
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/handlers/cloud_config.py", line 138, in handle_part
    self._merge_patch(payload)
  File "/usr/lib/python3/dist-packages/cloudinit/handlers/cloud_config.py", line 113, in _merge_patch
    self.cloud_buf = patch.apply(self.cloud_buf, in_place=False)
  File "/usr/lib/python3/dist-packages/jsonpatch.py", line 312, in apply
    obj = operation.apply(obj)
  File "/usr/lib/python3/dist-packages/jsonpatch.py", line 483, in apply
    raise JsonPatchConflict(msg)
jsonpatch.JsonPatchConflict: can't replace non-existent object 'manage_etc_hosts'
2021-12-16 22:35:11,475 - __init__.py[DEBUG]: Calling handler CloudConfigPartHandler: [['text/cloud-config', 'text/cloud-config-jsonp']] (__end__, None, 3) with frequency always
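A workaround worth trying, untested here: JSON Patch only allows "replace" on an existing member, while "add" on an object member either creates it or overwrites it, so the following variant (with a boolean value rather than the "False" string) should at least avoid the JsonPatchConflict:

#cloud-config-jsonp
[{ "op": "add", "path": "/manage_etc_hosts", "value": false }]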
Dec 16 2021
it seems cloud init does not support overriding a property defined in the user-data configuration:
thanks
LGTM thanks
Dec 15 2021
Thanks, it will be very useful ;)
fix a typo in the commit message
proxmox provider updated to v2.9.3
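If the provider version is pinned in the terraform configuration, the usual way to pick up the new release is (a sketch, assuming the constraint was already bumped to 2.9.3):

terraform init -upgrade
terraform plan   # sanity check that the provider bump does not change the plan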
Remove last references to storage_type
After some adaptations, the syntax is now good.
Dec 14 2021
there are some issues with correctly updating the cpu type.
The real value of the field on proxmox is not correctly detected, so terraform keeps trying to change the cpu type.
thanks
- node decommissioned from puppet:
root@pergamon:~# /usr/local/sbin/swh-puppet-master-decommission boatbucket.internal.softwareheritage.org
+ puppet node deactivate boatbucket.internal.softwareheritage.org
Submitted 'deactivate node' for boatbucket.internal.softwareheritage.org with UUID bf7bd0ea-f1ae-442f-b840-5bb1adb261f3
+ puppet node clean boatbucket.internal.softwareheritage.org
Notice: Revoked certificate with serial 224
Notice: Removing file Puppet::SSL::Certificate boatbucket.internal.softwareheritage.org at '/var/lib/puppet/ssl/ca/signed/boatbucket.internal.softwareheritage.org.pem'
boatbucket.internal.softwareheritage.org
+ puppet cert clean boatbucket.internal.softwareheritage.org
Warning: `puppet cert` is deprecated and will be removed in a future release.
   (location: /usr/lib/ruby/vendor_ruby/puppet/application.rb:370:in `run')
Notice: Revoked certificate with serial 224
+ systemctl restart apache2
root@pergamon:~# puppet agent --test
- server manually removed from proxmox / uffizi
home directories backed up:
root@boatbucket:/home# ls -d alphare/* alphare/.bash_history boatbucket/* boatbucket/.bash_history | xargs tar cvjf boatbucket-backup-2021-12-14.tar.bz2
...
and saved on saam
root@boatbucket:/home# sudo -u boatbucket cp -v boatbucket-backup-2021-12-14.tar.bz /srv/boatbucket/
'boatbucket-backup-2021-12-14.tar.bz' -> '/srv/boatbucket/boatbucket-backup-2021-12-14.tar.bz'
root@boatbucket:/home# ls -al /srv/boatbucket/boatbucket-backup-2021-12-14.tar.bz
-rw-r--r-- 1 boatbucket boatbucket 124170240 Dec 14 15:36 /srv/boatbucket/boatbucket-backup-2021-12-14.tar.bz
root@boatbucket:/home# mount | grep boatbucket
systemd-1 on /srv/boatbucket type autofs (rw,relatime,fd=54,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=13290)
saam:/srv/storage/space/mirrors/boatbucket on /srv/boatbucket type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.100.107,local_lock=none,addr=192.168.100.109)
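A quick integrity check of the copy, not part of the log, assuming the usual tools on the box:

sha256sum boatbucket-backup-2021-12-14.tar.bz /srv/boatbucket/boatbucket-backup-2021-12-14.tar.bz   # checksums should match
tar tjf /srv/boatbucket/boatbucket-backup-2021-12-14.tar.bz > /dev/null && echo archive readable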
LGTM
not a blocker: do we need the swh::postgresql::version indirection?
The following minor postgresql upgrades will be performed during the upgrade (see the generic sketch after the list):
- somerset: postgresql 13.4 -> 13.5 [1]
A dump/restore is not required for those running 13.X.
- belvedere:
- 11.14-0 -> 11.14-1 (indexer db)
- 12.8-1 -> 12.9-1 [2] (other dbs)
A dump/restore is not required for those running 12.X.
- db1:
- 12.8-1 -> 12.9-1 [2]
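A generic sketch of such a minor upgrade on one of the nodes, assuming Debian packaging; the cluster name is illustrative:

apt update
apt install postgresql-12                         # pulls the 12.9-1 binaries in place
systemctl restart postgresql@12-main
su - postgres -c 'psql -c "select version();"'    # confirm the new minor version
# no dump/restore is needed for a minor release within the same major version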