Closing due to inactivity. Feel free to reopen if needed.
Dec 22 2021
Closing, I guess we can live with this ;)
Feel free to reopen if you disagree
Package upgraded in T3705.
Dec 21 2021
Closing this task as all the possible upgrades are done.
The delayed upgrades will be followed up in a dedicated task, as they will be integrated into a more global task relative to the elastic infrastructure or the pergamon splitting task
root@kelvingrove:~# task=T3807
root@kelvingrove:/etc# puppet agent --disable "$task: dist-upgrade to bullseye"
root@kelvingrove:/etc# sed -i -e 's/buster/bullseye/;s,bullseye/updates,bullseye-security,' /etc/apt/sources.list.d/*
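For reference, the rest of the dist-upgrade would usually look like the commands below; this is a generic sketch of the standard apt workflow, not taken from the log:

apt update
apt full-upgrade
# reboot on the bullseye kernel, then re-enable puppet and let it reconverge
reboot
puppet agent --enable
puppet agent --test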
The tests with vagrant are not showing any issue with puppet / keycloak after the upgrade, so let's proceed with the upgrade.
In D6866#178393, @olasd wrote: This deserves an upstream bug on frozendict 2.1.2, if you've managed to track it down...
fix a typo
A memory alert is logged on the idrac
Correctable memory error logging disabled for a memory device at location DIMM_A9. Fri 17 Dec 2021 16:15:39
We will have to monitor this in the future to check whether this memory DIMM has some weaknesses
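A quick way to keep an eye on it from the host is to query the system event log, assuming ipmitool is installed (a sketch, not from the log):

ipmitool sel elist | grep -i dimm_a9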
on moma:
- puppet disabled
root@moma:/etc/softwareheritage/storage# puppet agent --disable 'T3801 upgrade database servers'
- storage configuration updated to use the belvedere database and the service restarted
Dec 20 2021
It seems the problem is related to the new version 2.1.2 of the frozendict library, released on December 18th.
Pinning the version to the previous 2.1.1 solved the problem
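A minimal way to apply the pin, assuming a pip-based environment (the exact requirements file is illustrative):

pip install 'frozendict==2.1.1'
# or, in the relevant requirements/constraints file:
# frozendict==2.1.1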
For the segfault, I suspect an issue due to the OS difference between the docker container and the host (debian 10 / debian 11)
root@e35f7a024575:/home/jenkins/swh-environment/swh-indexer# gdb python3 core
(gdb) where
#0  raise (sig=11) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  <signal handler called>
#2  0x00007f6548d70d46 in frozendict_new_barebone (type=0x7f6548d800e0 <PyFrozenDict_Type>) at /project/frozendict/src/3_7/frozendictobject.c:2214
#3  _frozendict_new (use_empty_frozendict=1, kwds=0x0, args=<optimized out>, type=0x7f6548d800e0 <PyFrozenDict_Type>) at /project/frozendict/src/3_7/frozendictobject.c:2255
#4  frozendict_new (type=0x7f6548d800e0 <PyFrozenDict_Type>, args=<optimized out>, kwds=0x0) at /project/frozendict/src/3_7/frozendictobject.c:2290
#5  0x00000000005d9bd7 in _PyObject_FastCallKeywords ()
#136 0x000000000065468e in _Py_UnixMain ()
#137 0x00007f654efe109b in __libc_start_main (main=0x4bc560 <main>, argc=9, argv=0x7ffe6f651488, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe6f651478) at ../csu/libc-start.c:308
#138 0x00000000005e0e8a in _start ()
(gdb)
I'm trying to reproduce the problem locally in a vm to check if a workaround can be found.
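A rough reproduction sketch, assuming docker is available locally; the image tag is illustrative and only approximates the buster-based container from the backtrace:

docker run --rm -it python:3.7-buster bash
pip install 'frozendict==2.1.2'
python3 -c 'import frozendict; frozendict.frozendict()'   # expected to segfault per the backtrace above
pip install 'frozendict==2.1.1'
python3 -c 'import frozendict; frozendict.frozendict()'   # expected to work with the pinned version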
Dec 17 2021
fact installed on the staging nodes:
root@pergamon:/etc/clustershell# clush -b -w @staging 'if [ -e /etc/systemd/system/cloud-init.target.wants/cloud-init.service ]; then echo "cloud-init installed"; echo cloudinit_enabled=true > /etc/facter/facts.d/cloud-init.txt; else echo "cloud-init not installed"; fi'
---------------
counters0.internal.staging.swh.network,deposit.internal.staging.swh.network,objstorage0.internal.staging.swh.network,poc-rancher-sw[0-1].internal.staging.swh.network,poc-rancher.internal.staging.swh.network,rp0.internal.staging.swh.network,scheduler0.internal.staging.swh.network,search0.internal.staging.swh.network,vault.internal.staging.swh.network,webapp.internal.staging.swh.network,worker[0-3].internal.staging.swh.network (15)
---------------
cloud-init installed
---------------
db1.internal.staging.swh.network,storage1.internal.staging.swh.network (2)
---------------
cloud-init not installed
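To double-check that the external fact is actually picked up, something like this should work, assuming facter is available on the nodes (not taken from the log):

clush -b -w @staging 'facter -p cloudinit_enabled'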
rebase
During the week, only one request took more than 1s.
As it looks rare, it seems related to the load on the server during the build, so I'm not sure it's worth investigating further.
workers:
Before the migration
root@pergamon:~# clush -b -w @staging-workers 'set -e; puppet agent --disable "T3812"; puppet agent --disable T3771; cd /etc/systemd/system/multi-user.target.wants; for unit in swh-worker@*; do systemctl disable $unit; done; systemctl stop --no-block swh-worker@*; sleep 300; systemctl kill swh-worker@* -s 9'
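After the migration, the reverse path is simpler if the worker units are puppet-managed: re-enabling the agent should restore and restart them on the next run (a sketch, not from the log):

clush -b -w @staging-workers 'puppet agent --enable; puppet agent --test'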
fix a typo on the commit message
Testing with this config file:
#cloud-config-jsonp
[{ "op": "replace", "path": "/manage_etc_hosts", "value": "False"}]
gives this error:
2021-12-16 22:35:11,471 - __init__.py[DEBUG]: Calling handler CloudConfigPartHandler: [['text/cloud-config', 'text/cloud-config-jsonp']] (text/cloud-config-jsonp, part-001, 3) with frequency always
2021-12-16 22:35:11,472 - cloud_config.py[DEBUG]: Merging by applying json patch [{"op": "replace", "path": "/manage_etc_hosts", "value": "False"}]
2021-12-16 22:35:11,472 - util.py[WARNING]: Failed at merging in cloud config part from part-001
2021-12-16 22:35:11,474 - util.py[DEBUG]: Failed at merging in cloud config part from part-001
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/handlers/cloud_config.py", line 138, in handle_part
    self._merge_patch(payload)
  File "/usr/lib/python3/dist-packages/cloudinit/handlers/cloud_config.py", line 113, in _merge_patch
    self.cloud_buf = patch.apply(self.cloud_buf, in_place=False)
  File "/usr/lib/python3/dist-packages/jsonpatch.py", line 312, in apply
    obj = operation.apply(obj)
  File "/usr/lib/python3/dist-packages/jsonpatch.py", line 483, in apply
    raise JsonPatchConflict(msg)
jsonpatch.JsonPatchConflict: can't replace non-existent object 'manage_etc_hosts'
2021-12-16 22:35:11,475 - __init__.py[DEBUG]: Calling handler CloudConfigPartHandler: [['text/cloud-config', 'text/cloud-config-jsonp']] (__end__, None, 3) with frequency always
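A workaround worth trying, untested here: JSON Patch only allows "replace" on an existing member, while "add" on an object member either creates it or overwrites it, so the following variant (with a boolean value rather than the "False" string) should at least avoid the JsonPatchConflict:

#cloud-config-jsonp
[{ "op": "add", "path": "/manage_etc_hosts", "value": false }]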
Dec 16 2021
it seems cloud init does not support overriding a property defined in the user-data configuration:
thanks
LGTM thanks
Dec 15 2021
Thanks, it will be very useful ;)
fix a typo in the commit message
proxmox provider updated to v2.9.3
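If the provider version is pinned in the terraform configuration, the usual way to pick up the new release is (a sketch, assuming the constraint was already bumped to 2.9.3):

terraform init -upgrade
terraform plan   # sanity check that the provider bump does not change the plan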
Remove last references to storage_type
After some adaptations, the syntax is now good.
Dec 14 2021
there are some issues with correctly updating the cpu type.
The real value of the field on proxmox is not correctly detected, so terraform keeps trying to change the cpu type.
thanks
- node decommissioned from puppet:
root@pergamon:~# /usr/local/sbin/swh-puppet-master-decommission boatbucket.internal.softwareheritage.org
+ puppet node deactivate boatbucket.internal.softwareheritage.org
Submitted 'deactivate node' for boatbucket.internal.softwareheritage.org with UUID bf7bd0ea-f1ae-442f-b840-5bb1adb261f3
+ puppet node clean boatbucket.internal.softwareheritage.org
Notice: Revoked certificate with serial 224
Notice: Removing file Puppet::SSL::Certificate boatbucket.internal.softwareheritage.org at '/var/lib/puppet/ssl/ca/signed/boatbucket.internal.softwareheritage.org.pem'
boatbucket.internal.softwareheritage.org
+ puppet cert clean boatbucket.internal.softwareheritage.org
Warning: `puppet cert` is deprecated and will be removed in a future release.
   (location: /usr/lib/ruby/vendor_ruby/puppet/application.rb:370:in `run')
Notice: Revoked certificate with serial 224
+ systemctl restart apache2
root@pergamon:~# puppet agent --test
- server manually removed from proxmox / uffizi
home directories backed up:
root@boatbucket:/home# ls -d alphare/* alphare/.bash_history boatbucket/* boatbucket/.bash_history | xargs tar cvjf boatbucket-backup-2021-12-14.tar.bz2
...
and saved on saam
root@boatbucket:/home# sudo -u boatbucket cp -v boatbucket-backup-2021-12-14.tar.bz /srv/boatbucket/
'boatbucket-backup-2021-12-14.tar.bz' -> '/srv/boatbucket/boatbucket-backup-2021-12-14.tar.bz'
root@boatbucket:/home# ls -al /srv/boatbucket/boatbucket-backup-2021-12-14.tar.bz
-rw-r--r-- 1 boatbucket boatbucket 124170240 Dec 14 15:36 /srv/boatbucket/boatbucket-backup-2021-12-14.tar.bz
root@boatbucket:/home# mount | grep boatbucket
systemd-1 on /srv/boatbucket type autofs (rw,relatime,fd=54,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=13290)
saam:/srv/storage/space/mirrors/boatbucket on /srv/boatbucket type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.100.107,local_lock=none,addr=192.168.100.109)
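A quick integrity check of the copy, not part of the log, assuming the usual tools on the box:

sha256sum boatbucket-backup-2021-12-14.tar.bz /srv/boatbucket/boatbucket-backup-2021-12-14.tar.bz   # checksums should match
tar tjf /srv/boatbucket/boatbucket-backup-2021-12-14.tar.bz > /dev/null && echo archive readable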
LGTM
not a blocker: do we need the swh::postgresql::version indirection?
The following minor postgresql upgrades will be performed during the upgrade (see the generic sketch after the list):
- somerset: postgresql 13.4 -> 13.5 [1]
A dump/restore is not required for those running 13.X.
- belvedere:
- 11.14-0 -> 11.14-1 (indexer db)
- 12.8-1 -> 12.9-1 [2] (other dbs)
A dump/restore is not required for those running 12.X.
- db1:
- 12.8-1 -> 12.9-1 [2]
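A generic sketch of such a minor upgrade on one of the nodes, assuming Debian packaging; the cluster name is illustrative:

apt update
apt install postgresql-12                         # pulls the 12.9-1 binaries in place
systemctl restart postgresql@12-main
su - postgres -c 'psql -c "select version();"'    # confirm the new minor version
# no dump/restore is needed for a minor release within the same major version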