Dec 16 2021
LGTM thanks
Dec 15 2021
Thanks, it will be very useful ;)
fix a typo in the commit message
proxmox provider updated to v2.9.3
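A minimal sketch of how such a provider bump is typically picked up locally, assuming the provider constraint was raised to 2.9.3 in the terraform configuration:
# refresh the dependency lock file so terraform fetches the 2.9.3 provider,
# then review the resulting plan before applying anything
terraform init -upgrade
terraform plan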
Remove last references to storage_type
After some adaptations, the syntax is now good.
Dec 14 2021
There are some issues with correctly updating the cpu type status.
The real value of the field on proxmox is not correctly detected, so terraform is always trying to update the cpu type.
thanks
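A hedged sketch of how one could inspect the reported drift, assuming a Telmate proxmox_vm_qemu resource (the resource name "vm" below is an assumption):
# show what terraform currently records for the cpu field (resource name is hypothetical)
terraform state show proxmox_vm_qemu.vm | grep -i cpu
# and what terraform wants to change on the next apply
terraform plan | grep -i -B2 -A2 cpu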
- node decommissioned from puppet:
root@pergamon:~# /usr/local/sbin/swh-puppet-master-decommission boatbucket.internal.softwareheritage.org
+ puppet node deactivate boatbucket.internal.softwareheritage.org
Submitted 'deactivate node' for boatbucket.internal.softwareheritage.org with UUID bf7bd0ea-f1ae-442f-b840-5bb1adb261f3
+ puppet node clean boatbucket.internal.softwareheritage.org
Notice: Revoked certificate with serial 224
Notice: Removing file Puppet::SSL::Certificate boatbucket.internal.softwareheritage.org at '/var/lib/puppet/ssl/ca/signed/boatbucket.internal.softwareheritage.org.pem'
boatbucket.internal.softwareheritage.org
+ puppet cert clean boatbucket.internal.softwareheritage.org
Warning: `puppet cert` is deprecated and will be removed in a future release.
   (location: /usr/lib/ruby/vendor_ruby/puppet/application.rb:370:in `run')
Notice: Revoked certificate with serial 224
+ systemctl restart apache2
root@pergamon:~# puppet agent --test
- server manually removed from proxmox / uffizi
home directories backed up:
root@boatbucket:/home# ls -d alphare/* alphare/.bash_history boatbucket/* boatbucket/.bash_history | xargs tar cvjf boatbucket-backup-2021-12-14.tar.bz2
...
and saved on saam
root@boatbucket:/home# sudo -u boatbucket cp -v boatbucket-backup-2021-12-14.tar.bz /srv/boatbucket/
'boatbucket-backup-2021-12-14.tar.bz' -> '/srv/boatbucket/boatbucket-backup-2021-12-14.tar.bz'
root@boatbucket:/home# ls -al /srv/boatbucket/boatbucket-backup-2021-12-14.tar.bz
-rw-r--r-- 1 boatbucket boatbucket 124170240 Dec 14 15:36 /srv/boatbucket/boatbucket-backup-2021-12-14.tar.bz
root@boatbucket:/home# mount | grep boatbucket
systemd-1 on /srv/boatbucket type autofs (rw,relatime,fd=54,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=13290)
saam:/srv/storage/space/mirrors/boatbucket on /srv/boatbucket type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.100.107,local_lock=none,addr=192.168.100.109)
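A small additional sketch, assuming one wants to double-check the archive copied to saam before decommissioning the VM (adjust the archive name/extension to whatever was actually copied):
# list the archive contents without extracting; a non-zero exit status would indicate a corrupt copy
tar tjf /srv/boatbucket/boatbucket-backup-2021-12-14.tar.bz > /dev/null && echo "archive readable"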
LGTM
Not a blocker, but do we need the swh::postgresql::version indirection?
The following minor postgresql upgrades will be performed during the upgrade (a sketch of a typical upgrade command sequence follows the list):
- somerset: postgresql 13.4 -> 13.5 [1]
  A dump/restore is not required for those running 13.X.
- belvedere:
  - 11.14-0 -> 11.14-1 (indexer db)
  - 12.8-1 -> 12.9-1 [2] (other dbs)
  A dump/restore is not required for those running 12.X.
- db1:
  - 12.8-1 -> 12.9-1 [2]
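As referenced above, a minimal sketch of what such a minor upgrade typically looks like on one Debian node, assuming the stock postgresql packaging (adapt the major version and cluster name per host):
# pull the new minor release from the configured repository and restart the cluster
apt update
apt install postgresql-13
systemctl restart postgresql@13-main
# confirm the running version afterwards
sudo -u postgres psql -Atc 'SELECT version();'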
It seems that since the 8th of December, there have been no requests > 1s in the builds.
I will monitor it during the current week; if it does not occur again, I will change the status to resolved.
Dec 13 2021
upgrade done following the T3799 procedure.
After several tests in vagrant, the upgrade looks ok, even though I couldn't manage to set up a complete local DNS environment.
LGTM thanks
If not defined, this variable is set by the elasticsearch launch script https://github.com/elastic/elasticsearch/pull/80699/files#diff-ddfc3a6ea1404997e56f2e771adede06b173f0fea37b4779d827c85d6cc52897R35
I guess that since the fixture is not starting elasticsearch [1] through the startup script, the variable is not defined.
olasd: I'm transferring ownership of this task to you since you're handling the subject. Feel free to close the task if the installation can be considered done.
Dec 10 2021
All the hypervisors are migrated and the services restored
root@pergamon:/usr/local/sbin# ./swh-puppet-master-decommission louvre.internal.softwareheritage.org
+ puppet node deactivate louvre.internal.softwareheritage.org
Submitted 'deactivate node' for louvre.internal.softwareheritage.org with UUID edca37d0-0976-4598-aadd-aef13a033a34
+ puppet node clean louvre.internal.softwareheritage.org
Notice: Revoked certificate with serial 156
Notice: Removing file Puppet::SSL::Certificate louvre.internal.softwareheritage.org at '/var/lib/puppet/ssl/ca/signed/louvre.internal.softwareheritage.org.pem'
louvre.internal.softwareheritage.org
+ puppet cert clean louvre.internal.softwareheritage.org
Warning: `puppet cert` is deprecated and will be removed in a future release.
   (location: /usr/lib/ruby/vendor_ruby/puppet/application.rb:370:in `run')
Notice: Revoked certificate with serial 156
+ systemctl restart apache2
- vm 108 removed
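For reference, a likely command sequence for the manual VM removal on the hypervisor, assuming the standard proxmox qm CLI (vm 108 per the note above):
# stop the guest if it is still running, then delete it; --purge also drops the vmid
# from backup/replication job configurations on recent PVE 6.x
qm stop 108
qm destroy 108 --purge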
The ceph packages also need to be updated on the proxmox nodes even if they are not in the ceph cluster (from the output of pve6to7).
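A quick sketch, mirroring the clush usage further below, to see which ceph packages are installed on each hypervisor before the upgrade:
# list installed ceph-related packages and their versions on every hypervisor
clush -b -w @hypervisors "dpkg -l | grep -i '^ii.*ceph'"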
Dec 9 2021
It's fine by me to close it.
No requests took more than 1s during last night's build.
I will continue to monitor the builds and try to diagnose the problem more accurately
A couple of remarks; sorry in advance if it's just because this is a bootstrap and everything is not yet finalized.
Output of the pve6to7 script on uffizi:
Preconditions checklist from the proxmox upgrade guide:
- Upgraded to the latest version of Proxmox VE 6.4 (check correct package repository configuration)
On all nodes:
root@pergamon:/etc/clustershell# clush -b -w @hypervisors "pveversion"
---------------
branly,pompidou,uffizi (3)
---------------
pve-manager/6.4-13/9f411e79 (running kernel: 5.4.103-1-pve)
---------------
beaubourg
---------------
pve-manager/6.4-13/9f411e79 (running kernel: 5.4.143-1-pve)
---------------
hypervisor3
---------------
pve-manager/6.4-13/9f411e79 (running kernel: 5.4.128-1-pve)
- TODO: Hyper-converged Ceph: upgrade the Ceph Nautilus cluster to Ceph 15.2 Octopus before you start the Proxmox VE upgrade to 7.0. Follow the guide "Ceph Nautilus to Octopus".
- No backup server: the "Co-installed Proxmox Backup Server" item (Proxmox Backup Server 1.1 to 2.x upgrade how-to) does not apply.
- Reliable access to the node (through ssh, iKVM/IPMI or physical access)
- A healthy cluster
- Valid and tested backup of all VMs and CTs (in case something goes wrong)
- At least 4 GiB free disk space on the root mount point
- Check known upgrade issues
- From later in the doc: test the pve6to7 migration checklist
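A sketch of running that checklist script on each hypervisor ahead of the upgrade (the --full flag runs all checks):
# run the upstream migration checklist on every hypervisor and compare the results
clush -b -w @hypervisors "pve6to7 --full"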
Dec 8 2021
WDYT about putting this value in production/common.yaml so webapp1 is also aligned with it?
The lister was fixed with the deployment of the swh-scheduler v0.22.0.
deployment of version v0.22.0 in production
Deployment of the version v0.22.0 in staging
The timeout occurs after 1s on the swh-web side on a directory/ls call.
04:11:10 nginx_1 | 172.23.0.1 - - [08/Dec/2021:03:11:09 +0000] "GET /api/1/directory/877df54c7dda406e9ad56ca09f793799aedbb26b/ HTTP/1.1" 500 4996 "-" "curl/7.64.0" 1.013
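A sketch of reproducing the slow call with explicit timing, assuming the docker-compose nginx is reachable on localhost:5080 (host and port are assumptions; adjust to the local setup):
# time the directory listing call that hits the 1s timeout
curl -s -o /dev/null -w 'http %{http_code}, total %{time_total}s\n' \
  'http://localhost:5080/api/1/directory/877df54c7dda406e9ad56ca09f793799aedbb26b/'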
Dec 7 2021
Thanks
More info here: https://www.jenkins.io/doc/book/managing/built-in-node-migration/
The last builds were successful and do not indicate any overly long response times.
Let's see tomorrow whether the response times are slower at the usual build time.