
Upgrade proxmox hypervisor from version 6 to version 7 + debian11 migration
Closed, Migrated (Edits Locked)

Description

The nodes to upgrade [1]:

  • uffizi
  • pompidou
  • hypervisor3
  • beaubourg
  • branly

Plan

Looping over each hypervisor in the following order, we will mostly move VMs elsewhere or stop them, then restore the previous state once the node is upgraded to bullseye (a rough command sketch follows the list):

  • (for all) Pre-check the network interface configuration to avoid reboot caveats (no network after reboot)
  • 1. uffizi: sandbox hypervisor
    • vms to stop: all
  • 2. pompidou: the staging infrastructure
    • vms to move:
      • staging workers
      • workers[13-16]: first migrated live to uffizi (without stopping) and simply restarted there; in the end, the VMs were stopped properly before being migrated to uffizi
  • 3. hypervisor3:
    • stopping vms:
      • webapp1
      • louvre
      • pushkin (secondary firewall)
      • workers[01-08]
    • moving some VMs to branly (same AMD architecture):
      • logstash0
      • bardo (to ensure front-end service for hedgedoc)
      • tate
  • 4. beaubourg: only somerset (secondary db) runs there, and the archive webapp currently uses the belvedere db (primary), so somerset can be stopped without impacting the archive
  • 5. branly:
    • moving VMs to hypervisor3 (same AMD architecture):
      • pergamon
      • moma
      • saatchi
      • thyssen
      • counters1
      • search1
      • kelvingrove (keycloak)
    • stopping and moving to another hypervisor:
      • riverside (sentry)
      • rp1 (rp for hedgedoc)
    • stopping other vms:
      • glyptotek: primary firewall (switch it to secondary first, for the duration of the upgrade)
      • workers[09-12]
      • getty
      • kibana0
      • bojimans (netbox)
      • jenkins-debian1
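
A minimal sketch of the per-node loop, assuming standard Proxmox/Debian tooling (VM ids, target nodes and the exact repository files are placeholders, not values taken from this task):

# live-migrate a guest to another node, or shut it down for the duration of the upgrade
qm migrate <vmid> <target-node> --online
qm shutdown <vmid>

# switch the apt suites from buster to bullseye (the security suite is renamed,
# as flagged by the pve6to7 output further down)
sed -i 's/buster/bullseye/g' /etc/apt/sources.list /etc/apt/sources.list.d/pve-enterprise.list
sed -i 's#buster/updates#bullseye-security#' /etc/apt/sources.list.d/debian-security.list
apt update && apt full-upgrade
reboot

# after the reboot: check the node rejoined the cluster, then start/move the guests back
pvecm status
qm start <vmid>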

[1] https://inventory.internal.softwareheritage.org/dcim/devices/?cluster_id=4

[2] Highlighted doc from IRC discussion:

[3] https://pve.proxmox.com/wiki/Ceph_Nautilus_to_Octopus

Event Timeline

ardumont triaged this task as Normal priority.Dec 6 2021, 2:32 PM
ardumont created this task.

Preconditions checklist from the proxmox upgrade guide:

  • Upgraded to the latest version of Proxmox VE 6.4 (check correct package repository configuration)

On all nodes:

root@pergamon:/etc/clustershell# clush -b -w @hypervisors "pveversion"
---------------
branly,pompidou,uffizi (3)
---------------
pve-manager/6.4-13/9f411e79 (running kernel: 5.4.103-1-pve)
---------------
beaubourg
---------------
pve-manager/6.4-13/9f411e79 (running kernel: 5.4.143-1-pve)
---------------
hypervisor3
---------------
pve-manager/6.4-13/9f411e79 (running kernel: 5.4.128-1-pve)
  • TODO Hyper-converged Ceph: upgrade the Ceph Nautilus cluster to Ceph 15.2 Octopus before you start the Proxmox VE upgrade to 7.0. Follow the guide Ceph Nautilus to Octopus
  • Co-installed Proxmox Backup Server: not applicable here (no backup server is co-installed); otherwise see the Proxmox Backup Server 1.1 to 2.x upgrade how-to
  • Reliable access to the node (through ssh, iKVM/IPMI or physical access)
  • A healthy cluster
  • Valid and tested backup of all VMs and CTs (in case something goes wrong)
  • At least 4 GiB free disk space on the root mount point
  • Check known upgrade issues
  • From later in the upgrade guide: run the pve6to7 migration checklist tool
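
A hedged sketch of how these preconditions could be checked cluster-wide, reusing the @hypervisors clustershell group from above (plain PVE/Ceph commands, nothing specific to this infrastructure):

clush -b -w @hypervisors "pveversion"    # every node on the latest 6.4 packages
clush -b -w @hypervisors "df -h /"       # at least 4 GiB free on the root mount point
ceph status                              # cluster health, run from any ceph node
ceph versions                            # all daemons on a single ceph release
pve6to7 --full                           # full checklist, including the expensive CT cgroupv2 checks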

Output of the pve6to7 script on uffizi:

root@uffizi:~# pve6to7
= CHECKING VERSION INFORMATION FOR PVE PACKAGES =

Checking for package updates..
PASS: all packages uptodate

Checking proxmox-ve package version..
PASS: proxmox-ve package has version >= 6.4-1

Checking running kernel version..
PASS: expected running kernel '5.4.103-1-pve'.

= CHECKING CLUSTER HEALTH/SETTINGS =

PASS: systemd unit 'pve-cluster.service' is in state 'active'
PASS: systemd unit 'corosync.service' is in state 'active'
PASS: Cluster Filesystem is quorate.

Analzying quorum settings and state..
INFO: configured votes - nodes: 5
INFO: configured votes - qdevice: 0
INFO: current expected votes: 5
INFO: current total votes: 5

Checking nodelist entries..
PASS: nodelist settings OK

Checking totem settings..
PASS: totem settings OK

INFO: run 'pvecm status' to get detailed cluster status..

= CHECKING HYPER-CONVERGED CEPH STATUS =

INFO: hyper-converged ceph setup detected!
INFO: getting Ceph status/health information..
PASS: Ceph health reported as 'HEALTH_OK'.
INFO: getting Ceph daemon versions..
PASS: single running version detected for daemon type monitor.
PASS: single running version detected for daemon type manager.
PASS: single running version detected for daemon type MDS.
PASS: single running version detected for daemon type OSD.
PASS: single running overall version detected for all Ceph daemon types.
WARN: 'noout' flag not set - recommended to prevent rebalancing during cluster-wide upgrades.
INFO: checking Ceph config..
FAIL: local Ceph version too low, at least Octopus required..

= CHECKING CONFIGURED STORAGES =

PASS: storage 'local' enabled and active.
PASS: storage 'proxmox' enabled and active.
PASS: storage 'proxmox-cephfs' enabled and active.
SKIP: storage 'scratch' disabled.

= MISCELLANEOUS CHECKS =

INFO: Checking common daemon services..
PASS: systemd unit 'pveproxy.service' is in state 'active'
PASS: systemd unit 'pvedaemon.service' is in state 'active'
PASS: systemd unit 'pvestatd.service' is in state 'active'
INFO: Checking for running guests..
WARN: 7 running guest(s) detected - consider migrating or stopping them.
INFO: Checking if the local node's hostname 'uffizi' is resolvable..
INFO: Checking if resolved IP is configured on local node..
PASS: Resolved node IP '192.168.100.101' configured and active on single interface.
INFO: Checking backup retention settings..
WARN: storage 'proxmox-cephfs' - parameter 'maxfiles' is deprecated with PVE 7.x and will be removed in a future version, use 'prune-backups' instead.
INFO: checking CIFS credential location..
PASS: no CIFS credentials at outdated location found.
INFO: Checking custom roles for pool permissions..
INFO: Checking node and guest description/note legnth..
PASS: All node config descriptions fit in the new limit of 64 KiB
PASS: All guest config descriptions fit in the new limit of 8 KiB
INFO: Checking container configs for deprecated lxc.cgroup entries
PASS: No legacy 'lxc.cgroup' keys found.
INFO: Checking storage content type configuration..
PASS: no problems found
INFO: Checking if the suite for the Debian security repository is correct..
INFO: Make sure to change the suite of the Debian security repository from 'buster/updates' to 'bullseye-security' - in /etc/apt/sources.list.d/debian-security.list:3
SKIP: NOTE: Expensive checks, like CT cgroupv2 compat, not performed without '--full' parameter

= SUMMARY =

TOTAL:    32
PASSED:   26
SKIPPED:  2
WARNINGS: 3
FAILURES: 1

ATTENTION: Please check the output for detailed information!
Try to solve the problems one at a time and then run this checklist tool again.

Only one failure, which is expected: the local Ceph version is still Nautilus and must be upgraded to Octopus before the PVE 7 upgrade.
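
A hedged sketch of how the failure and warnings above could be addressed before the node upgrades (the storage name and repository file come from the pve6to7 output; the Ceph lines are only an outline of the Nautilus to Octopus guide [3], and the keep-last value is a placeholder):

# avoid rebalancing during the cluster-wide upgrade (the 'noout' warning)
ceph osd set noout

# Nautilus -> Octopus while still on buster, per [3]: switch the ceph repository,
# upgrade, then restart the daemons in order (mon, mgr, osd, mds)
echo "deb http://download.proxmox.com/debian/ceph-octopus buster main" > /etc/apt/sources.list.d/ceph.list
apt update && apt full-upgrade

# replace the deprecated 'maxfiles' setting on the backup storage
pvesm set proxmox-cephfs --prune-backups keep-last=2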

vsellier renamed this task from Migrate proxmox hypervisor nodes to bullseye to Upgrade proxmox hypervisor from version 6 to version 7.Dec 9 2021, 4:54 PM
vsellier updated the task description. (Show Details)
vsellier updated the task description. (Show Details)
ardumont updated the task description. (Show Details)
vsellier claimed this task.

All the hypervisors are migrated and the services restored
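
A hedged sketch of how the restored state could be double-checked (same clustershell group as earlier; standard PVE/Ceph commands):

clush -b -w @hypervisors "pveversion"   # every node now reports pve-manager/7.x
ceph -s                                 # back to HEALTH_OK, so 'noout' can be dropped
ceph osd unset noout
qm list                                 # on each node: the expected guests are 'running' again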

vsellier renamed this task from Upgrade proxmox hypervisor from version 6 to version 7 to Upgrade proxmox hypervisor from version 6 to version 7 + debian11 migration.Dec 10 2021, 5:21 PM