Upgrade kafka nodes to bullseye
Closed, Migrated

Description

Upgrade the kafka nodes:

  • journal0.internal.staging.swh.network (staging's kafka was migrated to storage1, which can't be upgraded to bullseye as swh-storage and swh-objstorage are deployed on it)
  • kafka[1-4].internal.softwareheritage.org

Upgrade plan:

  • disable puppet
  • upgrade buster packages
  • stop and disable kafka
  • stop and disable zookeeper
  • reboot if needed
  • upgrade to bullseye following the plan described in T3749
  • reboot; if the network does not come up after the reboot:
    • edit the /etc/network/interfaces file and comment out the physical interface declarations:
auto enp94s0f0np0
iface enp94s0f0np0 inet manual
  mtu 9000
# interface enp94s0f1np1 configuration
auto enp94s0f1np1
iface enp94s0f1np1 inet manual
  mtu 9000
    • remove the interface configuration from puppet (the kafka[1-4].internal.softwareheritage.org files) before restarting puppet
  • restart and check zookeeper status
  • restart and check kafka status (wait for the cluster to recover[1])
  • re-enable puppet

[1] http://getty.internal.softwareheritage.org:9000/clusters/rocquencourt/topics
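
For the "wait for the cluster to recover" step, a command-line check can complement the web UI in [1]. The sketch below polls until no under-replicated partition is reported. It is only a sketch: the script path and the broker address are assumptions (not taken from this task), and it assumes a Kafka version whose kafka-topics.sh accepts --bootstrap-server.

```shell
#!/bin/sh
# Assumed values, adjust to the actual deployment.
KAFKA_TOPICS="${KAFKA_TOPICS:-/opt/kafka/bin/kafka-topics.sh}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

# List the partitions whose in-sync replica set is smaller than their replica set.
under_replicated() {
    "$KAFKA_TOPICS" --bootstrap-server "$BOOTSTRAP" \
        --describe --under-replicated-partitions
}

# Poll until the list above is empty.
# $1: maximum number of attempts, $2: delay between attempts in seconds (default 30).
wait_for_recovery() {
    attempts="$1"
    delay="${2:-30}"
    while [ "$attempts" -gt 0 ]; do
        # A failing command (e.g. broker unreachable) must not count as "recovered".
        if out="$(under_replicated)" && [ -z "$out" ]; then
            echo "cluster recovered"
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    echo "still under-replicated" >&2
    return 1
}
```

Usage, for instance after restarting kafka on a node: wait_for_recovery 60 30 (at most 60 checks, 30 seconds apart).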

Event Timeline

vsellier changed the task status from Open to Work in Progress. Dec 2 2021, 10:13 AM
vsellier triaged this task as Normal priority.
vsellier created this task.

Actual actions

As root user on kafka1:

# disable puppet
puppet agent --disable 'T3762'
# upgrade buster packages
apt update
apt upgrade -y
# stop kafka
systemctl stop kafka.service; systemctl disable kafka.service
# stop zookeeper
systemctl stop zookeeper.service; systemctl disable zookeeper.service
# status before reboot
uname -a
Linux kafka1 5.10.0-0.bpo.3-amd64 #1 SMP Debian 5.10.13-1~bpo10+1 (2021-02-11) x86_64 GNU/Linux
uptime
14:56:00 up 258 days, 21:33,  5 users,  load average: 1.05, 7.36, 7.38

reboot

After reboot, let's do the actual upgrade:

sed -i -e 's/buster/bullseye/' /etc/apt/sources.list.d/*
sed -i -e 's,bullseye/updates,bullseye-security,' /etc/apt/sources.list.d/debian-security.list
apt update
apt upgrade -y
apt dist-upgrade -y
uname -a
Linux kafka1 5.10.0-0.bpo.9-amd64 #1 SMP Debian 5.10.70-1~bpo10+1 (2021-10-10) x86_64 GNU/Linux
uptime
 15:10:31 up 12 min,  5 users,  load average: 4.50, 2.28, 1.06
reboot
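
For the remaining nodes, the two sed substitutions above can be previewed on a scratch copy before touching /etc/apt. This is a hedged sketch, not what was run on kafka1: the function names are made up for illustration, and it assumes GNU sed (for -i) and a flat sources.list.d directory.

```shell
#!/bin/sh
# Apply the same two substitutions as above to every file in directory $1.
rewrite_sources() {
    dir="$1"
    sed -i -e 's/buster/bullseye/' "$dir"/*
    if [ -f "$dir/debian-security.list" ]; then
        sed -i -e 's,bullseye/updates,bullseye-security,' "$dir/debian-security.list"
    fi
}

# Rewrite a temporary copy of $1 (default: /etc/apt/sources.list.d)
# and show the resulting diff; the original directory is left untouched.
preview_sources() {
    src="${1:-/etc/apt/sources.list.d}"
    tmp="$(mktemp -d)"
    cp "$src"/* "$tmp"/
    rewrite_sources "$tmp"
    diff -ru "$src" "$tmp"
    rm -rf "$tmp"
}
```

Running preview_sources prints a unified diff of what the in-place sed commands would change.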

The kafka machines are physical hardware, so I connected through the iDRAC
in case something went wrong during the reboot.

The network interfaces did not come up after the reboot.
Their configuration needs some adaptation.
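
The exact edit is not recorded here. Following the plan in the description (comment out the physical interface declarations in /etc/network/interfaces), one possible way to do it non-interactively is sketched below. The function name is hypothetical, and the result goes to stdout so it can be reviewed before replacing the real file.

```shell
#!/bin/sh
# Print file $1 with the "auto" line and the "iface" stanza of interface $2
# commented out, including the indented option lines of that stanza.
comment_iface() {
    file="$1"
    ifname="$2"
    awk -v ifname="$ifname" '
        # "auto <ifname>" line of the target interface
        $1 == "auto" && $2 == ifname { print "# " $0; next }
        # every "iface" line starts a new stanza; remember whether it is the target
        $1 == "iface" { in_stanza = ($2 == ifname) }
        # the target "iface" line itself and its indented options
        in_stanza && ($1 == "iface" || /^[ \t]+[^ \t#]/) { print "# " $0; next }
        # any other non-indented line ends the current stanza
        { if ($0 !~ /^[ \t]/) in_stanza = 0; print }
    ' "$file"
}
```

For example: comment_iface /etc/network/interfaces enp94s0f0np0 > /tmp/interfaces.new, then compare the two files with diff before copying the new one in place.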

After adapting the network interface configuration (see above), the network comes back up at reboot.
Re-enabled zookeeper and kafka, and everything is ok for kafka1.
Inventory updated for that node from buster to bullseye.
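
The commands used to bring the services back are not recorded above. A plausible sketch, mirroring the stop/disable commands from the first step, is given below. The function name and the RUN dry-run variable are made up for illustration.

```shell
#!/bin/sh
# Set RUN=echo to print the commands instead of executing them.
RUN="${RUN:-}"

# Re-enable and start zookeeper first, then kafka, then re-enable puppet.
bring_back_services() {
    for svc in zookeeper.service kafka.service; do
        $RUN systemctl enable --now "$svc" || return 1
        # Stop here if the service did not reach the active state.
        $RUN systemctl is-active "$svc" || return 1
    done
    $RUN puppet agent --enable
}
```

With RUN=echo, calling bring_back_services only lists the five commands, which is handy for a last review on kafka[2-4].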

*next step*: migrate remaining kafka[2-4]

kafka2 migrated as well, services up, inventory updated.

*next step*: migrate remaining kafka[3-4]

kafka3 migrated as well, services up, inventory updated.

*next step*: migrate remaining kafka4

kafka4 migrated as well, services up, inventory updated.

We are done ;)

ardumont claimed this task.