
Add a new hypervisor
Closed, Resolved (Public)

Description

Louvre is no longer able to run all existing VMs.
A third hypervisor has to be added to our infrastructure in order to continue to operate smoothly when one of them is out of service.

Event Timeline

ftigeot changed the task status from Open to Work in Progress. Nov 27 2018, 4:42 PM
ftigeot triaged this task as Normal priority.
ftigeot created this task.

New hypervisor hardware has been racked in our bay at Rocquencourt.
The machine's iDRAC management interface is accessible on the management network, under the name swh7-adm.inria.fr (details on the wiki).

Proxmox is now installed on the machine, hypervisor3.softwareheritage.org.

olasd added a subscriber: olasd. Jan 2 2019, 6:44 PM

I've reinstalled the machine following these steps:

  • Debian installed on the machine with the plain debian installer (no bonding support => no network)
  • network configured with iproute2 (with the bond + vlan stack)
  • install facter from stretch backports, install puppet from stretch
  • /etc/facter/facts.d seeded
  • puppet run
  • install ifenslave, bridge-utils and vlan to get the proper /etc/network/interfaces scripts
  • write proper /etc/network/interfaces with inspiration from louvre
  • reboot and hope the network comes up. rinse, repeat.
  • set up proxmox as advised in https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_Stretch
  • reboot on the proxmox kernel
  • Update /etc/pve/corosync.conf with the new host on one of the existing hypervisors
    • add node to nodelist with a new nodeid
    • add unicast ip address to ring0 in the totem section
    • increment config version number in the totem section
  • restart pve-cluster on existing nodes to take the new corosync config into account
  • follow https://pve.proxmox.com/wiki/Cluster_Manager#pvecm_join_node_to_cluster
    • pvecm add 192.168.100.1
    • type root password for louvre
  • see the host appear in the proxmox cluster
  • reboot one last time for good measure (and to see whether everything starts ok on boot)
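
The "increment config version number" step above is easy to get wrong when editing by hand, so here is a minimal sketch of it in Python. It is an illustration only, not the procedure that was actually used: the function name bump_config_version and the sample totem section are hypothetical, and the sketch assumes the version is stored on a single "config_version: N" line, as in the totem section mentioned in the steps.

```python
import re


def bump_config_version(conf_text):
    """Return conf_text with the first config_version value incremented by one.

    Hypothetical helper: it only rewrites the text it is given and does not
    touch /etc/pve/corosync.conf itself.
    """
    # Match "config_version: N" at the start of a line, keeping the prefix intact.
    pattern = re.compile(r"^(\s*config_version\s*:\s*)(\d+)", re.MULTILINE)

    def _increment(match):
        return "{}{}".format(match.group(1), int(match.group(2)) + 1)

    new_text, count = pattern.subn(_increment, conf_text, count=1)
    if count == 0:
        raise ValueError("no config_version line found")
    return new_text


# Hypothetical totem section, for illustration only (not the real cluster config).
SAMPLE = """\
totem {
  cluster_name: example
  config_version: 7
  version: 2
}
"""

if __name__ == "__main__":
    print(bump_config_version(SAMPLE))
```

The helper leaves the plain "version" line untouched and raises an error when no config_version line is present, so a malformed file is not silently written back.
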

ftigeot changed the status of subtask T1467: Slow network transfers from beaubourg from Open to Work in Progress. Jan 14 2019, 1:23 PM
ftigeot closed this task as Resolved. Apr 15 2019, 4:51 PM

The new hypervisor has been working without any particular issue since its installation.