De-baremetalify louvre
Closed, Resolved (Public)

Description

louvre currently performs the following tasks:

  1. hypervisor in the proxmox cluster, hosting a single container, uffizi
  2. main network interconnect between the internal VLAN440 and Azure (as mentioned in https://forge.softwareheritage.org/T1526#28267)
  3. main openvpn server for admin access to the infrastructure
  4. backup centralization host (with nfs access to space on SESI's filer)
  5. main administration machine (with root SSH key and clustershell configuration)
  • The first task is just a remnant of this machine's historical function as our main hypervisor. There's no need to migrate it.
  • Tasks 2-3 are critical to the proper operation of our infrastructure, but could be delegated to a VM (we don't really have a bare metal host to put them on anyway).
  • Task 4 only centralizes a bunch of crontabs, which scp files from all the hosts to an NFS mount.
  • Task 5 is just a "nice to have" and can easily be moved to another machine, e.g. pergamon which is already a sensitive host on the infra by means of being the puppet master.

Only tasks 2-3, and to some extent 4, are critical. Tasks 2-3 are tied to the network configuration of the host (all three IP addresses), which makes them somewhat tricky to move. Task 4 is currently bound to the external IP address of louvre, but that could be changed by filing a ticket with SESI to request access for another machine.

Event Timeline

olasd triaged this task as High priority. Jul 8 2019, 2:44 PM
olasd created this task.
olasd changed the task status from Open to Work in Progress. Jul 8 2019, 5:45 PM

I've done the first step of this, which is separating louvre from the rest of the proxmox cluster, following the instructions on https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node.

olasd added a comment. Jul 8 2019, 5:58 PM

I've moved the clustershell setup to pergamon, by installing the clustershell command then copying the following files over:

/root/.ssh/id_rsa{,.pub}
/root/.ssh/config (for special ports)
/etc/clustershell/groups
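
As a rough sketch, the copy could look like the following. The SSH target `root@pergamon`, the `DRY_RUN` switch and the function name are assumptions made for illustration; by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: copy the clustershell setup from louvre to pergamon.
# DEST is an assumed SSH target; DRY_RUN=1 (the default) only prints commands.
DEST="${DEST:-root@pergamon}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

copy_clustershell_setup() {
    # SSH key pair and client config (the config carries the special ports)
    run scp -p /root/.ssh/id_rsa /root/.ssh/id_rsa.pub /root/.ssh/config "$DEST:/root/.ssh/"
    # clustershell group definitions
    run scp -p /etc/clustershell/groups "$DEST:/etc/clustershell/"
}

copy_clustershell_setup
```

If the destination already has its own /root/.ssh/config, merging the entries by hand is probably safer than overwriting the file.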
olasd added a comment. Jul 9 2019, 6:29 PM

I've set up a new virtual machine (aptly called "louvre") and given it the following setup:

  • bare buster install, two network interfaces (private one with a temp ip address, public one unplugged)
  • openvpn installed and config copied over from the original louvre (/etc/openvpn)
    • regenerated larger dh parameters so it would start
  • strongswan installed and config copied over from louvre (/etc/ipsec.secrets and /etc/ipsec.conf)
  • network config copied over from louvre and trimmed down

Looks like both services start up properly (and hiccup because the public ip address isn't really working yet).

To actually switch the network-related stuff over:

  1. give current-louvre a new internal ip address
  2. stop openvpn and strongswan on current-louvre
  3. remove current-louvre's 100.1 and public ip addresses
  4. clean up current-louvre's firewall config
  5. add a new default route to 100.1 on current-louvre
  6. reboot new-louvre with its new network setup

This should only cause a minute or two of downtime.
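
The steps to run on current-louvre could be sketched roughly as below. Everything concrete here is an assumption: the interface names, the /24 prefix on "100.1", the systemd unit names, and the public address (a placeholder from the documentation range). The new internal address is the one reported later in this task. The script is a dry run by default and only prints the commands.

```shell
#!/bin/sh
# Sketch of the switch-over on current-louvre. Dry run by default:
# commands are printed, not executed. All values below are assumptions.
DRY_RUN="${DRY_RUN:-1}"
INT_IF="${INT_IF:-eth0}"                          # hypothetical internal interface
PUB_IF="${PUB_IF:-eth1}"                          # hypothetical public interface
NEW_INT_ADDR="${NEW_INT_ADDR:-192.168.100.3/24}"  # new internal address
OLD_INT_ADDR="${OLD_INT_ADDR:-192.168.100.1/24}"  # "100.1", prefix length assumed
PUB_ADDR="${PUB_ADDR:-203.0.113.10/24}"           # placeholder, not the real address

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

switch_over() {
    run ip addr add "$NEW_INT_ADDR" dev "$INT_IF"          # step 1
    run systemctl stop openvpn strongswan                  # step 2 (unit names assumed)
    run ip addr del "$OLD_INT_ADDR" dev "$INT_IF"          # step 3
    run ip addr del "$PUB_ADDR" dev "$PUB_IF"
    # step 4 (firewall cleanup) is site-specific and left out
    run ip route replace default via "${OLD_INT_ADDR%/*}"  # step 5
    # step 6: reboot new-louvre once its network config is in place
}

switch_over
```

Changing addresses over an SSH session bound to one of them can cut the connection, so a console is the safer place to run something like this for real.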

olasd added a comment. Jul 9 2019, 7:03 PM

This has now happened. As it turns out, you still need to turn on the net.ipv4.ip_forward sysctl.
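
A minimal sketch of turning the sysctl on, both for the running kernel and across reboots. The drop-in file name is an assumption, and the script only prints what it would do unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: enable IPv4 forwarding now and persist it across reboots.
# The drop-in file name is an assumption; DRY_RUN=1 (the default) only prints.
DRY_RUN="${DRY_RUN:-1}"
SYSCTL_FILE="${SYSCTL_FILE:-/etc/sysctl.d/99-ip-forward.conf}"

enable_ip_forward() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ sysctl -w net.ipv4.ip_forward=1"
        echo "+ echo 'net.ipv4.ip_forward = 1' > $SYSCTL_FILE"
    else
        # takes effect immediately
        sysctl -w net.ipv4.ip_forward=1
        # picked up again at boot
        echo 'net.ipv4.ip_forward = 1' > "$SYSCTL_FILE"
    fi
}

enable_ip_forward
# Check the running value with: sysctl net.ipv4.ip_forward
```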

Original louvre now has ip address 192.168.100.3/24.

The last bit still on "louvre-bare-metal" is the backup centralization. I'm not quite sure where to move that yet (probably banco?).

For backup centralization, it can stay there for now I guess.

olasd added a comment. Jul 10 2019, 2:12 PM

For backup centralization, it can stay there for now I guess.

Well the thing is the bare metal machine's OS will be replaced by uffizi's os when T1894 happens, so we do need to move it somewhere else.

In T1895#35161, @olasd wrote:

For backup centralization, it can stay there for now I guess.

Well the thing is the bare metal machine's OS will be replaced by uffizi's os when T1894 happens, so we do need to move it somewhere else.

I was thinking of "moving" it like:

# cp -r /etc/backupstuff /path/to/chroot/etc
olasd added a comment. Jul 10 2019, 5:34 PM

So, moving it to uffizi then; got it :)

olasd closed this task as Resolved. Jul 10 2019, 6:41 PM

I've moved the backup crons to uffizi for now.

Louvre's / is mounted on uffizi:/mnt/louvre if people want to pull their data (for instance their shell history)

Louvre's / is mounted on uffizi:/mnt/louvre if people want to pull their data (for instance their shell history)

nice thinking thanks (i've rsync-ed it prior to seeing this ;)