
Use local hypervisor storage in the loader pods
Closed, Migrated

Description

The loader pods should use local hypervisor storage to avoid putting unnecessary load on ceph when data is offloaded to disk during the parsing of big repositories.

On the current "classical" workers, the temporary files are written to a memory fs and the servers are configured to
swap to a partition hosted on local storage.

The same mechanism can't be used on the kubernetes infrastructure, as swap must be disabled on the
nodes themselves and the notion of swap doesn't exist in a pod.

Event Timeline

vsellier triaged this task as High priority. Sep 7 2022, 6:19 PM
vsellier created this task.
vsellier updated the task description.
vsellier updated the task description.
vsellier changed the task status from Open to Work in Progress. Sep 14 2022, 11:01 AM
vsellier claimed this task.
vsellier moved this task from Backlog to in-progress on the System administration board.

In order to test local storage for the nodes declared on uffizi, I configured a new scratch storage on this hypervisor,
following T3707#73522 and https://pve.proxmox.com/wiki/Storage:_LVM_Thin

root@uffizi:~# lvcreate -L200G -n proxmox-scratch vg-louvre
  Logical volume "proxmox-scratch" created.

root@uffizi:~# lvconvert --type thin-pool /dev/vg-louvre/proxmox-scratch
  Thin pool volume with chunk size 128.00 KiB can address at most 31.62 TiB of data.
  WARNING: Converting vg-louvre/proxmox-scratch to thin pool's data volume with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Do you really want to convert vg-louvre/proxmox-scratch? [y/n]: y
  Converted vg-louvre/proxmox-scratch to thin pool.
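
Not part of the session above, but a quick sanity check of the conversion can be done with lvs (reporting fields from the standard lvs options):

root@uffizi:~# lvs -o lv_name,lv_attr,lv_size,data_percent vg-louvre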

Unfortunately, uffizi can't be added to the scratch storage pool together with the other nodes (Datacenter / Storage / Scratch -> Edit)
because its lvm configuration is not the same (no dedicated scratch vg), so I finally created a new scratch storage with only
uffizi in it.
The thin pool was not detected in the interface, so I manually declared the storage in /etc/pve/storage.cfg:

root@uffizi:/etc/pve# diff -U3 ~/storage.cfg storage.cfg
--- /root/storage.cfg	2022-09-14 11:03:55.444277955 +0000
+++ storage.cfg	2022-09-14 11:05:03.000000000 +0000
@@ -20,3 +20,9 @@
 	content images,rootdir
 	nodes hypervisor3,pompidou,branly

+lvmthin: uffizi-scratch
+	thinpool proxmox-scratch
+	vgname vg-louvre
+	content images,rootdir
+	nodes uffizi
+

and everything was fine.
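
For the record, the same storage declaration could probably also be made from the command line with pvesm instead of editing the file directly (options assumed from the Proxmox storage documentation):

root@uffizi:~# pvesm add lvmthin uffizi-scratch --vgname vg-louvre --thinpool proxmox-scratch --content images,rootdir --nodes uffizi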

Uffizi has a 200G lv allocated to the unused local storage, so if we need more space, we can reduce the size of this logical volume.

Rancher seems to create the emptyDir volumes in /var/lib/kubelet. Except for the /var/lib/kubelet/pki directory, everything in this directory is ephemeral, so we could easily use a partition backed by a local storage disk.
It would also remove unnecessary pressure on ceph for the pod-related data.
The /var/lib/docker directory could also be moved to this local partition, as everything in docker can be lost.
I will manually try that on one staging node to check whether it works before changing the terraform / puppet code.
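
As a side note, the emptyDir content of a running pod should be visible directly under the kubelet directory (path assumed from the default kubelet layout; the pod uid has to be filled in):

root@rancher-node-staging-worker2:~# ls /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~empty-dir/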

It works \o/

on rancher-node-staging-worker2,

  • move the second disk to the uffizi local storage pool (uffizi-scratch)
  • create a new zfs dataset on the data pool
root@rancher-node-staging-worker2:/var/lib# mv kubelet kubelete-save
root@rancher-node-staging-worker2:/var/lib# zfs create -o mountpoint=/var/lib/kubelet -o atime=off -o relatime=on -o compression=zstd data/kubelet
  • restart rancher and uncordon the node
  • create a container with an emptyDir volume mounted on /tmp (P1453; a minimal sketch follows this list)
  • generate some write load on the /tmp directory of the container
  • the write activity is visible on uffizi (and not on the network as before)
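
P1453 is not reproduced here, but a minimal sketch of such a test pod and of the write load could look like the following (pod name, image and sizes are arbitrary; only the emptyDir mounted on /tmp matters):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-write-test            # arbitrary name
spec:
  nodeName: rancher-node-staging-worker2
  containers:
  - name: writer
    image: debian:bullseye             # any image with coreutils works
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: tmp
    emptyDir: {}                       # disk-backed, so it lands under /var/lib/kubelet
EOF

# generate some write load in the container's /tmp
kubectl exec emptydir-write-test -- dd if=/dev/zero of=/tmp/load.bin bs=1M count=4096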

The kubelet dataset will need to be created manually on all the rancher nodes (except staging worker2 and worker3, which are already configured) before applying D8482:

  • cluster-argo
  • archive-staging
  • archive-production
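
The preparation itself should be the same sequence as on worker2, roughly (node name to be substituted; the drain flags may vary with the kubectl version):

# reschedule the pods somewhere else
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
# on the node: move the old directory away and mount the new dataset in its place
mv /var/lib/kubelet /var/lib/kubelet-save
zfs create -o mountpoint=/var/lib/kubelet -o atime=off -o relatime=on -o compression=zstd data/kubelet
# restart the rancher agent, then put the node back in rotation
kubectl uncordon <node>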

Example during the loading of https://github.com/torvalds/linux by a pod:

 % /usr/sbin/zfs list data/docker data/kubelet
NAME           USED  AVAIL     REFER  MOUNTPOINT
data/docker   3.81G  40.4G     83.2M  /var/lib/docker
data/kubelet  3.71G  40.4G     3.71G  /var/lib/kubelet

Compression is not as useful as it is for docker:

 % /usr/sbin/zfs get compressratio data/kubelet data/docker 
NAME          PROPERTY       VALUE  SOURCE
data/docker   compressratio  2.95x  -
data/kubelet  compressratio  1.07x  -

vsellier moved this task from in-progress to done on the System administration board.