
POC elastic worker infrastructure
Status: Work in Progress · Priority: Normal · Visibility: Public

Description

To validate the feasibility of an elastic worker infrastructure and identify its
possible caveats, we will implement a POC managing the workers for the GitLab
repositories:

  • First we need to refresh and land the kubernetes branch on the swh-environment to have a working example
  • Refresh the rancher VM on uffizi to test the solution in a pseudo real environment (created from scratch, cf. terraform/staging)
  • Create workers and register them in the rancher cluster
  • POC image building / deployment process (manual push on docker hub)
  • POC worker autoscaling according to messages in queues
  • POC worker autoscaling according to available resources on the cluster
  • Cluster / elastic worker monitoring (number of running workers, statsd, ...)
  • Plug standard log ingestion with elastic workers

[1] Draft note on https://hedgedoc.softwareheritage.org/4ZHT03kRT7mHYOEm1MSueQ

Event Timeline

vsellier triaged this task as Normal priority. Sep 21 2021, 8:29 AM
vsellier created this task.
ardumont changed the task status from Open to Work in Progress. Sep 22 2021, 2:57 PM
ardumont moved this task from Backlog to Weekly backlog on the System administration board.
ardumont moved this task from Weekly backlog to in-progress on the System administration board.

Installation steps (see [1], [2] and [3] for the relevant documentation):

  • bootstrap the poc-rancher vm
  • install docker.io (from debian) on it (technical requirements to run rancher [1])
  • install rancher with docker (it's a poc) [2]
root@poc-rancher:~# docker run -d --restart=unless-stopped -p 80:80 -p 443:443 --privileged --name rancher rancher/rancher
root@poc-rancher:~# docker ps --filter name=rancher
...

This refuses to start.

  • trying to install k3s first
root@poc-rancher:~# curl -sfL https://get.k3s.io | sh -
root@poc-rancher:~# systemctl status k3s | grep Active
     Active: active (running) since Wed 2021-09-22 14:03:13 UTC; 30s ago

root@poc-rancher:~# curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
root@poc-rancher:~# chmod 700 get_helm.sh
root@poc-rancher:~# less get_helm.sh
root@poc-rancher:~# ./get_helm.sh
Downloading https://get.helm.sh/helm-v3.7.0-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm into /usr/local/bin
root@poc-rancher:~# which helm
/usr/local/bin/helm
root@poc-rancher:~# helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
"rancher-stable" has been added to your repositories
root@poc-rancher:~# helm repo list
NAME            URL
rancher-stable  https://releases.rancher.com/server-charts/stable
root@poc-rancher:~# kubectl create namespace cattle-system
namespace/cattle-system created
root@poc-rancher:~# helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "jetstack" chart repository
...Successfully got an update from the "rancher-stable" chart repository
Update Complete. ⎈Happy Helming!⎈
# Need some setup for the kube user (root here)
root@poc-rancher:~/.kube# cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
root@poc-rancher:~# kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.5.3/cert-manager.crds.yaml
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io configured
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io configured
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io configured
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io configured
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io configured
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io configured
root@poc-rancher:~# helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.5.3
NAME: cert-manager
LAST DEPLOYED: Wed Sep 22 14:19:59 2021
NAMESPACE: cert-manager
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
cert-manager v1.5.3 has been deployed successfully!

In order to begin issuing certificates, you will need to set up a ClusterIssuer
or Issuer resource (for example, by creating a 'letsencrypt-staging' issuer).

More information on the different types of issuers and how to configure them
can be found in our documentation:

https://cert-manager.io/docs/configuration/

For information on how to configure cert-manager to automatically provision
Certificates for Ingress resources, take a look at the `ingress-shim`
documentation:

https://cert-manager.io/docs/usage/ingress/
root@poc-rancher:~# kubectl get pods --namespace cert-manager
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-cainjector-856d4df858-5xqcl   1/1     Running   0          6m57s
cert-manager-66b6d6bf59-59wqv              1/1     Running   0          6m57s
cert-manager-webhook-5fd7d458f7-kmqhq      1/1     Running   0          6m57s
root@poc-rancher:~# helm install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname=poc-rancher.internal.staging.swh.network \
  --set bootstrapPassword=<redacted>
W0922 14:31:40.931662 48675 warnings.go:70] cert-manager.io/v1beta1 Issuer is deprecated
in v1.4+, unavailable in v1.6+; use cert-manager.io/v1 Issuer
NAME: rancher
LAST DEPLOYED: Wed Sep 22 14:31:40 2021
NAMESPACE: cattle-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Rancher Server has been installed.

NOTE: Rancher may take several minutes to fully initialize. Please standby while
Certificates are being issued and Ingress comes up.

Check out our docs at https://rancher.com/docs/rancher/v2.x/en/

Browse to https://poc-rancher.internal.staging.swh.network

Happy Containering!
root@poc-rancher:~# kubectl -n cattle-system rollout status deploy/rancher
Waiting for deployment "rancher" rollout to finish: 0 of 3 updated replicas are available...
Waiting for deployment "rancher" rollout to finish: 1 of 3 updated replicas are available...
Waiting for deployment "rancher" rollout to finish: 2 of 3 updated replicas are available...
Waiting for deployment spec update to be observed...
Waiting for deployment "rancher" rollout to finish: 2 of 3 updated replicas are available...
deployment "rancher" successfully rolled out
root@poc-rancher-sw{0,1}:~# docker run -d --privileged --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run  rancher/rancher-agent:v2.5.9 --server https://poc-rancher.internal.staging.swh.network --token <redacted> --ca-checksum <redacted> --etcd --controlplane --worker

[1] https://rancher.com/docs/rancher/v2.6/en/

[2] https://rancher.com/docs/rancher/v2.6/en/quick-start-guide/deployment/quickstart-manual-setup/

[3] https://docs.docker.com/engine/install/debian/

After a hard time, we solved several issues:

  • The Rancher initialization problem was caused by a k3s version that did not match Rancher's compatibility matrix.

We had installed Rancher 2.5.9 on a recent k3s shipping Kubernetes 1.22.2. Per the Rancher compatibility matrix [1], switching to an older version of k3s solved the problem, and the clusters start correctly after that.

After solving the Rancher issue, we faced another one with inter-node communication:
two nodes in the cluster were unable to talk to each other. This is not really a problem for the workers themselves, as they don't need to communicate with other nodes in the cluster, but it often breaks DNS resolution in the pods, because the DNS resolvers are deployed with a daemonset and spread across several nodes [2].

A standalone k3s cluster has the same problem, so it's not a Rancher issue.

With two Ubuntu VMs everything works well, so it's a compatibility issue with Debian.

Output of the k3s self-check:

root@poc-rancher-sw0:~# k3s check-config

Verifying binaries in /var/lib/rancher/k3s/data/bc7e12eeb86c005efd104d57a8733f0e9fbf2ede3571bad3da2cd0326b978aed/bin:
- sha256sum: good
- links: good

System:
- /usr/sbin iptables v1.8.2 (nf_tables): should be older than v1.8.0, newer than v1.8.3, or in legacy mode (fail)
- swap: should be disabled
- routes: ok

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

modprobe: FATAL: Module configs not found in directory /lib/modules/5.10.0-0.bpo.8-amd64
info: reading kernel config from /boot/config-5.10.0-0.bpo.8-amd64 ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- /usr/sbin/apparmor_parser
apparmor: enabled and tools installed
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_POSIX_MQUEUE: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: missing
- CONFIG_IP_NF_TARGET_REDIRECT: enabled (as module)
- CONFIG_IP_SET: enabled (as module)
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT4_FS: enabled (as module)
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled (as module)
      - CONFIG_CRYPTO_GCM: enabled (as module)
      - CONFIG_CRYPTO_SEQIV: enabled (as module)
      - CONFIG_CRYPTO_GHASH: enabled (as module)
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled (as module)
      - CONFIG_INET_ESP: enabled (as module)
      - CONFIG_INET_XFRM_MODE_TRANSPORT: missing
- Storage Drivers:
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)

STATUS: 1 (fail)

It indicates an issue with the iptables version:

- /usr/sbin iptables v1.8.2 (nf_tables): should be older than v1.8.0, newer than v1.8.3, or in legacy mode (fail)

Switching to the legacy version made it happy:

update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
root@poc-rancher-sw0:~# k3s check-config
...
System:
- /usr/sbin iptables v1.8.2 (legacy): ok
...
STATUS: pass

Unfortunately, the nodes were still not able to communicate with each other.

A lot of network issues seem to be related to vxlan support with flannel on Debian [3].
Changing the flannel backend to host-gw (on the k3s server only, not the agents), as suggested there, finally solved the problem and the nodes are able to communicate:

root@poc-rancher-sw0:/etc/systemd/system# diff -U3 /tmp/k3s.service k3s.service
--- /tmp/k3s.service	2021-09-29 08:39:37.628970457 +0000
+++ k3s.service	2021-09-29 08:39:46.249302142 +0000
@@ -28,4 +28,5 @@
 ExecStartPre=-/sbin/modprobe overlay
 ExecStart=/usr/local/bin/k3s \
     server \
+    --flannel-backend host-gw \
root@poc-rancher-sw0:~# ./test-network.sh 
=> Start network overlay test
poc-rancher-sw1 can reach poc-rancher-sw1
poc-rancher-sw1 can reach poc-rancher-sw0
poc-rancher-sw0 can reach poc-rancher-sw1
poc-rancher-sw0 can reach poc-rancher-sw0
=> End network overlay test

\o/
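As an aside, a cleaner way to persist the flag than editing k3s.service in place would be a systemd drop-in. A sketch, assuming the default k3s binary path from the install above:

```shell
# Write the drop-in locally, then install it on the k3s server node.
# ExecStart= must be cleared first, then redefined (systemd drop-in semantics).
mkdir -p k3s.service.d
cat > k3s.service.d/10-flannel-host-gw.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/local/bin/k3s server --flannel-backend host-gw
EOF
# Then, on poc-rancher-sw0 (as root):
#   cp -r k3s.service.d /etc/systemd/system/
#   systemctl daemon-reload && systemctl restart k3s
```

This survives k3s upgrades that rewrite the main unit file, which direct edits to k3s.service do not.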

[1] https://rancher.com/support-maintenance-terms/all-supported-versions/rancher-v2.5.9/
[2] the communication test was done as explained on https://rancher.com/docs/rancher/v2.5/en/troubleshooting/networking/
[3] https://github.com/k3s-io/k3s/issues/3624 / https://github.com/k3s-io/k3s/issues/3863 / ...

2021-11-15 update: Rancher 2.6.2 should include a fix for the network issue on Debian, but our tests are still not conclusive.

Intermediate status:

  • We have successfully run loaders in staging using the helm chart we wrote [1] and a hardcoded number of workers. This also makes rolling upgrades possible, for example.
  • We have tried the integrated horizontal pod autoscaler [2]. It works pretty well, but it is not suited to our worker scenario: it decides whether to scale the number of running pods up or down based on a pod metric (CPU consumption in our test [3], but it can be other things). This is very useful for managing classical load, e.g. a gunicorn container, but not for long-running tasks.
  • Kubernetes also has some functionality to reduce the pressure on a node when certain limits are reached, but it looks more like an emergency action than proper scaling management. It is configured at the kubelet level and is not dynamic at all [4]. We tested it quickly, but lost the node to an OOM before node eviction started.
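For reference, the kind of CPU-based autoscaling we tried can be sketched as below. The resource names are hypothetical; our actual template is the autoscale.yaml in [3].

```shell
# Generate a HorizontalPodAutoscaler manifest (autoscaling/v2beta2, matching
# the Kubernetes 1.2x era used here), then apply it with kubectl.
cat > loader-hpa.yaml <<'EOF'
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: loaders
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: loaders          # hypothetical deployment name
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75   # scale up when mean pod CPU exceeds 75%
EOF
# kubectl apply -f loader-hpa.yaml
```

This illustrates the mismatch: a long-running loader pod pegs the CPU for its whole lifetime, so utilization never drops to signal a scale-down.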

There is a lot more we could also test, for example (non-exhaustive):

  • writing an operator that monitors the overall cluster load and adapts the parallelism; the hard part is finding a way to identify which instances to stop
  • third-party tools like keda [5]

The log and monitoring part has not been dug into yet.

[1] https://forge.softwareheritage.org/source/snippets/browse/master/sysadmin/T3592-elastic-workers/worker/
[2] https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-multiple-metrics-and-custom-metrics
[3] https://forge.softwareheritage.org/source/snippets/browse/master/sysadmin/T3592-elastic-workers/worker/templates/autoscale.yaml
[4] https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/
[5] https://keda.sh/docs/2.4/concepts/scaling-jobs/#overview

keda looks promising. P1193 is an example of a configuration working for the docker environment. It is able to scale to 0 when no messages are present on the queue.
When messages are present, the loaders are launched progressively until the CPU/memory limit of the host or the maximum number of allowed workers is reached.
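To illustrate (the actual configuration is in P1193, not reproduced here; the deployment and queue names below are hypothetical), a keda ScaledObject for a RabbitMQ-backed queue looks roughly like:

```shell
# Generate a keda ScaledObject manifest; with minReplicaCount: 0, keda removes
# all loader pods when the queue is empty and scales up as messages arrive.
cat > loader-scaledobject.yaml <<'EOF'
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: loaders
spec:
  scaleTargetRef:
    name: loaders                    # hypothetical deployment name
  minReplicaCount: 0                 # scale to zero on an empty queue
  maxReplicaCount: 10
  triggers:
    - type: rabbitmq
      metadata:
        protocol: amqp
        queueName: swh.loader.tasks  # hypothetical queue name
        mode: QueueLength
        value: "10"                  # target queue length per replica
        hostFromEnv: RABBITMQ_HOST   # amqp:// connection string from env
EOF
# kubectl apply -f loader-scaledobject.yaml
```

Unlike the CPU-based HPA, the scaling signal here is the queue length itself, which fits the long-running-task scenario better.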

Just a quick remark about the scheduling of the (sub)tasks of this task: IMHO the autoscaling should come last; all the supervision/monitoring/logging tasks are much more important than the autoscaling.