
POC elastic worker infrastructure
Status: Work in Progress · Priority: Normal · Visibility: Public

Description

To validate the feasibility of an elastic worker infrastructure and identify its
possible caveats, we will implement a POC managing the workers for the GitLab
repositories:

  • First we need to refresh and land the kubernetes branch on the swh-environment to have a working example
  • Refresh the rancher VM on uffizi to test the solution in a pseudo real environment (created from scratch, cf. terraform/staging)
  • Create workers and register them in the rancher cluster
  • POC image building / deployment process (manual push on docker hub)
  • POC worker autoscaling according to messages in queues
  • POC worker autoscaling according to available resources on the cluster
  • Cluster / elastic worker monitoring (number of running workers, statsd, ...)
  • Plug standard log ingestion with elastic workers

[1] Draft note on https://hedgedoc.softwareheritage.org/4ZHT03kRT7mHYOEm1MSueQ

Event Timeline

vsellier triaged this task as Normal priority. Sep 21 2021, 8:29 AM
vsellier created this task.
ardumont changed the task status from Open to Work in Progress. Sep 22 2021, 2:57 PM
ardumont moved this task from Backlog to Weekly backlog on the System administration board.
ardumont moved this task from Weekly backlog to in-progress on the System administration board.

Installation steps (see [1], [2] and [3] for the relevant documentation):

  • bootstrap the poc-rancher vm
  • install docker.io (from debian) on it (technical requirements to run rancher [1])
  • install rancher with docker (it's a poc) [2]
root@poc-rancher:~# docker run -d --restart=unless-stopped -p 80:80 -p 443:443 --privileged --name rancher rancher/rancher
root@poc-rancher:~# docker ps --filter name=rancher
...

This refuses to start.

  • trying to install k3s first
root@poc-rancher:~# curl -sfL https://get.k3s.io | sh -
root@poc-rancher:~# systemctl status k3s | grep Active
     Active: active (running) since Wed 2021-09-22 14:03:13 UTC; 30s ago

root@poc-rancher:~# curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
root@poc-rancher:~# chmod 700 get_helm.sh
root@poc-rancher:~# less get_helm.sh
root@poc-rancher:~# ./get_helm.sh
Downloading https://get.helm.sh/helm-v3.7.0-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm into /usr/local/bin
root@poc-rancher:~# which helm
/usr/local/bin/helm
root@poc-rancher:~# helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
"rancher-stable" has been added to your repositories
root@poc-rancher:~# helm repo list
NAME            URL
rancher-stable  https://releases.rancher.com/server-charts/stable
root@poc-rancher:~# kubectl create namespace cattle-system
namespace/cattle-system created
root@poc-rancher:~# helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "jetstack" chart repository
...Successfully got an update from the "rancher-stable" chart repository
Update Complete. ⎈Happy Helming!⎈
# Need some setup for the kube user (root here)
root@poc-rancher:~/.kube# cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
root@poc-rancher:~# kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.5.3/cert-manager.crds.yaml
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io configured
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io configured
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io configured
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io configured
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io configured
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io configured
root@poc-rancher:~# helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.5.3
NAME: cert-manager
LAST DEPLOYED: Wed Sep 22 14:19:59 2021
NAMESPACE: cert-manager
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
cert-manager v1.5.3 has been deployed successfully!

In order to begin issuing certificates, you will need to set up a ClusterIssuer
or Issuer resource (for example, by creating a 'letsencrypt-staging' issuer).

More information on the different types of issuers and how to configure them
can be found in our documentation:

https://cert-manager.io/docs/configuration/

For information on how to configure cert-manager to automatically provision
Certificates for Ingress resources, take a look at the `ingress-shim`
documentation:

https://cert-manager.io/docs/usage/ingress/
root@poc-rancher:~# kubectl get pods --namespace cert-manager
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-cainjector-856d4df858-5xqcl   1/1     Running   0          6m57s
cert-manager-66b6d6bf59-59wqv              1/1     Running   0          6m57s
cert-manager-webhook-5fd7d458f7-kmqhq      1/1     Running   0          6m57s
root@poc-rancher:~# helm install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname=poc-rancher.internal.staging.swh.network \
  --set bootstrapPassword=<redacted>
W0922 14:31:40.931662 48675 warnings.go:70] cert-manager.io/v1beta1 Issuer is deprecated
in v1.4+, unavailable in v1.6+; use cert-manager.io/v1 Issuer
NAME: rancher
LAST DEPLOYED: Wed Sep 22 14:31:40 2021
NAMESPACE: cattle-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Rancher Server has been installed.

NOTE: Rancher may take several minutes to fully initialize. Please standby while
Certificates are being issued and Ingress comes up.

Check out our docs at https://rancher.com/docs/rancher/v2.x/en/

Browse to https://poc-rancher.internal.staging.swh.network

Happy Containering!
root@poc-rancher:~# kubectl -n cattle-system rollout status deploy/rancher
Waiting for deployment "rancher" rollout to finish: 0 of 3 updated replicas are available...
Waiting for deployment "rancher" rollout to finish: 1 of 3 updated replicas are available...
Waiting for deployment "rancher" rollout to finish: 2 of 3 updated replicas are available...
Waiting for deployment spec update to be observed...
Waiting for deployment "rancher" rollout to finish: 2 of 3 updated replicas are available...
deployment "rancher" successfully rolled out
root@poc-rancher-sw{0,1}:~# docker run -d --privileged --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run  rancher/rancher-agent:v2.5.9 --server https://poc-rancher.internal.staging.swh.network --token <redacted> --ca-checksum <redacted> --etcd --controlplane --worker

[1] https://rancher.com/docs/rancher/v2.6/en/

[2] https://rancher.com/docs/rancher/v2.6/en/quick-start-guide/deployment/quickstart-manual-setup/

[3] https://docs.docker.com/engine/install/debian/

After a hard time, we solved several issues:

  • The Rancher initialization problem was caused by a k3s version that did not match Rancher's compatibility matrix.

We had installed Rancher 2.5.9 on a recent k3s shipping Kubernetes 1.22.2. Per the Rancher compatibility matrix [1], switching to an older version of k3s solved the problem, and the clusters start correctly after that.

After solving the Rancher issue, we faced another one with inter-node communication:
two nodes in the cluster were unable to talk to each other. This is not really a problem for the workers themselves, as they don't need to communicate with other nodes in the cluster, but it often breaks DNS resolution in the pods, because the DNS resolvers are deployed with a daemonset and spread across several nodes [2].

A standalone k3s cluster has the same problem, so it's not a Rancher issue.

With two Ubuntu VMs everything works well, so it's a compatibility issue with Debian.

Output of the k3s self-check:

root@poc-rancher-sw0:~# k3s check-config

Verifying binaries in /var/lib/rancher/k3s/data/bc7e12eeb86c005efd104d57a8733f0e9fbf2ede3571bad3da2cd0326b978aed/bin:
- sha256sum: good
- links: good

System:
- /usr/sbin iptables v1.8.2 (nf_tables): should be older than v1.8.0, newer than v1.8.3, or in legacy mode (fail)
- swap: should be disabled
- routes: ok

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

modprobe: FATAL: Module configs not found in directory /lib/modules/5.10.0-0.bpo.8-amd64
info: reading kernel config from /boot/config-5.10.0-0.bpo.8-amd64 ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- /usr/sbin/apparmor_parser
apparmor: enabled and tools installed
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_POSIX_MQUEUE: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: missing
- CONFIG_IP_NF_TARGET_REDIRECT: enabled (as module)
- CONFIG_IP_SET: enabled (as module)
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT4_FS: enabled (as module)
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled (as module)
      - CONFIG_CRYPTO_GCM: enabled (as module)
      - CONFIG_CRYPTO_SEQIV: enabled (as module)
      - CONFIG_CRYPTO_GHASH: enabled (as module)
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled (as module)
      - CONFIG_INET_ESP: enabled (as module)
      - CONFIG_INET_XFRM_MODE_TRANSPORT: missing
- Storage Drivers:
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)

STATUS: 1 (fail)

It indicates an issue with the iptables version:

- /usr/sbin iptables v1.8.2 (nf_tables): should be older than v1.8.0, newer than v1.8.3, or in legacy mode (fail)

Switching to the legacy version made it happy:

update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
root@poc-rancher-sw0:~# k3s check-config
...
System:
- /usr/sbin iptables v1.8.2 (legacy): ok
...
STATUS: pass

Unfortunately, the nodes were still not able to communicate with each other.

A lot of network issues seem to be related to vxlan support with flannel on Debian [3].
Changing the flannel backend to host-gw (on the k3s server only, not the agents), as suggested there, finally solved the problem and the nodes are able to communicate:

root@poc-rancher-sw0:/etc/systemd/system# diff -U3 /tmp/k3s.service k3s.service
--- /tmp/k3s.service	2021-09-29 08:39:37.628970457 +0000
+++ k3s.service	2021-09-29 08:39:46.249302142 +0000
@@ -28,4 +28,5 @@
 ExecStartPre=-/sbin/modprobe overlay
 ExecStart=/usr/local/bin/k3s \
     server \
+    --flannel-backend host-gw \
root@poc-rancher-sw0:~# ./test-network.sh 
=> Start network overlay test
poc-rancher-sw1 can reach poc-rancher-sw1
poc-rancher-sw1 can reach poc-rancher-sw0
poc-rancher-sw0 can reach poc-rancher-sw1
poc-rancher-sw0 can reach poc-rancher-sw0
=> End network overlay test

\o/
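As an aside, a cleaner way to persist the flag than editing k3s.service in place would be a systemd drop-in. A sketch, assuming the default k3s binary path from the install above:

```shell
# Write the drop-in locally, then install it on the k3s server node.
# ExecStart= must be cleared first, then redefined (systemd drop-in semantics).
mkdir -p k3s.service.d
cat > k3s.service.d/10-flannel-host-gw.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/local/bin/k3s server --flannel-backend host-gw
EOF
# Then, on poc-rancher-sw0 (as root):
#   cp -r k3s.service.d /etc/systemd/system/
#   systemctl daemon-reload && systemctl restart k3s
```

This survives k3s upgrades that rewrite the main unit file, which direct edits to k3s.service do not.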

[1] https://rancher.com/support-maintenance-terms/all-supported-versions/rancher-v2.5.9/
[2] the communication test was done as explained on https://rancher.com/docs/rancher/v2.5/en/troubleshooting/networking/
[3] https://github.com/k3s-io/k3s/issues/3624 / https://github.com/k3s-io/k3s/issues/3863 / ...

2021-11-15 update: Rancher 2.6.2 should include a fix for the network issue on Debian, but our tests are still not conclusive.

Intermediate status:

  • We have successfully run loaders in staging using the helm chart we wrote [1] and a hardcoded number of workers. This also makes rolling upgrades possible, for example.
  • We have tried the integrated horizontal pod autoscaler [2]. It works pretty well, but it is not suited to our worker scenario: it decides whether to scale the number of running pods up or down based on a pod metric (CPU consumption in our test [3], but it can be other things). This is very useful for managing classical load, e.g. a gunicorn container, but not for long-running tasks.
  • Kubernetes also has some functionality to reduce the pressure on a node when certain limits are reached, but it looks more like an emergency action than proper scaling management. It is configured at the kubelet level and is not dynamic at all [4]. We tested it quickly, but lost the node to an OOM before node eviction started.
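For reference, the kind of CPU-based autoscaling we tried can be sketched as below. The resource names are hypothetical; our actual template is the autoscale.yaml in [3].

```shell
# Generate a HorizontalPodAutoscaler manifest (autoscaling/v2beta2, matching
# the Kubernetes 1.2x era used here), then apply it with kubectl.
cat > loader-hpa.yaml <<'EOF'
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: loaders
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: loaders          # hypothetical deployment name
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75   # scale up when mean pod CPU exceeds 75%
EOF
# kubectl apply -f loader-hpa.yaml
```

This illustrates the mismatch: a long-running loader pod pegs the CPU for its whole lifetime, so utilization never drops to signal a scale-down.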

There is a lot more we could also test, for example (non-exhaustive):

  • writing an operator that monitors the overall cluster load and adapts the parallelism; the hard part is finding a way to identify which instances to stop
  • third-party tools like keda [5]

The log and monitoring part has not been dug into yet.

[1] https://forge.softwareheritage.org/source/snippets/browse/master/sysadmin/T3592-elastic-workers/worker/
[2] https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-multiple-metrics-and-custom-metrics
[3] https://forge.softwareheritage.org/source/snippets/browse/master/sysadmin/T3592-elastic-workers/worker/templates/autoscale.yaml
[4] https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/
[5] https://keda.sh/docs/2.4/concepts/scaling-jobs/#overview

keda looks promising. P1193 is an example of a configuration working for the docker environment. It is able to scale to 0 when no messages are present on the queue.
When messages are present, the loaders are launched progressively until the CPU/memory limit of the host or the maximum number of allowed workers is reached.
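To illustrate (the actual configuration is in P1193, not reproduced here; the deployment and queue names below are hypothetical), a keda ScaledObject for a RabbitMQ-backed queue looks roughly like:

```shell
# Generate a keda ScaledObject manifest; with minReplicaCount: 0, keda removes
# all loader pods when the queue is empty and scales up as messages arrive.
cat > loader-scaledobject.yaml <<'EOF'
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: loaders
spec:
  scaleTargetRef:
    name: loaders                    # hypothetical deployment name
  minReplicaCount: 0                 # scale to zero on an empty queue
  maxReplicaCount: 10
  triggers:
    - type: rabbitmq
      metadata:
        protocol: amqp
        queueName: swh.loader.tasks  # hypothetical queue name
        mode: QueueLength
        value: "10"                  # target queue length per replica
        hostFromEnv: RABBITMQ_HOST   # amqp:// connection string from env
EOF
# kubectl apply -f loader-scaledobject.yaml
```

Unlike the CPU-based HPA, the scaling signal here is the queue length itself, which fits the long-running-task scenario better.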

Just a quick remark about the scheduling of the (sub)tasks of this task: IMHO the autoscaling should come last; all the supervision/monitoring/logging tasks are much more important than the autoscaling.