
[cassandra] Configure the monitoring of the cluster
Closed, Migrated

Description

A Prometheus instance is needed to compare the different benchmarks.

To avoid impacting the Cassandra nodes, the etcd/control plane nodes will also receive the worker role.
Only the monitoring workload will be deployed on these nodes.

To allow HA, the number of Prometheus replicas will be set to 3 and the data will be stored in the same directory on all the nodes.
The data retention will be set to 365 to avoid data loss until the "thanosification" is done.

Event Timeline

vsellier renamed this task from Configure the monitoring of the cluster to [cassandra] Configure the monitoring of the cluster. Jul 11 2022, 4:48 PM
vsellier changed the task status from Open to Work in Progress.
vsellier triaged this task as Normal priority.
vsellier created this task.
vsellier moved this task from Backlog to in-progress on the System administration board.

Configure the data directory:

root@pergamon:~# clush -b -w @cassandra-mgmt hostname
---------------
rancher-node-cassandra1
---------------
rancher-node-cassandra1
---------------
rancher-node-cassandra2
---------------
rancher-node-cassandra2
---------------
rancher-node-cassandra3
---------------
rancher-node-cassandra3

root@pergamon:~# clush -b -w @cassandra-mgmt zfs set atime=off relatime=on data
root@pergamon:~# clush -b -w @cassandra-mgmt zfs create -o mountpoint=/srv/prometheus data/prometheus
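
Once the dataset is created, its mountpoint can be checked on each node. The following is only a minimal sketch: `check_mountpoint` is a hypothetical helper that is not part of the original task. It assumes its input is the tab-separated output of `zfs get -H -o property,value`.

```shell
#!/bin/sh
# Hypothetical helper (not from the original task): read tab-separated
# "property<TAB>value" lines on stdin, as printed by
#   zfs get -H -o property,value mountpoint data/prometheus
# and succeed only if a mountpoint line is present and matches $1.
check_mountpoint() {
    expected=$1
    awk -F '\t' -v want="$expected" '
        $1 == "mountpoint" { found = 1; if ($2 != want) bad = 1 }
        END { exit ((found && !bad) ? 0 : 1) }
    '
}

# Possible use on a single node (requires ZFS, so left commented out):
#   zfs get -H -o property,value mountpoint data/prometheus \
#       | check_mountpoint /srv/prometheus && echo "mountpoint OK"
```

The helper only reads stdin, so it can be tried with sample input on a machine without ZFS.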
vsellier moved this task from in-progress to done on the System administration board.

The mountpoint needs to be declared on the kubelet container so that the pods can reach it:

--- /tmp/cluster-orig.yaml	2022-07-12 11:27:27.169509573 +0200
+++ /tmp/cluster.yaml	2022-07-12 11:26:54.865395186 +0200
@@ -58,6 +58,8 @@
       service_node_port_range: 30000-32767
     kube-controller: {}
     kubelet:
+      extra_binds:
+        - '/srv/prometheus:/srv/prometheus'
       fail_swap_on: false
       generate_serving_certificate: false
     kubeproxy: {}
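
Before applying the change, one could check that the bind is actually present in the edited file. This is a rough sketch under stated assumptions: `has_bind` is a hypothetical helper, and it only matches the single-quoted `'path:path'` form used in the diff above. It does not verify that the entry sits under the `kubelet` section.

```shell
#!/bin/sh
# Hypothetical helper (not from the original task): succeed if the file $1
# contains a list entry binding host path $2 to the same path in the container,
# written exactly as in the diff: - '/path:/path'
has_bind() {
    file=$1
    path=$2
    # -F: fixed string, -q: quiet, -e: pattern starts with a dash
    grep -qF -e "- '${path}:${path}'" "$file"
}

# Possible use against the edited file:
#   has_bind /tmp/cluster.yaml /srv/prometheus && echo "bind declared"
```

A plain `grep` is used instead of a YAML parser to keep the check dependency-free; a different quoting style in the file would need an adapted pattern.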

The details of the configuration are in rSKCONF18f54485535514bb05a2840111922f80dcaec9da