Rancher seems to create the emptyDir volumes in /var/lib/kubelet. Except for the /var/lib/kubelet/pki directory, everything in this directory is ephemeral, so we could easily use a partition backed by a local storage disk.
It would also remove unnecessary pressure on Ceph for the pod-related data.
The /var/lib/docker directory could also be moved to this local partition, as everything in Docker can be lost.
I will try that manually on one staging node to check that it can work before changing the terraform / puppet code.
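For the record, a rough sketch of the manual test I have in mind (device name, mount points and service names are assumptions, to be adapted on the staging node):
# assuming /dev/sdb is the local scratch disk
mkfs.ext4 /dev/sdb
mkdir -p /srv/local-storage
mount /dev/sdb /srv/local-storage
mkdir -p /srv/local-storage/kubelet /srv/local-storage/docker
# stop the runtime first (actual service names depend on how the node is provisioned by rancher)
systemctl stop kubelet docker
# keep the only persistent data, /var/lib/kubelet/pki, then mount the local directories over the originals
cp -a /var/lib/kubelet/pki /srv/local-storage/kubelet/pki
mount --bind /srv/local-storage/kubelet /var/lib/kubelet
mount --bind /srv/local-storage/docker /var/lib/docker
systemctl start docker kubelet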
Sep 14 2022
rebase
In order to test the local storage on nodes declared on uffizi, I configured a new scratch storage on this hypervisor.
Following T3707#73522 and https://pve.proxmox.com/wiki/Storage:_LVM_Thin
root@uffizi:~# lvcreate -L200G -n proxmox-scratch vg-louvre
  Logical volume "scratch" created.
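If the test is conclusive, the volume still has to be registered as a Proxmox storage. A minimal sketch, assuming the LV is (or gets converted into) a thin pool as described in the wiki page above; the storage id and content types are assumptions:
lvconvert --type thin-pool vg-louvre/proxmox-scratch
pvesm add lvmthin proxmox-scratch --vgname vg-louvre --thinpool proxmox-scratch --content rootdir,images --nodes uffizi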
I'm closing this issue because, after @vlorentz's analysis, it seems there isn't much to improve.
Sep 13 2022
These are the results of the tests of the different algorithms for directory_add (with 20 directory replayers):
- one-by-one
postgres=# select count(*) from pg_stat_activity where query like '%UNNEST(%';
 count
-------
    64
(1 row)
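For reference, a hedged variant of that check which splits the active backends per insertion pattern (the LIKE filters are assumptions based on the query above, and the connection options are omitted):
psql -c "select case when query like '%UNNEST(%' then 'batched (UNNEST)' else 'one-by-one' end as pattern,
                count(*)
         from pg_stat_activity
         where state = 'active' and query like '%directory%'
         group by 1;"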
Sep 12 2022
All the indexers were stopped at 20:00 (FR time) because something was consuming all the bandwidth of the VPN between Azure and our infra.
root@pergamon:/etc/clustershell# clush -b -w @indexer-workers "puppet agent --disable 'stop indexer to avoid bandwith consumption'"
root@pergamon:/etc/clustershell# clush -b -w @indexer-workers "systemctl stop swh-indexer-journal-client@*"
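The hedged counterpart to run once the bandwidth issue is understood (assuming puppet restarts the journal client units on its next run):
clush -b -w @indexer-workers "puppet agent --enable"
clush -b -w @indexer-workers "puppet agent --test"   # or wait for the next scheduled run to restart the services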
Sep 9 2022
Reaper accesses the Cassandra servers through JMX. The Cassandra deployment scripts need to be adapted (in progress) to expose JMX on the public interface.
When JMX is publicly exposed, the Cassandra startup scripts force the JMX access to be password protected.
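For context, a minimal sketch of the standard JMX options involved (paths and values are assumptions; the real change will live in the deployment scripts / cassandra-env.sh):
# /etc/cassandra/cassandra-env.sh (sketch)
LOCAL_JMX=no            # stop restricting JMX to localhost
JMX_PORT="7199"
# when LOCAL_JMX is not "yes", the stock script enables authentication, roughly equivalent to:
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=true"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<public ip>"   # so reaper can reach JMX remotely (assumption)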
A new production node for replayers and generic load was added to the cluster to provide more compute resources for testing the tool.
Here is some profiling of a couple of replayers:
Sep 8 2022
diff landed and deployed, graph restarted
rebase
\o/ great
@vlorentz I assigned the task to you because, if I'm not wrong, you are running some experiments on granet.
I don't know which ones, but you should be gentler with the server.
Sep 7 2022
Sep 6 2022
yes even better
The root cause is a swh-graph experiment that generated a lot of gRPC errors, which are huge.
No consumer seems to have a big lag on these topics, so it should be possible to reduce the retention to unblock the server and have a look at which service is sending the events:
root@riverside:/var/lib/sentry-onpremise# docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --list | tr -d '\r' | xargs -t -n1 docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group | grep -e GROUP -e " events "
Creating sentry-self-hosted_kafka_run ... done
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-consumers
Creating sentry-self-hosted_kafka_run ... done
GROUP           TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG   CONSUMER-ID  HOST  CLIENT-ID
snuba-consumers events  0          82585390        82587094        1704  -            -     -
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-post-processor:sync:6fa9928e1d6911edac290242ac170014
Creating sentry-self-hosted_kafka_run ... done
GROUP  TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group ingest-consumer
Creating sentry-self-hosted_kafka_run ... done
The biggest topics are:
root@riverside:/var/lib/docker/volumes/sentry-kafka/_data# du -sch * | sort -h | tail -n 5
31M	snuba-commit-log-0
291M	outcomes-0
30G	ingest-events-0
43G	events-0
73G	total
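A possible follow-up, as a sketch only: lower the retention on the biggest topic so kafka reclaims the disk space (topic choice and retention value are assumptions, to validate before touching anything):
docker-compose-1.29.2 run --rm kafka kafka-configs --bootstrap-server kafka:9092 --alter \
    --entity-type topics --entity-name events --add-config retention.ms=86400000   # ~1 day, value is an assumption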