Sep 20 2022
Sep 19 2022
Reduce queueThreshold to ensure running at max loaders when needed
update commit message
Sep 18 2022
Sep 17 2022
Sep 16 2022
For example, these logs were seen in the Reaper logs when several repairs for different keyspaces are scheduled at the same time:
INFO [2022-09-16 15:02:34,268] [archive_production:67b1d310-35c5-11ed-8ea7-4b43418aeab2:67b9e963-35c5-11ed-8ea7-4b43418aeab2] i.c.s.SegmentRunner - Repair for segment 67b9e963-35c5-11ed-8ea7-4b43418aeab2 started, status wait will timeout in 1800000 millis
INFO [2022-09-16 15:02:58,602] [archive_production:9a773740-35cf-11ed-8ea7-4b43418aeab2] i.c.s.RepairRunner - Maximum number of concurrent repairs reached. Repair 9a773740-35cf-11ed-8ea7-4b43418aeab2 will resume later.
INFO [2022-09-16 15:02:58,602] [archive_production:9a773740-35cf-11ed-8ea7-4b43418aeab2] i.c.s.RepairRunner - Current active repair runners: [(67b1d310-35c5-11ed-8ea7-4b43418aeab2,1663335748289), (76982be0-35cf-11ed-8ea7-4b43418aeab2,1663340068254), (9a773740-35cf-11ed-8ea7-4b43418aeab2,1663340128436), (9a9cc0a0-35cf-1
INFO [2022-09-16 15:02:58,787] [archive_production:9a9cc0a0-35cf-11ed-8ea7-4b43418aeab2] i.c.s.RepairRunner - Maximum number of concurrent repairs reached. Repair 9a9cc0a0-35cf-11ed-8ea7-4b43418aeab2 will resume later.
INFO [2022-09-16 15:02:58,787] [archive_production:9a9cc0a0-35cf-11ed-8ea7-4b43418aeab2] i.c.s.RepairRunner - Current active repair runners: [(67b1d310-35c5-11ed-8ea7-4b43418aeab2,1663335748289), (76982be0-35cf-11ed-8ea7-4b43418aeab2,1663340068254), (9a773740-35cf-11ed-8ea7-4b43418aeab2,1663340128436), (9a9cc0a0-35cf-1
INFO [2022-09-16 15:02:59,336] [archive_production:9aeae0a0-35cf-11ed-8ea7-4b43418aeab2] i.c.s.RepairRunner - Maximum number of concurrent repairs reached. Repair 9aeae0a0-35cf-11ed-8ea7-4b43418aeab2 will resume later.
INFO [2022-09-16 15:02:59,336] [archive_production:9aeae0a0-35cf-11ed-8ea7-4b43418aeab2] i.c.s.RepairRunner - Current active repair runners: [(67b1d310-35c5-11ed-8ea7-4b43418aeab2,1663335748289), (76982be0-35cf-11ed-8ea7-4b43418aeab2,1663340068254), (9a773740-35cf-11ed-8ea7-4b43418aeab2,1663340128436), (9a9cc0a0-35cf-1
INFO [2022-09-16 15:02:59,555] [archive_production:9b0c7260-35cf-11ed-8ea7-4b43418aeab2] i.c.s.RepairRunner - Maximum number of concurrent repairs reached. Repair 9b0c7260-35cf-11ed-8ea7-4b43418aeab2 will resume later.
INFO [2022-09-16 15:02:59,555] [archive_production:9b0c7260-35cf-11ed-8ea7-4b43418aeab2] i.c.s.RepairRunner - Current active repair runners: [(67b1d310-35c5-11ed-8ea7-4b43418aeab2,1663335748289), (76982be0-35cf-11ed-8ea7-4b43418aeab2,1663340068254), (9a773740-35cf-11ed-8ea7-4b43418aeab2,1663340128436), (9a9cc0a0-35cf-1
INFO [2022-09-16 15:02:59,779] [archive_production:76982be0-35cf-11ed-8ea7-4b43418aeab2] i.c.s.RepairRunner - Attempting to run new segment...
INFO [2022-09-16 15:02:59,813] [archive_production:76982be0-35cf-11ed-8ea7-4b43418aeab2] i.c.s.RepairRunner - Next segment to run : 76998b71-35cf-11ed-8ea7-4b43418aeab2
INFO [2022-09-16 15:02:59,849] [archive_production:76982be0-35cf-11ed-8ea7-4b43418aeab2:76998b71-35cf-11ed-8ea7-4b43418aeab2] i.c.j.JmxProxy - Triggering repair of range (-5797115047693728403,-5671075333212739092] for keyspace "reaper_db" on host 192.168.100.182, with repair parallelism dc_parallel, in cluster with Cas
INFO [2022-09-16 15:02:59,851] [archive_production:76982be0-35cf-11ed-8ea7-4b43418aeab2:76998b71-35cf-11ed-8ea7-4b43418aeab2] i.c.j.JmxProxy - Triggering repair for ranges -5797115047693728403:-5671075333212739092
INFO [2022-09-16 15:02:59,863] [archive_production:76982be0-35cf-11ed-8ea7-4b43418aeab2:76998b71-35cf-11ed-8ea7-4b43418aeab2] i.c.s.RepairRunner - Triggered repair of segment 76998b71-35cf-11ed-8ea7-4b43418aeab2 via host 192.168.100.182
INFO [2022-09-16 15:02:59,863] [archive_production:76982be0-35cf-11ed-8ea7-4b43418aeab2:76998b71-35cf-11ed-8ea7-4b43418aeab2] i.c.s.SegmentRunner - Repair for segment 76998b71-35cf-11ed-8ea7-4b43418aeab2 started, status wait will timeout in 1800000 millis
INFO [2022-09-16 15:03:04,227] [archive_production:67b1d310-35c5-11ed-8ea7-4b43418aeab2] i.c.s.RepairRunner - Attempting to run new segment...
INFO [2022-09-16 15:03:04,254] [archive_production:67b1d310-35c5-11ed-8ea7-4b43418aeab2] i.c.s.RepairRunner - All nodes are busy or have too many pending compactions for the remaining candidate segments.
INFO [2022-09-16 15:03:04,262] [archive_production:67b1d310-35c5-11ed-8ea7-4b43418aeab2] i.c.s.RepairRunner - All nodes are busy or have too many pending compactions for the remaining candidate segments.
Perhaps a slight concern regarding the length of the group_id, but nothing blocking.
Reaper was manually deployed and running.
The main functionalities for now are the scheduling of the different repair types and the orchestration of the segments to repair, to avoid repairing the same segment on different replicas.
Secondary functionalities can also be useful, like repair progress tracking and stop / resume: http://cassandra-reaper.io/docs/concepts/
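As a rough sketch of how those secondary functionalities could be driven, Reaper also exposes them over its HTTP API; the endpoint and parameter names below are assumptions based on the documentation linked above and should be double-checked:

# hypothetical sketch; host, cluster, keyspace and owner are placeholders
curl -X POST "http://<reaper-host>:8080/repair_run?clusterName=<cluster>&keyspace=<keyspace>&owner=<owner>"
# pausing / resuming an existing run (exact state values to be confirmed against the docs)
curl -X PUT "http://<reaper-host>:8080/repair_run/<run-id>/state?state=PAUSED"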
Sep 15 2022
The group id of the authenticated consumers will probably have to be updated to match the Kafka ACLs.
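As a hedged illustration of what that could look like with the stock Kafka tooling (the broker, principal and group names below are placeholders, not our actual values):

# grant the authenticated consumer read access on its consumer group
kafka-acls.sh --bootstrap-server <broker>:9092 --command-config admin.properties \
  --add --allow-principal User:<consumer-user> \
  --operation Read --group <consumer-group-id>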
rebase
Example during the loading of https://github.com/torvalds/linux by a pod:
% /usr/sbin/zfs list data/docker data/kubelet
NAME           USED  AVAIL  REFER  MOUNTPOINT
data/docker   3.81G  40.4G  83.2M  /var/lib/docker
data/kubelet  3.71G  40.4G  3.71G  /var/lib/kubelet
Compression is not as useful as it is for docker:
% /usr/sbin/zfs get compressratio data/kubelet data/docker
NAME          PROPERTY       VALUE  SOURCE
data/docker   compressratio  2.95x  -
data/kubelet  compressratio  1.07x  -
rebase
heh sorry for the title mess
The kubelet dataset will need to be created manually on all the rancher nodes (except staging worker2 and worker3, which are already configured) before applying D8482; a rough sketch follows below.
- cluster-argo
- archive-staging
- archive-production
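For the kubelet dataset creation mentioned above, a minimal sketch (assuming the same data zpool that backs data/docker; D8482 remains the authoritative version):

# create the dataset and mount it where rancher expects kubelet data
zfs create -o mountpoint=/var/lib/kubelet data/kubelet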
- rebase
- fix a typo in the jmxremote.access file name
- configure the jvm to use it
Sep 14 2022
It works \o/
Rancher seems to create emptyDir volumes in /var/lib/kubelet. Except for the /var/lib/kubelet/pki directory, everything in this directory is ephemeral, so we could easily use a partition backed by a local storage disk.
It will also remove unnecessary pressure on Ceph for pod-related data.
The /var/lib/docker directory could also be moved to this local partition as everything in docker can be lost.
I will try that manually on one staging node to check whether it works before changing the terraform / puppet code.
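A quick way to check the emptyDir assumption on a node beforehand (the paths follow the standard kubelet layout; this is just a sanity check, not part of the change):

# list the emptyDir volumes kubelet created for the running pods
find /var/lib/kubelet/pods -maxdepth 4 -type d -name 'kubernetes.io~empty-dir'
# confirm that only pki holds long-lived data
ls /var/lib/kubelet/pki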
rebase
In order to test the local storage on nodes declared on uffizi, I configured a new scratch storage on this hypervisor.
Following T3707#73522 and https://pve.proxmox.com/wiki/Storage:_LVM_Thin
root@uffizi:~# lvcreate -L200G -n proxmox-scratch vg-louvre
  Logical volume "scratch" created.
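Following the linked wiki, the next steps would roughly be to turn that LV into a thin pool and register it as a storage in Proxmox; the exact commands below are assumptions to be checked against the wiki:

# convert the new LV to a thin pool, then declare it as an lvmthin storage
lvconvert --type thin-pool vg-louvre/proxmox-scratch
pvesm add lvmthin proxmox-scratch --vgname vg-louvre --thinpool proxmox-scratch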
I am closing this issue because, after @vlorentz's analysis, it seems there isn't much left to improve.
Sep 13 2022
These are the results of the tests of the different algorithms for directory_add (with 20 directory replayers):
- one-by-one
postgres=# select count(*) from pg_stat_activity where query like '%UNNEST(%';
 count
-------
    64
(1 row)
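To see whether those backends are actually busy or mostly idle, the same query can be broken down by state (a quick follow-up check, not something that was run at the time):

postgres=# select state, count(*) from pg_stat_activity where query like '%UNNEST(%' group by state;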
Sep 12 2022
All the indexers were stopped at 20:00 FR because something was consuming all the bandwidth of the VPN between azure and our infra.
root@pergamon:/etc/clustershell# clush -b -w @indexer-workers "puppet agent --disable 'stop indexer to avoid bandwith consumption'"
root@pergamon:/etc/clustershell# clush -b -w @indexer-workers "systemctl stop swh-indexer-journal-client@*"