Software Heritage

Sep 12 2022

vsellier committed R260:649738df8042: Measure performance for one-by-one directory replayer (authored by vsellier).
Sep 12 2022, 3:39 PM
vsellier committed R260:58577860199f: Support specific options per replayer (authored by vsellier).
Sep 12 2022, 3:39 PM
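One plausible way to support per-replayer options is to layer each replayer's overrides on top of shared defaults, much as Helm merges values. The sketch below is hypothetical: the option names (`replicas`, `batch_size`, `resources`) and replayer names are illustrative, not taken from the actual chart.

```python
from copy import deepcopy


def merge_options(defaults: dict, overrides: dict) -> dict:
    """Return a copy of `defaults` with `overrides` applied recursively (overrides win)."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested mapping on both sides: merge key by key.
            merged[key] = merge_options(merged[key], value)
        else:
            # Scalars, lists or new keys: the override replaces the default.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical defaults and per-replayer overrides (illustrative names and values).
DEFAULTS = {"replicas": 1, "batch_size": 1000,
            "resources": {"cpu": "500m", "memory": "512Mi"}}
PER_REPLAYER = {
    "directory": {"replicas": 4, "resources": {"memory": "2Gi"}},
    "origin": {"batch_size": 200},
}

if __name__ == "__main__":
    for name, overrides in PER_REPLAYER.items():
        print(name, merge_options(DEFAULTS, overrides))
```

Only the keys a replayer overrides change; everything else falls back to the shared defaults, and the defaults themselves are never mutated.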
vsellier committed R260:8c8dc14580fc: Merge remote-tracking branch 'origin/master' into cassandra (authored by vsellier).
Sep 12 2022, 3:17 PM
vsellier requested review of D8449: staging: Change the monitoring profile of db1 to sql.
Sep 12 2022, 3:09 PM

Sep 9 2022

vsellier added a comment to T4458: Test reaper to automate the cassandra repair actions.

Reaper accesses the Cassandra servers through JMX. The Cassandra deployment scripts need to be adapted (in progress) to expose JMX on the public interface.
When JMX is publicly exposed, the Cassandra startup scripts enforce password protection of JMX access.

Sep 9 2022, 4:41 PM · System administration
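For context: in the stock `cassandra-env.sh`, the `LOCAL_JMX` variable selects between JMX bound to localhost (no authentication) and remote JMX, and the remote branch turns on `com.sun.management.jmxremote.authenticate`. The sketch below only mirrors that decision in Python; the default port 7199 and the password-file path are assumptions to check against the script shipped with the deployed Cassandra version.

```python
from typing import List


def jmx_jvm_opts(local_jmx: bool, jmx_port: int = 7199,
                 password_file: str = "/etc/cassandra/jmxremote.password") -> List[str]:
    """Approximate the JMX branch of cassandra-env.sh (sketch, not the real script)."""
    if local_jmx:
        # JMX only reachable from localhost: authentication disabled.
        return [
            f"-Dcassandra.jmx.local.port={jmx_port}",
            "-Dcom.sun.management.jmxremote.authenticate=false",
        ]
    # JMX reachable from other hosts (e.g. Reaper): authentication is enforced
    # and credentials are read from a password file.
    return [
        f"-Dcassandra.jmx.remote.port={jmx_port}",
        f"-Dcom.sun.management.jmxremote.rmi.port={jmx_port}",
        "-Dcom.sun.management.jmxremote.authenticate=true",
        f"-Dcom.sun.management.jmxremote.password.file={password_file}",
    ]


if __name__ == "__main__":
    print(" ".join(jmx_jvm_opts(local_jmx=False)))
```

Reaper then needs the same JMX credentials configured on its side for each cluster it manages.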
vsellier accepted D8437: archive-staging: Deploy pubdev ingestion stack.
Sep 9 2022, 2:06 PM
vsellier added a comment to T4458: Test reaper to automate the cassandra repair actions.

A new production node for replayers and generic workloads was added to the cluster, to provide more compute resources for testing the tool.

Sep 9 2022, 11:51 AM · System administration
vsellier changed the status of T4458: Test reaper to automate the cassandra repair actions, a subtask of T4373: [cassandra] Test the new hardware, from Open to Work in Progress.
Sep 9 2022, 11:49 AM · Storage manager, System administration
vsellier changed the status of T4458: Test reaper to automate the cassandra repair actions from Open to Work in Progress.
Sep 9 2022, 11:49 AM · System administration
vsellier committed R260:7daee55ef56d: add an affinity of the replayers to the nodes with swh/replayer=true (authored by vsellier).
Sep 9 2022, 11:46 AM
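A node affinity of this kind is usually expressed with `nodeAffinity` and a `matchExpressions` term on the node label. The sketch builds that stanza as a Python dict; it assumes a hard `requiredDuringSchedulingIgnoredDuringExecution` rule, while the actual commit may instead use a soft `preferredDuringSchedulingIgnoredDuringExecution` preference.

```python
import json


def replayer_affinity(label_key: str = "swh/replayer", label_value: str = "true") -> dict:
    """Build a Kubernetes pod `affinity` stanza restricting pods to labelled nodes."""
    return {
        "nodeAffinity": {
            # Hard constraint: pods are only scheduled on matching nodes.
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            # Label values are strings, so "true" must be quoted.
                            {"key": label_key, "operator": "In", "values": [label_value]}
                        ]
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps({"affinity": replayer_affinity()}, indent=2))
```

The target nodes then need the label, e.g. `kubectl label nodes <node-name> swh/replayer=true`.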
vsellier committed rSPRE943c46f87c61: rancher-production: Add a new node on hypervisor3 with 6 cpus (authored by vsellier).
Sep 9 2022, 11:30 AM
vsellier added a comment to T4510: [cassandra] Profile the replayer cpu consumption.

Here is some profiling of a couple of replayers:

Sep 9 2022, 11:27 AM · Storage manager, System administration
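The profiles themselves are not reproduced here. As a sketch of how the CPU time of a Python process can be broken down, the snippet uses the standard-library `cProfile`/`pstats`; `replay_batch` is a hypothetical stand-in for the replayer's work, and a sampling profiler such as py-spy is another common option for an already running pod.

```python
import cProfile
import io
import pstats


def replay_batch(messages):
    """Hypothetical stand-in for the replayer's per-batch work."""
    return [sorted(m) for m in messages]


def profile(func, *args, top: int = 10) -> str:
    """Run func under cProfile and return the top entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    batch = [list(range(1000, 0, -1)) for _ in range(200)]
    print(profile(replay_batch, batch))
```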
vsellier committed R260:a45bdce2eb24: Merge remote-tracking branch 'origin/master' into cassandra (authored by vsellier).
Sep 9 2022, 10:15 AM
vsellier committed R260:f64a9a9829cd: Try to reduce the global cpu consumption to reduce the hypervisor load (authored by vsellier).
Sep 9 2022, 10:15 AM

Sep 8 2022

vsellier changed the status of T4510: [cassandra] Profile the replayer cpu consumption from Open to Work in Progress.
Sep 8 2022, 6:29 PM · Storage manager, System administration
vsellier changed the status of T4510: [cassandra] Profile the replayer cpu consumption, a subtask of T4373: [cassandra] Test the new hardware, from Open to Work in Progress.
Sep 8 2022, 6:29 PM · Storage manager, System administration
vsellier closed T4509: [swh-graph] Configure the max_memory to use, a subtask of T4507: Out of memory on granet, as Resolved.
Sep 8 2022, 6:25 PM · System administration, Compressed graph service
vsellier closed T4509: [swh-graph] Configure the max_memory to use as Resolved.

diff landed and deployed, graph restarted

Sep 8 2022, 6:25 PM · System administration, Compressed graph service
vsellier closed D8431: swh-graph: configure the max heap allocated to the java backend.
Sep 8 2022, 6:18 PM
vsellier committed rSPSITE303c48250b95: swh-graph: configure the max heap allocated to the java backend (authored by vsellier).
Sep 8 2022, 6:18 PM
vsellier updated the diff for D8431: swh-graph: configure the max heap allocated to the java backend.

rebase

Sep 8 2022, 6:18 PM
vsellier added a comment to T4509: [swh-graph] Configure the max_memory to use.

I forgot to mention: it seems that, except during some peaks, the used memory is around 350 GB.

Sep 8 2022, 6:16 PM · System administration, Compressed graph service
vsellier triaged T4516: swh-graph: Add jvm monitoring as Normal priority.
Sep 8 2022, 5:57 PM · System administration, Compressed graph service
vsellier requested review of D8431: swh-graph: configure the max heap allocated to the java backend.
Sep 8 2022, 5:48 PM
vsellier added a revision to T4509: [swh-graph] Configure the max_memory to use: D8431: swh-graph: configure the max heap allocated to the java backend.
Sep 8 2022, 5:48 PM · System administration, Compressed graph service
vsellier accepted D8429: Add static check on the staging graphql instance.
Sep 8 2022, 5:30 PM
vsellier committed R260:7f552eb93182: reduce concurrent loaders for origin, increase directory (authored by vsellier).
Sep 8 2022, 3:39 PM
vsellier added a comment to T4330: Deploy maven stack in production.

\o/ great

Sep 8 2022, 2:41 PM · System administration, Maven loader, Maven lister, GSoC 2019, Archive coverage
vsellier changed the status of T4509: [swh-graph] Configure the max_memory to use, a subtask of T4507: Out of memory on granet, from Open to Work in Progress.
Sep 8 2022, 12:22 PM · System administration, Compressed graph service
vsellier changed the status of T4509: [swh-graph] Configure the max_memory to use from Open to Work in Progress.
Sep 8 2022, 12:22 PM · System administration, Compressed graph service
vsellier closed T4471: swh-graph Add java process port monitoring as Resolved.
Sep 8 2022, 12:19 PM · Compressed graph service, System administration
vsellier committed R260:097e68bf3067: Merge remote-tracking branch 'origin/master' into cassandra (authored by vsellier).
Sep 8 2022, 11:18 AM
vsellier committed R260:25710a5f501e: Adapt replayer dispatching (authored by vsellier).
Sep 8 2022, 11:16 AM
vsellier closed D8415: thanos: Increase the allocated memory to avoid OOM killer.
Sep 8 2022, 11:08 AM
vsellier committed rSPREecec22a1a3a2: thanos: Increase the allocated memory to avoid OOM killer (authored by vsellier).
Sep 8 2022, 11:08 AM
vsellier requested review of D8415: thanos: Increase the allocated memory to avoid OOM killer.
Sep 8 2022, 10:50 AM
vsellier triaged T4510: [cassandra] Profile the replayer cpu consumption as Normal priority.
Sep 8 2022, 10:38 AM · Storage manager, System administration
vsellier triaged T4509: [swh-graph] Configure the max_memory to use as High priority.
Sep 8 2022, 10:14 AM · System administration, Compressed graph service
vsellier added a comment to T4507: Out of memory on granet.

@vlorentz I assigned the task to you because, if I'm not mistaken, you are running some experiments on granet.
I don't know what they are, but you should be gentler with the server.

Sep 8 2022, 9:40 AM · System administration, Compressed graph service
vsellier triaged T4507: Out of memory on granet as High priority.
Sep 8 2022, 9:38 AM · System administration, Compressed graph service
vsellier committed R260:a15da28adfaa: reduce origin comumption as most of the partition are replayed (authored by vsellier).
Sep 8 2022, 9:12 AM
vsellier committed R260:4391318718a5: speed up origin topic replay (authored by vsellier).
Sep 8 2022, 6:42 AM

Sep 7 2022

vsellier updated the task description for T4506: Use local hypervisor storage in the loader pods.
Sep 7 2022, 6:21 PM · System administration
vsellier updated the task description for T4506: Use local hypervisor storage in the loader pods.
Sep 7 2022, 6:20 PM · System administration
vsellier triaged T4506: Use local hypervisor storage in the loader pods as High priority.
Sep 7 2022, 6:19 PM · System administration
vsellier committed R260:08c8caf5c557: Merge remote-tracking branch 'origin/master' into cassandra (authored by vsellier).
Sep 7 2022, 5:56 PM
vsellier accepted D8400: archive-staging: Deploy listers in cluster.
Sep 7 2022, 5:54 PM
vsellier committed R260:a2711ca2b6b1: stabilize the number of replayer (authored by vsellier).
Sep 7 2022, 12:46 PM
vsellier committed R260:c6f5cb002929: fix comment start (authored by vsellier).
Sep 7 2022, 12:45 PM
vsellier committed R260:4b88963f3360: remove empty deployment as 0 is considered as empty values (authored by vsellier).
Sep 7 2022, 12:44 PM
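In Go templates (and therefore in Helm), `if` treats `0` as an empty value, exactly like a missing key, so a `replicas: 0` entry cannot be told apart from an unset one with a plain `if`. Python truthiness has the same pitfall; the sketch below contrasts a truthiness check with an explicit `None` check. The function names and manifest shape are hypothetical.

```python
from typing import Optional


def render_deployment_naive(replicas: Optional[int]) -> Optional[dict]:
    # Truthiness test: 0 is treated like "unset", as Go templates treat 0 as empty.
    if not replicas:
        return None
    return {"kind": "Deployment", "spec": {"replicas": replicas}}


def render_deployment(replicas: Optional[int]) -> Optional[dict]:
    # Explicit test: only a missing value skips the deployment; 0 is kept.
    if replicas is None:
        return None
    return {"kind": "Deployment", "spec": {"replicas": replicas}}


if __name__ == "__main__":
    print(render_deployment_naive(0), render_deployment(0))
```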
vsellier committed R260:fb253d42370a: try to fix 0 replicas deployments (authored by vsellier).
Sep 7 2022, 10:53 AM
vsellier committed R260:89c0b457ad2f: prioritize small topics to free resources for bigger ones later (authored by vsellier).
Sep 7 2022, 10:39 AM
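The idea behind prioritizing small topics: they finish quickly, which releases the CPU and memory reserved for their replayers, and those resources can then go to the large topics. A minimal sketch, assuming topics are ordered by size and started in waves limited by the available replayer slots; the topic sizes below are made up for illustration.

```python
from typing import List, Mapping


def replay_order(topic_sizes: Mapping[str, int]) -> List[str]:
    """Order topics smallest first; ties are broken by name for a stable result."""
    return sorted(topic_sizes, key=lambda topic: (topic_sizes[topic], topic))


def waves(topic_sizes: Mapping[str, int], slots: int) -> List[List[str]]:
    """Group topics into successive waves of at most `slots` concurrent replays."""
    if slots < 1:
        raise ValueError("slots must be >= 1")
    order = replay_order(topic_sizes)
    return [order[i:i + slots] for i in range(0, len(order), slots)]


# Hypothetical topic sizes (message counts, arbitrary units); not measured values.
HYPOTHETICAL_SIZES = {"content": 900, "directory": 700, "origin": 50,
                      "snapshot": 120, "release": 10}

if __name__ == "__main__":
    for i, wave in enumerate(waves(HYPOTHETICAL_SIZES, slots=2), start=1):
        print(f"wave {i}: {wave}")
```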

Sep 6 2022

vsellier updated the task description for T4479: uncouple the java grpc server from the python HTTP server.
Sep 6 2022, 5:34 PM · Compressed graph service
vsellier closed T4472: swh-graph: Allow to specify the rpc port as Wontfix.

Yes, even better.

Sep 6 2022, 5:34 PM · Compressed graph service
vsellier closed D8405: money: fix chromium issue with missing sse3 instructions.
Sep 6 2022, 5:29 PM
vsellier committed rSPREc4eec83f84e9: money: fix chromium issue with missing sse3 instructions (authored by vsellier).
Sep 6 2022, 5:29 PM
vsellier requested changes to D8400: archive-staging: Deploy listers in cluster.
Sep 6 2022, 5:27 PM
vsellier requested review of D8405: money: fix chromium issue with missing sse3 instructions.
Sep 6 2022, 5:22 PM
vsellier accepted D8397: Deploy maven-exporter production node.
Sep 6 2022, 4:13 PM
vsellier added inline comments to D8397: Deploy maven-exporter production node.
Sep 6 2022, 3:50 PM
vsellier added a comment to T4497: [sentry] Out of disk space.

The root cause is a swh-graph experiment that generated a lot of gRPC errors, which are huge.

Sep 6 2022, 12:41 PM · Sentry, System administration
vsellier added a comment to T4497: [sentry] Out of disk space.

No consumer seems to have a big lag on these topics, so it should be possible to reduce the lag to unblock the server and check which service is sending the events:

root@riverside:/var/lib/sentry-onpremise# docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --list | tr -d '\r' | xargs -t -n1 docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092  --describe --group | grep -e GROUP -e " events "
Creating sentry-self-hosted_kafka_run ... done
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-consumers
Creating sentry-self-hosted_kafka_run ... done
GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
snuba-consumers events          0          82585390        82587094        1704            -                                            -               -
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-post-processor:sync:6fa9928e1d6911edac290242ac170014
Creating sentry-self-hosted_kafka_run ... done
GROUP                                                      TOPIC            PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group ingest-consumer
Creating sentry-self-hosted_kafka_run ... done
Sep 6 2022, 11:18 AM · Sentry, System administration
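The `LAG` column is simply `LOG-END-OFFSET - CURRENT-OFFSET` (here 82587094 - 82585390 = 1704). To scan many groups, the tables can be parsed rather than grepped; the sketch below is a hypothetical helper that assumes the whitespace-aligned layout shown above.

```python
from typing import Dict, List, Optional

# Excerpt of the `--describe` output quoted above.
SAMPLE = """\
Creating sentry-self-hosted_kafka_run ... done
GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
snuba-consumers events          0          82585390        82587094        1704            -                                            -               -
"""


def parse_describe(output: str) -> List[Dict[str, str]]:
    """Turn `kafka-consumer-groups --describe` tables into one dict per partition."""
    rows: List[Dict[str, str]] = []
    header: Optional[List[str]] = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "GROUP":
            header = fields  # a new table starts
            continue
        # Keep only data lines: same column count and a numeric partition.
        if header and len(fields) == len(header):
            row = dict(zip(header, fields))
            if row["PARTITION"].isdigit():
                rows.append(row)
    return rows


def compute_lag(row: Dict[str, str]) -> Optional[int]:
    """LAG = LOG-END-OFFSET - CURRENT-OFFSET; None when no offset is committed ("-")."""
    try:
        return int(row["LOG-END-OFFSET"]) - int(row["CURRENT-OFFSET"])
    except ValueError:
        return None


if __name__ == "__main__":
    for row in parse_describe(SAMPLE):
        print(row["GROUP"], row["TOPIC"], row["PARTITION"], compute_lag(row))
```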
vsellier added a comment to T4497: [sentry] Out of disk space.

The biggest topics are:

root@riverside:/var/lib/docker/volumes/sentry-kafka/_data# du -sch * | sort -h | tail -n 5
31M	snuba-commit-log-0
291M	outcomes-0
30G	ingest-events-0
43G	events-0
73G	total
Sep 6 2022, 11:11 AM · Sentry, System administration
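Converting the human-readable sizes back to bytes makes the imbalance explicit: `events-0` and `ingest-events-0` account for essentially the whole 73G reported. A minimal sketch, assuming GNU `du -h` binary units (K = 1024 bytes); the helper names are hypothetical.

```python
from typing import Dict

# `du -sch * | sort -h | tail -n 5` output quoted above.
DU_OUTPUT = """\
31M snuba-commit-log-0
291M outcomes-0
30G ingest-events-0
43G events-0
73G total
"""

# du -h uses binary multiples.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text: str) -> float:
    """Convert a `du -h` size such as '43G' into bytes (approximate: -h sizes are rounded)."""
    text = text.strip()
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)  # no suffix: already in bytes


def parse_du(output: str) -> Dict[str, float]:
    """Map each entry name (including 'total') to its size in bytes."""
    usage = {}
    for line in output.splitlines():
        if line.strip():
            size, name = line.split(None, 1)
            usage[name.strip()] = parse_size(size)
    return usage


if __name__ == "__main__":
    usage = parse_du(DU_OUTPUT)
    events = usage["events-0"] + usage["ingest-events-0"]
    print(f"event topics: {events / usage['total']:.0%} of the total")
```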
vsellier changed the status of T4497: [sentry] Out of disk space from Open to Work in Progress.
Sep 6 2022, 11:09 AM · Sentry, System administration
vsellier committed R260:9b4840fdba24: Reduce cpu and memory reservation for a couple of topics when the lag recovered (authored by vsellier).
Sep 6 2022, 9:51 AM

Sep 5 2022

vsellier committed R260:c2f69df4a306: adapt cpu request for snapshots (authored by vsellier).
Sep 5 2022, 5:54 PM
vsellier committed R260:69f2df7d9a89: Adjust memory and cpu request according real behavior (authored by vsellier).
Sep 5 2022, 5:41 PM
vsellier committed R260:83f425675775: Merge branch 'master' into cassandra (authored by vsellier).
Sep 5 2022, 5:21 PM
vsellier updated the task description for T4474: onboarding Lunar.
Sep 5 2022, 5:19 PM · System administration
vsellier committed R260:59bd29f9a880: wip - Configure the replayers (authored by vsellier).
Sep 5 2022, 4:53 PM
vsellier updated the task description for T4474: onboarding Lunar.
Sep 5 2022, 4:40 PM · System administration
vsellier updated the task description for T4474: onboarding Lunar.
Sep 5 2022, 3:53 PM · System administration
vsellier closed D8391: Declare lunar credentials.
Sep 5 2022, 3:52 PM
vsellier committed rSPSITEdbf46fbbc84a: Declare lunar credentials (authored by vsellier).
Sep 5 2022, 3:52 PM
vsellier committed rSPPRIVC477532a3dc5f: add password for lunar (authored by vsellier).
Sep 5 2022, 3:50 PM
vsellier requested review of D8391: Declare lunar credentials.
Sep 5 2022, 3:49 PM
vsellier added a revision to T4474: onboarding Lunar: D8391: Declare lunar credentials.
Sep 5 2022, 3:49 PM · System administration
vsellier committed rSPRE445014116634: Declare the archive-production k8s cluster (authored by vsellier).
Sep 5 2022, 3:11 PM

Sep 4 2022

vsellier committed rSKCONF1ad68a77cc9a: Deploy the cassandra replayers on the archive production cluster (authored by vsellier).
Sep 4 2022, 7:00 PM

Sep 2 2022

vsellier committed rSKCONF26d8c509c2e8: Declare archive-production cluster applications (authored by vsellier).
Sep 2 2022, 5:51 PM
vsellier committed rSKCONF3ebfe35fde48: Delete the production-cassandra cluster configuration (authored by vsellier).
Sep 2 2022, 5:51 PM
vsellier accepted D8385: Expose thanos gateway service to read archive staging metrics.
Sep 2 2022, 4:05 PM
vsellier closed D8380: Declare the missing read-only objstorage dns entry.
Sep 2 2022, 10:14 AM
vsellier committed rSPSITE8679da780d99: Declare the missing read-only objstorage dns entry (authored by vsellier).
Sep 2 2022, 10:14 AM
vsellier accepted D8381: cluster-archive-staging: Activate thanos sidecar service.
Sep 2 2022, 9:40 AM
vsellier requested review of D8380: Declare the missing read-only objstorage dns entry.
Sep 2 2022, 9:15 AM

Sep 1 2022

vsellier committed rSPSITE9966a31e4f17: fix hiera file names for hosts (authored by vsellier).
Sep 1 2022, 10:06 PM
vsellier committed rSPSITE81197e23e261: Remove local dns proxy of production rancher nodes (authored by vsellier).
Sep 1 2022, 9:54 PM

Aug 31 2022

vsellier committed rSPSITE33cd4db917f6: staging: Reduce the worker restart overhead (authored by vsellier).
Aug 31 2022, 6:39 PM
vsellier closed D8371: staging: Increase the number of workers for storage and indexer storage.
Aug 31 2022, 6:39 PM
vsellier committed rSPSITEceb1d7760989: staging: Increase the number of workers for storage and indexer storage (authored by vsellier).
Aug 31 2022, 6:39 PM
vsellier updated the diff for D8371: staging: Increase the number of workers for storage and indexer storage.

rebase

Aug 31 2022, 6:38 PM
vsellier updated the test plan for D8371: staging: Increase the number of workers for storage and indexer storage.
Aug 31 2022, 6:34 PM
vsellier updated the diff for D8371: staging: Increase the number of workers for storage and indexer storage.

Increase the number of requests handled by a storage worker before it is restarted

Aug 31 2022, 6:33 PM
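If the storage and indexer-storage RPC servers run under gunicorn (an assumption in this sketch), the settings involved are `workers`, `max_requests` (a worker is recycled after serving that many requests) and `max_requests_jitter` (a random extra allowance so that workers do not all restart at once). gunicorn configuration files are plain Python; the values below are placeholders, not the ones deployed by D8371.

```python
# gunicorn.conf.py: hypothetical values, for illustration only.
import multiprocessing

# Number of worker processes serving the RPC API.
workers = 2 * multiprocessing.cpu_count() + 1

# Recycle a worker after it has handled this many requests (guards against leaks).
# A higher value means fewer restarts, hence less restart overhead.
max_requests = 10000

# Each worker gets a random extra allowance of up to this many requests,
# so restarts are staggered instead of happening all at once.
max_requests_jitter = 1000
```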
vsellier accepted D8370: staging intrinsic metadata indexer: Declare batch size to 100.
Aug 31 2022, 6:20 PM
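With a batch size of 100, the indexer handles its input in groups of at most 100 items. A generic way to split a stream into such batches is sketched below; this is an illustration, not the swh-indexer code, and `batched` is a hypothetical helper name (Python 3.12 also ships `itertools.batched`, which yields tuples).

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int = 100) -> Iterator[List[T]]:
    """Yield successive lists of at most `batch_size` items; the last may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    print([len(b) for b in batched(range(250))])
```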
vsellier requested review of D8371: staging: Increase the number of workers for storage and indexer storage.
Aug 31 2022, 6:19 PM
vsellier added a revision to T4477: staging origin intrinsic metadata indexer are stuck: D8371: staging: Increase the number of workers for storage and indexer storage.
Aug 31 2022, 6:19 PM · Indexer, System administration
vsellier accepted D8352: staging/graphql: Push issues to sentry.
Aug 31 2022, 4:54 PM
vsellier closed D8366: terraform: centralize the production bridge configuration.
Aug 31 2022, 4:46 PM
vsellier committed rSPRE1614bb4f4b5b: terraform: centralize the production bridge configuration (authored by vsellier).
Aug 31 2022, 4:46 PM