Jan 8 2023
Oct 19 2022
Oct 4 2022
Closing, as there have been no alerts for almost a month.
Sep 15 2022
Sep 6 2022
The root cause is a swh-graph experiment that generated a lot of gRPC errors, and the resulting events are huge.
No consumers seem to have a big lag on these topics, so it should be possible to reduce the lag to unblock the server and have a look at which service is sending the events:
root@riverside:/var/lib/sentry-onpremise# docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --list | tr -d '\r' | xargs -t -n1 docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group | grep -e GROUP -e " events "
Creating sentry-self-hosted_kafka_run ... done
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-consumers
Creating sentry-self-hosted_kafka_run ... done
GROUP            TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG   CONSUMER-ID  HOST  CLIENT-ID
snuba-consumers  events  0          82585390        82587094        1704  -            -     -
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-post-processor:sync:6fa9928e1d6911edac290242ac170014
Creating sentry-self-hosted_kafka_run ... done
GROUP  TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group ingest-consumer
Creating sentry-self-hosted_kafka_run ... done
The biggest topics are:
root@riverside:/var/lib/docker/volumes/sentry-kafka/_data# du -sch * | sort -h | tail -n 5
 31M    snuba-commit-log-0
291M    outcomes-0
 30G    ingest-events-0
 43G    events-0
 73G    total
Aug 24 2022
Aug 17 2022
Aug 9 2022
Feb 9 2022
In the end, it seems everything is already OK (another cooking task issue was reported [1]), so I'm closing this.
I don't currently know how to trigger an error in the vault; you'd have to change the code to do that manually :|
Also, that's a cooker worker issue reported in Sentry about 18h ago (as of this comment).
I'll let you trigger some cookings and report your findings here; a sketch for forcing a test error is below.
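A minimal sketch of such a manual change (hypothetical code, not the actual swh-vault sources): drop a deliberate raise into a cooker, run a cooking, and check whether the resulting error shows up in Sentry.

class DummyCooker:
    """Hypothetical stand-in for a vault cooker, for illustration only."""

    def cook(self):
        # Deliberate failure: if error reporting is wired up for the cooker
        # workers, this should appear as a Sentry event.
        raise RuntimeError("deliberate test error for Sentry")

DummyCooker().cook()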
Something is bothering me: aren't there already some catch-all exception handlers somewhere in the vault source code?
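(This matters because a broad handler swallows the exception before the SDK's automatic error capture ever sees it; a sketch of the pattern, with hypothetical names:)

import sentry_sdk

def cook_bundle():
    # Hypothetical cooking step that fails.
    raise RuntimeError("cooking failed")

try:
    cook_bundle()
except Exception:
    # A catch-all like this hides the error from Sentry's automatic reporting;
    # it has to be re-reported explicitly, for example:
    sentry_sdk.capture_exception()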
Feb 8 2022
I wonder if replacing @worker_init.connect with @worker_process_init.connect at https://forge.softwareheritage.org/source/swh-scheduler/browse/master/swh/scheduler/celery_backend/config.py$157 would work.
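A minimal sketch of that change (not the actual swh-scheduler code; the DSN value is a placeholder): worker_init runs once in the worker's parent process, while worker_process_init runs in every forked pool process, so each child gets its own initialized Sentry client.

import sentry_sdk
from celery.signals import worker_process_init

@worker_process_init.connect
def init_sentry(**kwargs):
    # Placeholder DSN; the real value comes from the service configuration.
    sentry_sdk.init(dsn="https://<key>@<sentry-host>/<project>")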
So I'm guessing Celery is eating the logs somehow and Sentry doesn't see them.
So I don't currently know what's wrong (if anything is).
So I was wrong; it is correctly set [1].
And there are Sentry issues about workers [2].
Yes, I confirm (from the #swh-sysadm discussion).
Looking at the Puppet configuration, my guess is that the sentry_dsn is not set for the vault cookers.
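If so, that would be consistent with the missing reports: without a DSN the SDK still initializes, but the client has no transport and events are silently dropped. Roughly (illustration only):

import sentry_sdk

# With no DSN configured anywhere (neither the dsn option nor the SENTRY_DSN
# environment variable), init() succeeds but nothing is ever sent.
sentry_sdk.init(dsn=None)
sentry_sdk.capture_message("this is dropped without a DSN")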
Oct 15 2021
Sep 29 2021
In the meantime, logs can be reached in the dedicated dashboard.