Jan 8 2023
Oct 19 2022
Oct 4 2022
Closing, as there have been no alerts for almost a month.
Sep 15 2022
Sep 6 2022
The root cause is a swh-graph experiment that generated a lot of gRPC errors, and the resulting events are huge.
No consumers seem to have a big lag on these topics, so it should be possible to reduce the lag to unblock the server and have a look at which service is sending the events:
root@riverside:/var/lib/sentry-onpremise# docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --list | tr -d '\r' | xargs -t -n1 docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group | grep -e GROUP -e " events "
Creating sentry-self-hosted_kafka_run ... done
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-consumers
Creating sentry-self-hosted_kafka_run ... done
GROUP            TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG   CONSUMER-ID  HOST  CLIENT-ID
snuba-consumers  events  0          82585390        82587094        1704  -            -     -
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-post-processor:sync:6fa9928e1d6911edac290242ac170014
Creating sentry-self-hosted_kafka_run ... done
GROUP  TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group ingest-consumer
Creating sentry-self-hosted_kafka_run ... done
The biggest topics are:
root@riverside:/var/lib/docker/volumes/sentry-kafka/_data# du -sch * | sort -h | tail -n 5
 31M    snuba-commit-log-0
291M    outcomes-0
 30G    ingest-events-0
 43G    events-0
 73G    total
Aug 24 2022
Aug 17 2022
Aug 9 2022
Feb 9 2022
In the end, it seems everything is already OK (another cooking task issue was reported [1]), so I'm closing this.
I don't currently know how to trigger an error in the vault; you'd have to change the code to do that manually :|
Also, that's a cooker worker issue reported in Sentry about 18h ago (as of this comment).
I'll let you trigger some cookings and report your findings here; a sketch for forcing a test error is below.
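A minimal sketch of such a manual change (hypothetical code, not the actual swh-vault sources): drop a deliberate raise into a cooker, run a cooking, and check whether the resulting error shows up in Sentry.

class DummyCooker:
    """Hypothetical stand-in for a vault cooker, for illustration only."""

    def cook(self):
        # Deliberate failure: if error reporting is wired up for the cooker
        # workers, this should appear as a Sentry event.
        raise RuntimeError("deliberate test error for Sentry")

DummyCooker().cook()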
Something is bothering me: aren't there already some catch-all exception handlers somewhere in the vault source code?
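(This matters because a broad handler swallows the exception before the SDK's automatic error capture ever sees it; a sketch of the pattern, with hypothetical names:)

import sentry_sdk

def cook_bundle():
    # Hypothetical cooking step that fails.
    raise RuntimeError("cooking failed")

try:
    cook_bundle()
except Exception:
    # A catch-all like this hides the error from Sentry's automatic reporting;
    # it has to be re-reported explicitly, for example:
    sentry_sdk.capture_exception()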
Feb 8 2022
I wonder if replacing @worker_init.connect with @worker_process_init.connect at https://forge.softwareheritage.org/source/swh-scheduler/browse/master/swh/scheduler/celery_backend/config.py$157 would work.
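A minimal sketch of that change (not the actual swh-scheduler code; the DSN value is a placeholder): worker_init runs once in the worker's parent process, while worker_process_init runs in every forked pool process, so each child gets its own initialized Sentry client.

import sentry_sdk
from celery.signals import worker_process_init

@worker_process_init.connect
def init_sentry(**kwargs):
    # Placeholder DSN; the real value comes from the service configuration.
    sentry_sdk.init(dsn="https://<key>@<sentry-host>/<project>")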
So I'm guessing Celery is eating the logs somehow and Sentry doesn't see them.
So I don't currently know what's wrong (if anything is).
So I was wrong; it is correctly set [1].
And there are Sentry issues about workers [2].
Yes, I confirm (from the #swh-sysadm discussion).
Looking at the Puppet configuration, my guess is that the sentry_dsn is not set for the vault cookers.
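If so, that would be consistent with the missing reports: without a DSN the SDK still initializes, but the client has no transport and events are silently dropped. Roughly (illustration only):

import sentry_sdk

# With no DSN configured anywhere (neither the dsn option nor the SENTRY_DSN
# environment variable), init() succeeds but nothing is ever sent.
sentry_sdk.init(dsn=None)
sentry_sdk.capture_message("this is dropped without a DSN")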
Oct 15 2021
Sep 29 2021
In the meantime, logs can be reached in the dedicated dashboard.