Sentry (Folder)
Active · Public

Members

  • This project does not have any members.

Watchers

  • This project does not have any watchers.

Recent Activity

Tue, Oct 4

vsellier closed T4497: [sentry] Out of disk space as Resolved.

Closing, as there have been no alerts for almost a month.

Tue, Oct 4, 6:15 PM · Sentry, System administration

Thu, Sep 15

vlorentz closed T4427: Deal with recurring Sentry reports of temporary server shutdowns as Resolved.
Thu, Sep 15, 1:56 PM · Core & foundations, Sentry

Sep 6 2022

vlorentz added revisions to T4497: [sentry] Out of disk space: D8402: conftest: Refactor GraphServerProcess to be more flexible, D8403: Kill GraphServerProcess on test teardown, D8404: Return HTTP 503 on AioRpcError.
Sep 6 2022, 5:31 PM · Sentry, System administration
vsellier added a comment to T4497: [sentry] Out of disk space.

The root cause is a swh-graph experiment that generated a large number of gRPC errors, each of which is huge.

Sep 6 2022, 12:41 PM · Sentry, System administration
vsellier added a comment to T4497: [sentry] Out of disk space.

No consumer seems to have a large lag on these topics, so it should be possible to reduce the lag to unblock the server and then look at which service is sending the events:

root@riverside:/var/lib/sentry-onpremise# docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --list | tr -d '\r' | xargs -t -n1 docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092  --describe --group | grep -e GROUP -e " events "
Creating sentry-self-hosted_kafka_run ... done
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-consumers
Creating sentry-self-hosted_kafka_run ... done
GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
snuba-consumers events          0          82585390        82587094        1704            -                                            -               -
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-post-processor:sync:6fa9928e1d6911edac290242ac170014
Creating sentry-self-hosted_kafka_run ... done
GROUP                                                      TOPIC            PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group ingest-consumer
Creating sentry-self-hosted_kafka_run ... done
Sep 6 2022, 11:18 AM · Sentry, System administration
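
The LAG column in the output above is the difference between LOG-END-OFFSET and CURRENT-OFFSET (82587094 - 82585390 = 1704 for snuba-consumers on the events topic). A minimal sketch of recomputing it from captured output follows; the function name parse_consumer_lag and the SAMPLE text are hypothetical, and the column order is assumed to match the header shown in the comment.

```python
from __future__ import annotations


def parse_consumer_lag(output: str) -> list[dict]:
    """Parse `kafka-consumer-groups --describe` text and recompute the lag per row."""
    rows = []
    seen_header = False
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "GROUP":
            seen_header = True
            continue
        # Skip docker-compose chatter and echoed commands: a data row has
        # numeric PARTITION, CURRENT-OFFSET and LOG-END-OFFSET columns.
        if not seen_header or len(fields) < 6:
            continue
        if not all(field.isdigit() for field in fields[2:5]):
            continue
        current, end = int(fields[3]), int(fields[4])
        rows.append(
            {
                "group": fields[0],
                "topic": fields[1],
                "partition": int(fields[2]),
                "lag": end - current,  # messages not yet consumed
            }
        )
    return rows


# Assumption: a trimmed copy of the output quoted in the comment above.
SAMPLE = """\
Creating sentry-self-hosted_kafka_run ... done
GROUP           TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG   CONSUMER-ID  HOST  CLIENT-ID
snuba-consumers events  0          82585390        82587094        1704  -            -     -
"""

for row in parse_consumer_lag(SAMPLE):
    print(row["group"], row["topic"], row["lag"])  # snuba-consumers events 1704
```
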
vsellier added a comment to T4497: [sentry] Out of disk space.

The biggest topics are:

root@riverside:/var/lib/docker/volumes/sentry-kafka/_data# du -sch * | sort -h | tail -n 5
31M	snuba-commit-log-0
291M	outcomes-0
30G	ingest-events-0
43G	events-0
73G	total
Sep 6 2022, 11:11 AM · Sentry, System administration
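
For reference, the same kind of "largest entries" listing can be sketched in Python. This is only an approximation of the du pipeline above: it sums apparent file sizes rather than allocated disk blocks, and it omits the total line. The helper names dir_size and largest_entries, and the temporary directory layout in the demo, are hypothetical.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def dir_size(path: Path) -> int:
    """Sum the apparent size in bytes of all regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def largest_entries(parent: Path, count: int = 5) -> list[tuple[str, int]]:
    """Return the `count` largest entries of `parent`, smallest first."""
    sizes = [
        (entry.name, dir_size(entry) if entry.is_dir() else entry.stat().st_size)
        for entry in parent.iterdir()
    ]
    sizes.sort(key=lambda item: item[1])
    return sizes[-count:]


# Demo on a throwaway directory tree (not the real Kafka volume).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, size in [("events-0", 300), ("ingest-events-0", 200), ("outcomes-0", 10)]:
        (root / name).mkdir()
        (root / name / "segment.log").write_bytes(b"\0" * size)
    for name, size in largest_entries(root, count=2):
        print(f"{size}\t{name}")
```
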
vsellier changed the status of T4497: [sentry] Out of disk space from Open to Work in Progress.
Sep 6 2022, 11:09 AM · Sentry, System administration

Aug 24 2022

vlorentz removed a subtask for T4427: Deal with recurring Sentry reports of temporary server shutdowns: Restricted Maniphest Task.
Aug 24 2022, 12:35 PM · Core & foundations, Sentry

Aug 17 2022

vlorentz added a subtask for T4427: Deal with recurring Sentry reports of temporary server shutdowns: Restricted Maniphest Task.
Aug 17 2022, 3:50 PM · Core & foundations, Sentry

Aug 9 2022

vlorentz added revisions to T4427: Deal with recurring Sentry reports of temporary server shutdowns: D8223: Convert psycopg2 errors to TransientRemoteException instead of RemoteException, D8224: retry: Add constant 10s wait when retrying transient exceptions.
Aug 9 2022, 4:20 PM · Core & foundations, Sentry
vlorentz added revisions to T4427: Deal with recurring Sentry reports of temporary server shutdowns: D8219: Remove support for deprecated exception format, D8220: Add tests for RPC server's exception handling, D8221: Make the RPC client raise a specific exception class on 503.
Aug 9 2022, 11:18 AM · Core & foundations, Sentry
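
The revision titles above (D8221, D8223, D8224) describe a pattern: map HTTP 503 to a dedicated exception class and retry transient failures after a constant wait. A minimal sketch of that pattern follows. Only the class names RemoteException and TransientRemoteException come from the revision titles; their inheritance relationship, the helpers raise_for_status and retry_transient, and the hand-rolled retry loop are assumptions made for illustration, not the actual swh code.

```python
import functools
import time


class RemoteException(Exception):
    """Error reported by the RPC server."""


class TransientRemoteException(RemoteException):
    """Error expected to disappear on its own, e.g. during a server restart."""


def raise_for_status(status_code: int, message: str = "") -> None:
    """Hypothetical client-side mapping of HTTP status codes to exceptions."""
    if status_code == 503:
        raise TransientRemoteException(message)
    if status_code >= 400:
        raise RemoteException(message)


def retry_transient(attempts: int = 3, wait_seconds: float = 10.0, sleep=time.sleep):
    """Retry only transient errors, sleeping a constant time between attempts."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except TransientRemoteException:
                    if attempt == attempts:
                        raise  # give up after the last attempt
                    sleep(wait_seconds)

        return wrapper

    return decorator


# Demo: two simulated 503 responses, then a success (no real waiting).
responses = iter([503, 503, 200])


@retry_transient(attempts=3, wait_seconds=0)
def fetch() -> int:
    status = next(responses)
    raise_for_status(status, "server is shutting down")
    return status


print(fetch())  # 200
```

Non-transient errors propagate immediately, so only the temporary shutdowns are retried and genuine failures still reach Sentry.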
vlorentz claimed T4427: Deal with recurring Sentry reports of temporary server shutdowns.
Aug 9 2022, 11:17 AM · Core & foundations, Sentry
vlorentz triaged T4427: Deal with recurring Sentry reports of temporary server shutdowns as Normal priority.
Aug 9 2022, 11:17 AM · Core & foundations, Sentry

Feb 9 2022

ardumont changed the status of T3920: Errors from vault workers are not logged to Sentry from Invalid to Resolved.
Feb 9 2022, 3:03 PM · System administration, Vault, Sentry
ardumont closed T3920: Errors from vault workers are not logged to Sentry as Invalid.

In the end, everything already seems to be fine (another cooking task issue was reported [1]), so closing this.

Feb 9 2022, 3:03 PM · System administration, Vault, Sentry
ardumont added a comment to T3920: Errors from vault workers are not logged to Sentry.

I currently don't know how to trigger an error in the vault; you'd have to change the code to do that manually :|

Feb 9 2022, 10:24 AM · System administration, Vault, Sentry
ardumont added a comment to T3920: Errors from vault workers are not logged to Sentry.

Also, there is a cooker worker issue in Sentry from about 18h ago (as of the time of this comment).

Feb 9 2022, 10:03 AM · System administration, Vault, Sentry
vlorentz added a comment to T3920: Errors from vault workers are not logged to Sentry.

I'll let you trigger some cooking and report your findings here.

Feb 9 2022, 10:02 AM · System administration, Vault, Sentry
ardumont added a comment to T3920: Errors from vault workers are not logged to Sentry.

Something is bothering me: aren't there already some catch-all exception handlers somewhere in the vault source code?

Feb 9 2022, 9:51 AM · System administration, Vault, Sentry

Feb 8 2022

vlorentz added a comment to T3920: Errors from vault workers are not logged to Sentry.

I wonder if replacing @worker_init.connect with @worker_process_init.connect at https://forge.softwareheritage.org/source/swh-scheduler/browse/master/swh/scheduler/celery_backend/config.py$157 would work.

Feb 8 2022, 6:43 PM · System administration, Vault, Sentry
vlorentz added a comment to T3920: Errors from vault workers are not logged to Sentry.

So I'm guessing Celery is somehow eating the logs, so Sentry doesn't see them.

Feb 8 2022, 6:37 PM · System administration, Vault, Sentry
ardumont added a comment to T3920: Errors from vault workers are not logged to Sentry.

So I don't currently know what's wrong (if anything is).

Feb 8 2022, 6:17 PM · System administration, Vault, Sentry
ardumont added a comment to T3920: Errors from vault workers are not logged to Sentry.

So I was wrong, it is correctly set [1].
And there are Sentry issues about workers [2].

Feb 8 2022, 6:16 PM · System administration, Vault, Sentry
ardumont moved T3920: Errors from vault workers are not logged to Sentry from in-progress to code-review/await-feedback/pause on the System administration board.
Feb 8 2022, 5:15 PM · System administration, Vault, Sentry
ardumont changed the status of T3920: Errors from vault workers are not logged to Sentry from Open to Work in Progress.
Feb 8 2022, 5:15 PM · System administration, Vault, Sentry
ardumont moved T3920: Errors from vault workers are not logged to Sentry from Backlog to Weekly backlog on the System administration board.
Feb 8 2022, 5:15 PM · System administration, Vault, Sentry
ardumont added a revision to T3920: Errors from vault workers are not logged to Sentry: D7123: Configure vault cookers to send their issue to sentry.
Feb 8 2022, 4:18 PM · System administration, Vault, Sentry
ardumont added a comment to T3920: Errors from vault workers are not logged to Sentry.

Yes, I confirm (from the #swh-sysadm discussion).

Feb 8 2022, 4:13 PM · System administration, Vault, Sentry
anlambert updated subscribers of T3920: Errors from vault workers are not logged to Sentry.

Looking at puppet configuration, my guess is that the sentry_dsn is not set for the vault cookers.

Feb 8 2022, 3:52 PM · System administration, Vault, Sentry
vlorentz triaged T3920: Errors from vault workers are not logged to Sentry as High priority.
Feb 8 2022, 2:55 PM · System administration, Vault, Sentry

Oct 15 2021

ardumont closed T3619: Enable Sentry for swh-graph as Resolved.
Oct 15 2021, 11:01 AM · System administration, Sentry
ardumont moved T3619: Enable Sentry for swh-graph from code-review/await-feedback/pause to deployed/landed/monitoring on the System administration board.
Oct 15 2021, 10:55 AM · System administration, Sentry
ardumont moved T3619: Enable Sentry for swh-graph from in-progress to code-review/await-feedback/pause on the System administration board.
Oct 15 2021, 10:42 AM · System administration, Sentry
ardumont changed the status of T3619: Enable Sentry for swh-graph from Open to Work in Progress.
Oct 15 2021, 10:42 AM · System administration, Sentry
ardumont added a revision to T3619: Enable Sentry for swh-graph: D6480: Activate sentry for swh.graph.
Oct 15 2021, 10:33 AM · System administration, Sentry
ardumont moved T3619: Enable Sentry for swh-graph from Backlog to Weekly backlog on the System administration board.
Oct 15 2021, 10:29 AM · System administration, Sentry

Sep 29 2021

ardumont added a comment to T3619: Enable Sentry for swh-graph.

In the meantime, the logs are available in the dedicated dashboard.

Sep 29 2021, 3:37 PM · System administration, Sentry
vlorentz triaged T3619: Enable Sentry for swh-graph as High priority.
Sep 29 2021, 3:05 PM · System administration, Sentry

Sep 17 2021

ardumont triaged T3587: check swh sentry instance for eventual sensitive info leakage as Normal priority.
Sep 17 2021, 3:01 PM · Sentry

Sep 8 2021

vlorentz changed the status of T3466: Enable Sentry for the deposit server from Resolved to Invalid.
Sep 8 2021, 4:38 PM · System administration, Sentry, SWORD deposit
ardumont changed the status of T3466: Enable Sentry for the deposit server from Invalid to Resolved.
Sep 8 2021, 4:31 PM · System administration, Sentry, SWORD deposit

Sep 3 2021

moranegg moved T3466: Enable Sentry for the deposit server from Backlog to Landed/Tests/Validations (staging) on the SWORD deposit board.
Sep 3 2021, 11:36 AM · System administration, Sentry, SWORD deposit

Aug 5 2021

vlorentz closed T3466: Enable Sentry for the deposit server as Invalid.

uh, indeed

Aug 5 2021, 2:57 PM · System administration, Sentry, SWORD deposit
ardumont added a comment to T3466: Enable Sentry for the deposit server.

Does it?

Aug 5 2021, 2:53 PM · System administration, Sentry, SWORD deposit
vlorentz triaged T3466: Enable Sentry for the deposit server as Normal priority.
Aug 5 2021, 2:48 PM · System administration, Sentry, SWORD deposit

Jul 29 2021

ardumont moved T2910: Sentry: Increase disk space from deployed/landed/monitoring to done on the System administration board.
Jul 29 2021, 1:22 PM · Sentry, System administration

Feb 18 2021

vsellier changed the status of T3015: Sentry should have two different projects for swh-indexer and swh-indexer-storage from Invalid to Resolved.
Feb 18 2021, 9:28 AM · System administration, Sentry

Feb 11 2021

vlorentz closed T3015: Sentry should have two different projects for swh-indexer and swh-indexer-storage as Invalid.

Well, my concern was about having different versions running at the same time, but Sentry is able to detect the version, so that's not it.

Feb 11 2021, 5:34 PM · System administration, Sentry
vsellier claimed T3015: Sentry should have two different projects for swh-indexer and swh-indexer-storage.
Feb 11 2021, 3:34 PM · System administration, Sentry
vsellier added a comment to T3015: Sentry should have two different projects for swh-indexer and swh-indexer-storage.

I'm not sure I understand the real problem here.
As the indexer and indexer-storage are in the same source repository, their versions should match or increase in parallel. Sentry should be able to deal with this like any other version upgrade.

Feb 11 2021, 3:23 PM · System administration, Sentry