
[sentry] Out of disk space
Closed, Migrated

Description

The sentry service is down because the root filesystem has run out of disk space:

root@riverside:/var/lib/sentry-onpremise# df -h /
Filesystem                      Size  Used Avail Use% Mounted on
/dev/mapper/riverside--vg-root  125G  121G     0 100% /

It seems kafka is using most of the disk space:

root@riverside:/var/lib/docker/volumes# du -sch * | sort -h | tail -n 5
31M	sentry-redis
933M	sentry_onpremise_sentry-clickhouse-log
7.6G	sentry-clickhouse
73G	sentry-kafka
82G	total

Event Timeline

vsellier changed the task status from Open to Work in Progress. (Edited Sep 6 2022, 11:09 AM)
vsellier triaged this task as Unbreak Now! priority.
vsellier created this task.

The biggest topics are:

root@riverside:/var/lib/docker/volumes/sentry-kafka/_data# du -sch * | sort -h | tail -n 5
31M	snuba-commit-log-0
291M	outcomes-0
30G	ingest-events-0
43G	events-0
73G	total

Their retention policy is left at the default at the topic level:

root@riverside:/var/lib/sentry-onpremise# docker-compose-1.29.2 run --rm kafka kafka-topics --bootstrap-server kafka:9092  --describe --topic events
Creating sentry-self-hosted_kafka_run ... done
Topic: events	PartitionCount: 1	ReplicationFactor: 1	Configs: cleanup.policy=delete,max.message.bytes=50000000,message.timestamp.type=LogAppendTime
	Topic: events	Partition: 0	Leader: 1001	Replicas: 1001	Isr: 1001
root@riverside:/var/lib/sentry-onpremise# docker-compose-1.29.2 run --rm kafka kafka-topics --bootstrap-server kafka:9092  --describe --topic ingest-events
Creating sentry-self-hosted_kafka_run ... done
Topic: ingest-events	PartitionCount: 1	ReplicationFactor: 1	Configs: cleanup.policy=delete,max.message.bytes=50000000
	Topic: ingest-events	Partition: 0	Leader: 1001	Replicas: 1001	Isr: 1001

Neither topic sets retention.ms, so the Kafka default would apply, which means 7 days of history.
However, the configuration is overridden in the docker-compose / .env file to 24h.
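
For reference, Kafka's topic-level retention.ms setting is expressed in milliseconds. The conversion from the durations mentioned above can be sketched as follows; the function name hours_to_retention_ms is only illustrative and is not part of any Kafka tooling:

```shell
#!/bin/sh
# Convert a retention period given in hours to the millisecond value
# expected by the retention.ms topic setting.
# hours_to_retention_ms is a hypothetical helper name.
hours_to_retention_ms() {
    # 1 hour = 3600 s = 3600000 ms
    echo $(( $1 * 3600 * 1000 ))
}

hours_to_retention_ms 168   # 7 days   -> 604800000
hours_to_retention_ms 24    # 24 hours -> 86400000
```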

No consumer seems to have a large lag on these topics, so it should be possible to reduce the retention to unblock the server, and then look at which service is sending the events:

root@riverside:/var/lib/sentry-onpremise# docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --list | tr -d '\r' | xargs -t -n1 docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092  --describe --group | grep -e GROUP -e " events "
Creating sentry-self-hosted_kafka_run ... done
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-consumers
Creating sentry-self-hosted_kafka_run ... done
GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
snuba-consumers events          0          82585390        82587094        1704            -                                            -               -
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-post-processor:sync:6fa9928e1d6911edac290242ac170014
Creating sentry-self-hosted_kafka_run ... done
GROUP                                                      TOPIC            PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group ingest-consumer
Creating sentry-self-hosted_kafka_run ... done

Consumer group 'ingest-consumer' has no active members.
GROUP           TOPIC               PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-transactions-subscriptions-consumers
Creating sentry-self-hosted_kafka_run ... done
GROUP                                      TOPIC            PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group transactions_group
Creating sentry-self-hosted_kafka_run ... done
GROUP              TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
transactions_group events          0          82587077        82587094        17              rdkafka-5adeddd0-e486-4dde-ae25-a3f65621e401 /172.23.0.15    rdkafka
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-post-processor
Creating sentry-self-hosted_kafka_run ... done
GROUP                TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
snuba-post-processor events          0          82585390        82587094        1704            rdkafka-6a74a1ee-0ec0-4a3f-adf7-daad73c1747b /172.23.0.20    rdkafka
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-events-subscriptions-consumers
Creating sentry-self-hosted_kafka_run ... done
GROUP                                TOPIC            PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-replacers
Creating sentry-self-hosted_kafka_run ... done
GROUP           TOPIC              PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group query-subscription-consumer
Creating sentry-self-hosted_kafka_run ... done
GROUP                       TOPIC                             PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
root@riverside:/var/lib/sentry-onpremise# docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --list | tr -d '\r' | xargs -t -n1 docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092  --describe --group | grep -e GROUP -e " ingest-events "
Creating sentry-self-hosted_kafka_run ... done
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-consumers
Creating sentry-self-hosted_kafka_run ... done
GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-post-processor:sync:6fa9928e1d6911edac290242ac170014
Creating sentry-self-hosted_kafka_run ... done
GROUP                                                      TOPIC            PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group ingest-consumer
Creating sentry-self-hosted_kafka_run ... done

Consumer group 'ingest-consumer' has no active members.
GROUP           TOPIC               PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
ingest-consumer ingest-events       0          76783696        76833404        49708           -               -               -
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-transactions-subscriptions-consumers
Creating sentry-self-hosted_kafka_run ... done
GROUP                                      TOPIC            PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group transactions_group
Creating sentry-self-hosted_kafka_run ... done
GROUP              TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-post-processor
Creating sentry-self-hosted_kafka_run ... done
GROUP                TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-events-subscriptions-consumers
Creating sentry-self-hosted_kafka_run ... done
GROUP                                TOPIC            PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-replacers
Creating sentry-self-hosted_kafka_run ... done
GROUP           TOPIC              PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
docker-compose-1.29.2 run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group query-subscription-consumer
Creating sentry-self-hosted_kafka_run ... done
GROUP                       TOPIC                             PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
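
The output above is noisy because every group is described in turn and most groups have no row for the topic. A possible way to keep only the useful rows is to match on columns rather than grep on a padded topic name. This is a minimal sketch that assumes the column layout shown above (group in column 1, topic in column 2, lag in column 6); lag_for_topic is a hypothetical helper name:

```shell
#!/bin/sh
# Print "group topic lag" for every consumer-group row on a given topic.
# Reads the text produced by `kafka-consumer-groups --describe` on stdin.
# lag_for_topic is a hypothetical helper, not a Kafka command.
lag_for_topic() {
    awk -v topic="$1" '
        # Data rows carry the topic in column 2 and a numeric LAG in column 6.
        # Header lines and docker-compose messages do not match and are skipped.
        $2 == topic && $6 ~ /^[0-9]+$/ { print $1, $2, $6 }
    '
}

# Example with rows copied from the output above.
lag_for_topic events <<'EOF'
GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                  HOST            CLIENT-ID
snuba-consumers events          0          82585390        82587094        1704            -                                            -               -
transactions_group events          0          82587077        82587094        17              rdkafka-5adeddd0-e486-4dde-ae25-a3f65621e401 /172.23.0.15    rdkafka
ingest-consumer ingest-events       0          76783696        76833404        49708           -               -               -
EOF
```

On the sample input this prints one line per group consuming the events topic (snuba-consumers with a lag of 1704, transactions_group with 17) and drops the ingest-events row.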

The root cause is a swh-graph experiment that generated a large number of gRPC errors, each carrying a huge stack trace.

The retention period of the events topic was reduced to 6 hours until the size of the swh-graph stack traces is brought down:

root@riverside:/var/lib/sentry-onpremise# docker-compose-1.29.2 run --rm kafka kafka-topics --zookeeper zookeeper:2181  --topic events --alter --config retention.ms=21600000
Creating sentry-self-hosted_kafka_run ... done
WARNING: Altering topic configuration from this script has been deprecated and may be removed in future releases.
         Going forward, please use kafka-configs.sh for this functionality
Updated config for topic events.
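
The deprecation warning points to kafka-configs for this kind of change. A rough sketch of the equivalent call is given below; it only prints the command instead of running it. It assumes the kafka-configs shipped in the container accepts --bootstrap-server for topic entities (older Kafka releases may need --zookeeper zookeeper:2181 instead) and reuses the kafka:9092 address from the commands above. retention_alter_cmd is a hypothetical helper name:

```shell
#!/bin/sh
# Print (without executing) a kafka-configs command that sets retention.ms
# on a topic. $1: topic name, $2: retention in milliseconds.
# Assumption: kafka-configs supports --bootstrap-server for topics in the
# deployed Kafka version; retention_alter_cmd is a hypothetical helper.
retention_alter_cmd() {
    printf 'kafka-configs --bootstrap-server kafka:9092 --entity-type topics --entity-name %s --alter --add-config retention.ms=%s\n' "$1" "$2"
}

# 6 hours = 21600000 ms, the value used for the events topic above.
retention_alter_cmd events 21600000
```

To actually apply it, the printed command would be run inside the kafka container, for example prefixed with docker-compose-1.29.2 run --rm kafka as in the other commands of this task.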

After a restart of kafka, the log cleanup removed the expired segments and freed disk space:

root@riverside:~# df -h /
Filesystem                      Size  Used Avail Use% Mounted on
/dev/mapper/riverside--vg-root  125G   95G   25G  80% /
vsellier moved this task from Backlog to done on the System administration board.

Closing, as there have been no alerts for almost a month.