
[staging] Properly recreate the origin_intrinsic_metadata topic
Closed, Resolved (Public)

Description

The topic was created with the default configuration, so it has only one partition and no cleanup policy is set on the topic (the broker default applies):

 % /opt/kafka/bin/kafka-topics.sh  --bootstrap-server $SERVER --describe --topic swh.journal.indexed.origin_intrinsic_metadata
Topic: swh.journal.indexed.origin_intrinsic_metadata	PartitionCount: 1	ReplicationFactor: 1	Configs: max.message.bytes=104857600
	Topic: swh.journal.indexed.origin_intrinsic_metadata	Partition: 0	Leader: 1	Replicas: 1	Isr: 1
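
The check above can be scripted. The following is a minimal sketch, not part of the original task: `check_topic` is a hypothetical helper that reads `kafka-topics.sh --describe` output on stdin and compares the summary line with the expected partition count and cleanup policy. It assumes the summary line has the `PartitionCount: N ... Configs: k=v,k=v` layout shown above.

```shell
# Hypothetical helper (not from the original task): validate the summary line
# of `kafka-topics.sh --describe` output read from stdin.
# Usage: ... --describe --topic TOPIC | check_topic EXPECTED_PARTITIONS EXPECTED_POLICY
check_topic() {
    local want_partitions="$1" want_policy="$2"
    local summary partitions configs

    # Keep the first line mentioning PartitionCount; fail if there is none.
    summary=$(awk '/PartitionCount:/ && !found { print; found = 1 } END { exit !found }') || {
        echo "no summary line found" >&2
        return 2
    }

    # Extract the partition count and the comma-separated topic-level configs.
    partitions=$(printf '%s\n' "$summary" |
        sed -n 's/.*PartitionCount:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
    configs=$(printf '%s\n' "$summary" |
        sed -n 's/.*Configs:[[:space:]]*\([^[:space:]]*\).*/\1/p')

    if [ "$partitions" != "$want_partitions" ]; then
        echo "partitions: got $partitions, want $want_partitions"
        return 1
    fi

    # Surround with commas so that only a whole key=value entry matches.
    case ",$configs," in
        *",cleanup.policy=$want_policy,"*) ;;
        *)
            echo "cleanup.policy=$want_policy not set (Configs: $configs)"
            return 1
            ;;
    esac
    echo "ok"
}
```

Piping the describe output above into `check_topic 64 compact` would report the partition mismatch and return a non-zero status.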

Event Timeline

vsellier changed the task status from Open to Work in Progress. Jun 18 2021, 9:47 AM
vsellier triaged this task as Normal priority.
vsellier created this task.
vsellier moved this task from Backlog to in-progress on the System administration board.
root@worker3:/etc/softwareheritage# puppet agent --disable 'recreate origin_intrinsic_metadata topic'
root@worker3:/etc/softwareheritage# systemctl stop swh-worker@indexer_origin_intrinsic_metadata.service
root@search0:~/T3391# puppet agent --disable 'recreate origin_intrinsic_metadata topic'
root@search0:~/T3391# systemctl stop swh-search-journal-client@indexed.service
vsellier@journal0 /var/log/kafka
 % /opt/kafka/bin/kafka-topics.sh  --bootstrap-server $SERVER --delete --topic swh.journal.indexed.origin_intrinsic_metadata
vsellier@journal0 /var/log/kafka
 % /opt/kafka/bin/kafka-topics.sh --bootstrap-server $SERVER --create --config cleanup.policy=compact --partitions 64 --replication-factor 1 --topic "swh.journal.indexed.origin_intrinsic_metadata"
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.indexed.origin_intrinsic_metadata.
 % /opt/kafka/bin/kafka-topics.sh  --bootstrap-server $SERVER --describe --topic swh.journal.indexed.origin_intrinsic_metadata                                                                      
Topic: swh.journal.indexed.origin_intrinsic_metadata	PartitionCount: 64	ReplicationFactor: 1	Configs: cleanup.policy=compact,max.message.bytes=104857600
	Topic: swh.journal.indexed.origin_intrinsic_metadata	Partition: 0	Leader: 1	Replicas: 1	Isr: 1
	Topic: swh.journal.indexed.origin_intrinsic_metadata	Partition: 1	Leader: 1	Replicas: 1	Isr: 1
	Topic: swh.journal.indexed.origin_intrinsic_metadata	Partition: 2	Leader: 1	Replicas: 1	Isr: 1
...
root@worker3:/etc/systemd/system# systemctl start swh-worker@indexer_origin_intrinsic_metadata.service
root@worker3:/etc/systemd/system# puppet agent --enable
root@search0:~/T3391# systemctl start swh-search-journal-client@indexed.service 
root@search0:~/T3391# puppet agent --enable
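
The Kafka part of the steps above (delete, create, describe) could be wrapped as follows. This is only a sketch: `run`, `recreate_compacted_topic`, `DRY_RUN` and `KAFKA_TOPICS` are hypothetical names introduced here, while the `kafka-topics.sh` flags are the ones used in the session above. It assumes `$SERVER` holds the bootstrap server and that producers and consumers of the topic have already been stopped.

```shell
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# recreate_compacted_topic TOPIC PARTITIONS
# Deletes the topic, recreates it with cleanup.policy=compact, then describes it.
# Assumption: SERVER is set; KAFKA_TOPICS may override the path to the script.
recreate_compacted_topic() {
    local topic="$1" partitions="$2"
    local kafka_topics="${KAFKA_TOPICS:-/opt/kafka/bin/kafka-topics.sh}"

    run "$kafka_topics" --bootstrap-server "$SERVER" --delete --topic "$topic" || return 1

    # Note: topic deletion is asynchronous in Kafka, so an immediate create
    # may fail until the deletion has completed; retry manually if needed.
    run "$kafka_topics" --bootstrap-server "$SERVER" --create \
        --config cleanup.policy=compact --partitions "$partitions" \
        --replication-factor 1 --topic "$topic" || return 1

    run "$kafka_topics" --bootstrap-server "$SERVER" --describe --topic "$topic"
}
```

Running it first with `DRY_RUN=1` prints the three commands without touching the cluster, which makes it easy to review them before the real run.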

Indexing rescheduled as in https://forge.softwareheritage.org/T3037#58463:

swhscheduler@scheduler0:~$ /usr/bin/swh scheduler --config-file scheduler.yml task schedule_origins --storage-url http://storage1.internal.staging.swh.network:5002 index-origin-metadata 2>&1 | tee schedule_origins.logs
...
page_token: 79901
Scheduled 8000 tasks (80000 origins).
page_token: 80001
page_token: 80101
...

and counting...
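
To follow the progress without watching the terminal, the most recent "Scheduled" line can be pulled out of the log file. This is a small sketch under the assumption that the progress lines keep the `Scheduled N tasks (M origins).` format shown above; `last_scheduled` is a hypothetical helper name.

```shell
# Hypothetical helper: print "TASKS ORIGINS" from the last progress line
# of the scheduler log read on stdin; returns 1 if no such line exists.
last_scheduled() {
    awk '
        /^Scheduled [0-9]+ tasks/ {
            tasks = $2
            origins = $4
            gsub(/[^0-9]/, "", origins)   # strip the opening parenthesis
        }
        END {
            if (tasks == "") exit 1
            print tasks, origins
        }
    '
}
```

It could be used as `last_scheduled < schedule_origins.logs`, with the log file name taken from the `tee` command above.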

vsellier moved this task from in-progress to done on the System administration board.