Deploy visit-stats journal client on staging
Closed, Migrated (Edits Locked)

Description

Deploy in staging :

  1. Stop the workers
  2. Stop the storage service
  3. Upgrade the storage server to the latest versions of swh-storage/swh-model/swh-journal
  4. Upgrade the database model
  5. Restart the storage service
  6. (Upgrade and) restart the workers
  7. Stop search and indexer journal clients
  8. Stop the loaders (to avoid messages being sent to Kafka and automatic topic creation)
  9. Recreate the origin-visit-status topic
  10. Launch a backfill of the origin_visit_status topic
  11. Upgrade the stack (webapp/scheduler/search/...) to the latest versions created during the sprint
  12. Upgrade docker environment to add swh-scheduler-journal-client service (D4901)
  13. Upgrade the scheduler database model
  14. Deploy a new journal-client service on scheduler0
  15. Reset journal client offsets on origin_visit_status topic
  16. (not necessary due to T2944) Restart journal clients
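
Step 15 (resetting the journal client offsets) could be sketched as follows with Kafka's stock kafka-consumer-groups.sh tool. This is only an illustration: the consumer group name and the tool's location are assumptions, not values taken from this task, and the function prints the command instead of running it so it can be reviewed first. Kafka only accepts an offset reset while the consumer group has no active members, which is why the journal clients are stopped beforehand.

```shell
#!/usr/bin/env bash
# Sketch only: build the offset-reset command for one consumer group.
set -euo pipefail

SERVER="${SERVER:-journal0.internal.staging.swh.network:9092}"
TOPIC="swh.journal.objects.origin_visit_status"
# Hypothetical group name; use the group id configured for the journal client.
GROUP="${GROUP:-example-journal-client-group}"
# Assumed location of the Kafka CLI tool.
CONSUMER_GROUPS="${CONSUMER_GROUPS:-./kafka-consumer-groups.sh}"

# Print the reset command. The mode is --dry-run (default) or --execute.
reset_offsets() {
  local mode=${1:---dry-run}
  echo "$CONSUMER_GROUPS" --bootstrap-server "$SERVER" \
    --group "$GROUP" --topic "$TOPIC" \
    --reset-offsets --to-earliest "$mode"
}

reset_offsets --dry-run
```

Running the printed command with --dry-run shows the offsets that would be applied; switching to --execute applies them.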

Event Timeline

vsellier changed the task status from Open to Work in Progress. (Jan 19 2021, 1:48 PM)
vsellier triaged this task as Normal priority.
vsellier created this task.
vsellier moved this task from Backlog to in-progress on the Sprint 2021 01 board.
vsellier updated the task description.

All staging workers stopped:

root@pergamon:~# sudo clush -b -w @staging-workers 'puppet agent --disable "Deploy new storage version"; cd /etc/systemd/system/multi-user.target.wants; for unit in swh-worker@*; do systemctl disable $unit; done; systemctl stop swh-worker@*'
  • ... (some basic server upgrade and reboot stuff)
  • restart the workers
sudo clush -b -w @staging-workers 'puppet agent --enable; systemctl default'

The workers didn't restart after this command, so they were started explicitly with:

sudo clush -b -w @staging-workers 'cd /etc/systemd/system/multi-user.target.wants; for unit in swh-worker@*; do systemctl start $unit; done'
  • Stop search journal client (indexer journal client is not running)
root@search0:~# puppet agent --disable 'origin-visit-status topic reset'
root@search0:/etc/systemd/system# systemctl stop swh-search-journal-client@objects
root@search0:/etc/systemd/system# systemctl disable swh-search-journal-client@objects
Removed /etc/systemd/system/multi-user.target.wants/swh-search-journal-client@objects.service.
  • Recreate the origin-visit-status Kafka topic
    • Current configuration
 % ./kafka-topics.sh --bootstrap-server ${SERVER} --topic swh.journal.objects.origin_visit_status --describe
Topic: swh.journal.objects.origin_visit_status	PartitionCount: 64	ReplicationFactor: 1	Configs: cleanup.policy=compact,max.message.bytes=104857600
	Topic: swh.journal.objects.origin_visit_status	Partition: 0	Leader: 1	Replicas: 1	Isr: 1
	Topic: swh.journal.objects.origin_visit_status	Partition: 1	Leader: 1	Replicas: 1	Isr: 1
...
  • Delete the topic, then recreate it (only the create command is shown):
./kafka-topics.sh --bootstrap-server journal0.internal.staging.swh.network:9092 --create --config cleanup.policy=compact --partitions 64 --replication-factor 1 --topic swh.journal.objects.origin_visit_status            
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.objects.origin_visit_status.

% ./kafka-topics.sh --bootstrap-server ${SERVER} --topic swh.journal.objects.origin_visit_status --describe  
Topic: swh.journal.objects.origin_visit_status	PartitionCount: 64	ReplicationFactor: 1	Configs: cleanup.policy=compact,max.message.bytes=104857600
	Topic: swh.journal.objects.origin_visit_status	Partition: 0	Leader: 1	Replicas: 1	Isr: 1
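
Since the delete command is not recorded above, here is a minimal sketch of the full delete-then-create sequence. It reuses the create options from the log and the standard --delete action of kafka-topics.sh; the DRY_RUN switch and the run helper are additions for illustration only. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: drop and recreate the compacted topic.
set -euo pipefail

TOPIC="swh.journal.objects.origin_visit_status"
SERVER="${SERVER:-journal0.internal.staging.swh.network:9092}"
KAFKA_TOPICS="${KAFKA_TOPICS:-./kafka-topics.sh}"

# Hypothetical helper: print the command when DRY_RUN=1 (default), else run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

recreate_topic() {
  # Remove the existing topic and all of its messages.
  run "$KAFKA_TOPICS" --bootstrap-server "$SERVER" --delete --topic "$TOPIC"
  # Recreate it with the same settings as the create command in the log.
  run "$KAFKA_TOPICS" --bootstrap-server "$SERVER" --create \
    --config cleanup.policy=compact --partitions 64 --replication-factor 1 \
    --topic "$TOPIC"
}

recreate_topic
```

Topic deletion is asynchronous on the broker side, so when running for real (DRY_RUN=0) the create step may need to be retried once --describe no longer lists the topic.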

Backfill launched from storage1 with this script: P927 (10 ranges in parallel); it finished in ~15 min
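
The P927 script is not reproduced here. As a rough illustration of the idea of splitting a backfill into 10 ranges that run concurrently, the sketch below divides an integer key space into contiguous ranges and starts one background job per range. The upper bound of 1000 and the backfill_range function are placeholders: the real script would call the actual backfill command with each range's bounds.

```shell
#!/usr/bin/env bash
# Sketch only: split a key space into N ranges and process them concurrently.
set -euo pipefail

# Print N contiguous "start end" integer ranges covering [0, max).
split_ranges() {
  local max=$1 n=$2 i start end
  for ((i = 0; i < n; i++)); do
    start=$(( i * max / n ))
    end=$(( (i + 1) * max / n ))
    echo "$start $end"
  done
}

# Placeholder worker: replace the echo with the real backfill invocation.
backfill_range() {
  echo "backfill origin_visit_status from $1 to $2"
}

# Start one background job per range, then wait for all of them.
run_parallel() {
  local max=$1 n=$2 start end
  while read -r start end; do
    backfill_range "$start" "$end" &
  done < <(split_ranges "$max" "$n")
  wait
}

run_parallel 1000 10
```

Because the jobs run in the background, their output order is not guaranteed; only the set of ranges is.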

vsellier claimed this task.
vsellier updated the task description.
vsellier moved this task from code review to done on the Sprint 2021 01 board.