Plan, out of [1] diff:
- scheduler0.staging, workers.staging: Stop puppet [2]
- scheduler0.staging: stop the old journal client [3]
- workers.staging: Wait for all tasks to finish
- Stop swh-worker@indexer_origin_intrinsic_metadata [4]
- D7928: Rework puppet manifest to drop old services + update indexer service (as journal client)
- scheduler-nodes: Clean up old service (previous diff ^ does it)
- Remove celery workers and queues (click-click on rabbitmq ui from scheduler0.staging [5])
- pergamon: Deploy diff [6]
- scheduler0.staging: Apply puppet change (drop old journal client service)
- workers.staging: Upgrade python3-swh.indexer to v1.1.0
- P1370: Issue with that version [7] ^
- w/ vlorentz: Package new python3-swh.indexer to v1.2.0
- workers.staging: Upgrade python3-swh.indexer to v1.2.0
- Unstick the next problem (the configuration is now off) [9]
- workers.staging: Apply puppet change (drop old service, deploy new journal client service) [10]
- T4282#86233: Backing out: it's not ready, so reverting the current deployment
- Blocked by T4274
- D7951: Actual deployment when it's ready
- Follow journal consumption (from current offsets) [12]
- T4282#88364: Reindex everything from scratch (reset offsets) [11]
- Follow journal consumption [12]
[1] D7899
[2]
root@pergamon:~# clush -b -w scheduler0.internal.staging.swh.network -w @staging-workers 'puppet agent --disable "T4282: Migrate to origin intrinsic meta indexer as journal client"'
[3]
root@pergamon:~# clush -b -w scheduler0.internal.staging.swh.network systemctl stop swh-indexer-journal-client.service
[4]
root@pergamon:~# clush -b -w @staging-workers systemctl stop swh-worker@indexer_origin_intrinsic_metadata.service
[5] http://scheduler0.internal.staging.swh.network:15672/#/queues
[6]
root@pergamon:~# /usr/local/bin/deploy.sh
HEAD is now at eff3f30 Add snyk-stg-01 credentials
Already up to date.
HEAD is now at eff3f30 Add snyk-stg-01 credentials
Already up to date.
[7]
root@scheduler0:~# puppet agent --enable; puppet agent --test
...
# it passed nonetheless beyond me and applied the stuff ¯\_(ツ)_/¯
root@scheduler0:~# systemctl list-units | grep swh-indexer-journal-client.service
root@scheduler0:~# # no longer present ^ vs saatchi
[8] prod (untouched for now)
root@saatchi:~# systemctl list-units | grep swh-indexer-journal-client.service
swh-indexer-journal-client.service loaded active running Software Heritage Indexer Journal Client
[9]
swhworker@worker0:~$ /usr/bin/swh indexer --config-file $SWH_CONFIG_FILENAME journal-client '*'
Traceback (most recent call last):
  File "/usr/bin/swh", line 33, in <module>
    sys.exit(load_entry_point('swh.core==2.8.0', 'console_scripts', 'swh')())
  File "/usr/lib/python3/dist-packages/swh/core/cli/__init__.py", line 184, in main
    return swh(auto_envvar_prefix="SWH")
  File "/usr/lib/python3/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3/dist-packages/swh/indexer/cli.py", line 310, in journal_client
    idx = OriginMetadataIndexer()
  File "/usr/lib/python3/dist-packages/swh/indexer/metadata.py", line 325, in __init__
    self.revision_metadata_indexer = RevisionMetadataIndexer(config=config)
  File "/usr/lib/python3/dist-packages/swh/indexer/metadata.py", line 163, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/swh/indexer/indexer.py", line 167, in __init__
    self.check()
  File "/usr/lib/python3/dist-packages/swh/indexer/indexer.py", line 202, in check
    raise ValueError("Tools %s is unknown, cannot continue" % self.tools)
ValueError: Tools [] is unknown, cannot continue
swhworker@worker0:~$ /usr/bin/swh indexer --config-file $SWH_CONFIG_FILENAME journal-client '*'
[10]
root@pergamon:~# clush -b -w @staging-workers "systemctl status swh-indexer-journal-client" | grep -c running
4
[11] The journal client can either reuse the existing group_id to avoid re-indexing, or
use a new one to reindex everything (to recover from any earlier temporary failures).
Alternatively, we can reset the offsets on the topic to reindex everything.
[12] https://grafana.softwareheritage.org/goto/P4UllFR4z?orgId=1
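The three options in note [11] can be illustrated with a toy model of how a broker tracks one committed offset per consumer group. This is only a sketch: `Topic`, `Broker`, the group names and the message values are all hypothetical stand-ins, not swh.journal or Kafka APIs, and it assumes a group with no committed offset starts reading from the beginning of the topic.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy stand-in for a single-partition topic: an append-only list."""
    messages: list


@dataclass
class Broker:
    """Toy broker that remembers one committed offset per group_id."""
    topic: Topic
    committed: dict = field(default_factory=dict)

    def consume(self, group_id: str) -> list:
        # Assumption: a group with no committed offset starts at offset 0.
        start = self.committed.get(group_id, 0)
        batch = self.topic.messages[start:]
        # Commit the new position so the next call resumes after this batch.
        self.committed[group_id] = len(self.topic.messages)
        return batch

    def reset_offsets(self, group_id: str) -> None:
        # Forget the committed position; the group restarts from offset 0.
        self.committed.pop(group_id, None)


broker = Broker(Topic(["origin-a", "origin-b", "origin-c"]))

# Initial run with a (hypothetical) group id: everything is indexed once.
first_run = broker.consume("indexer-v1")

# Option 1: reuse the same group_id, nothing already consumed is re-read.
same_group = broker.consume("indexer-v1")

# Option 2: switch to a new group_id, everything is read again.
new_group = broker.consume("indexer-v2")

# Option 3: keep the group_id but reset its offsets, everything is read again.
broker.reset_offsets("indexer-v1")
after_reset = broker.consume("indexer-v1")

print(len(first_run), len(same_group), len(new_group), len(after_reset))
```

Options 2 and 3 give the same result here; in practice the difference is mostly operational (a new group leaves the old group's offsets in place, a reset does not).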