
[swh-search] Deploy v0.9.0 on production and execute a full origin and metadata reindexation
Closed, Resolved (Public)

Description

  • on search1
    • stop puppet
    • stop the objects and metadata journal clients
    • upgrade the debian packages
    • restart swh-search to declare the new field mapping in the old index
    • restart puppet
    • manually launch the journal clients, configured to write to an origin-v0.9 index
    • wait for the end of the reindexation
    • update the swh-search and journal client configurations in puppet to use the new index (done for webapp1)

Event Timeline

vsellier triaged this task as Normal priority. (Jun 21 2021, 11:50 AM)
vsellier created this task.
vsellier changed the task status from Open to Work in Progress. (Jun 22 2021, 9:48 AM)
vsellier moved this task from Backlog to in-progress on the System administration board.

On search1:

  • puppet disabled
  • swh-search / journal clients stopped
  • packages updated:
apt list --upgradable 2>/dev/null | grep python3-swh | cut -f1 -d'/' | xargs  apt install -V --dry-run
...
The following packages will be upgraded:
   python3-swh.core (0.13.0-1~swh1~bpo10+1 => 0.14.3-1~swh1~bpo10+1)
   python3-swh.indexer (0.7.0-1~swh1~bpo10+1 => 0.8.0-1~swh1~bpo10+1)
   python3-swh.indexer.storage (0.7.0-1~swh1~bpo10+1 => 0.8.0-1~swh1~bpo10+1)
   python3-swh.journal (0.7.1-1~swh1~bpo10+1 => 0.8.0-1~swh1~bpo10+1)
   python3-swh.model (2.3.0-1~swh1~bpo10+1 => 2.6.1-1~swh1~bpo10+1)
   python3-swh.objstorage (0.2.2-1~swh1~bpo10+1 => 0.2.3-1~swh1~bpo10+1)
   python3-swh.scheduler (0.10.0-1~swh1~bpo10+1 => 0.15.0-1~swh1~bpo10+1)
   python3-swh.search (0.8.0-1~swh1~bpo10+1 => 0.9.0-1~swh1~bpo10+1)
   python3-swh.storage (0.27.2-1~swh1~bpo10+1 => 0.30.1-1~swh1~bpo10+1)
9 upgraded, 0 newly installed, 0 to remove and 8 not upgraded.
...
  • Index initialisation done and swh-search restarted:
root@search1:~# swh search -C /etc/softwareheritage/search/server.yml initialize
INFO:elasticsearch:HEAD http://search-esnode6.internal.softwareheritage.org:9200/origin-production [status:200 request:0.025s]
INFO:elasticsearch:HEAD http://search-esnode4.internal.softwareheritage.org:9200/origin-read/_alias [status:200 request:0.018s]
INFO:elasticsearch:HEAD http://search-esnode5.internal.softwareheritage.org:9200/origin-write/_alias [status:200 request:0.003s]
INFO:elasticsearch:PUT http://search-esnode6.internal.softwareheritage.org:9200/origin-production/_mapping [status:200 request:0.102s]
Done.
root@search1:~# systemctl start gunicorn-swh-search.service
  • journal client restarted with no errors in the logs; the search is still working from the webapp
  • journal clients configuration prepared:
root@search1:~/T3398# diff -U3 /etc/softwareheritage/search/journal_client_objects.yml journal_client_objects.yml 
--- /etc/softwareheritage/search/journal_client_objects.yml     2021-06-10 08:08:19.555062808 +0000
+++ journal_client_objects.yml  2021-06-22 09:19:04.841898294 +0000
@@ -8,13 +8,18 @@
     port: 9200
   - host: search-esnode6.internal.softwareheritage.org
     port: 9200
+  indexes:
+    origin: 
+      index: origin-v0.9.0
+      read_alias: origin-v0.9.0-read
+      write_alias: origin-v0.9.0-write
 journal:
   brokers:
   - kafka1.internal.softwareheritage.org
   - kafka2.internal.softwareheritage.org
   - kafka3.internal.softwareheritage.org
   - kafka4.internal.softwareheritage.org
-  group_id: swh.search.journal_client
+  group_id: swh.search.journal_client-v0.9.0
   prefix: swh.journal.objects
   object_types:
   - origin
root@search1:~/T3398# diff -U3 /etc/softwareheritage/search/journal_client_indexed.yml journal_client_indexed.yml
--- /etc/softwareheritage/search/journal_client_indexed.yml     2021-06-10 09:34:00.980897650 +0000
+++ journal_client_indexed.yml  2021-06-22 09:27:18.507340257 +0000
@@ -8,13 +8,18 @@
     port: 9200
   - host: search-esnode6.internal.softwareheritage.org
     port: 9200
+  indexes:
+    origin:
+      index: origin-v0.9.0
+      read_alias: origin-v0.9.0-read
+      write_alias: origin-v0.9.0-write
 journal:
   brokers:
   - kafka1.internal.softwareheritage.org
   - kafka2.internal.softwareheritage.org
   - kafka3.internal.softwareheritage.org
   - kafka4.internal.softwareheritage.org
-  group_id: swh.search.journal_client.indexed
+  group_id: swh.search.journal_client.indexed-v0.9.0
   prefix: swh.journal.indexed
   object_types:
   - origin_intrinsic_metadata
  • new index initialized:
root@search1:~/T3398# diff -U3 /etc/softwareheritage/search/server.yml server.yml 
--- /etc/softwareheritage/search/server.yml     2021-06-10 08:08:17.819058015 +0000
+++ server.yml  2021-06-22 09:11:16.132518743 +0000
@@ -10,7 +10,7 @@
     port: 9200
   indexes:
     origin:
-      index: origin-production
-      read_alias: origin-read
-      write_alias: origin-write
+      index: origin-v0.9.0
+      read_alias: origin-v0.9.0-read
+      write_alias: origin-v0.9.0-write
root@search1:~/T3398# swh search --config-file server.yml initialize
INFO:elasticsearch:PUT http://search-esnode6.internal.softwareheritage.org:9200/origin-v0.9.0 [status:200 request:0.933s]
INFO:elasticsearch:PUT http://search-esnode4.internal.softwareheritage.org:9200/origin-v0.9.0/_alias/origin-v0.9.0-read [status:200 request:0.057s]
INFO:elasticsearch:PUT http://search-esnode5.internal.softwareheritage.org:9200/origin-v0.9.0/_alias/origin-v0.9.0-write [status:200 request:0.038s]
INFO:elasticsearch:PUT http://search-esnode4.internal.softwareheritage.org:9200/origin-v0.9.0/_mapping [status:200 request:0.045s]
Done.
  • Index configured to improve the indexation speed:
root@search1:~/T3398# export ES_SERVER=search-esnode4:9200
root@search1:~/T3398# export INDEX=origin-v0.9.0
root@search1:~/T3398# curl -XPUT -H 'Content-Type: application/json' ${ES_SERVER}/${INDEX}/_settings -d '
> {
>   "index" : {
> "translog.sync_interval" : "60s",
> "translog.durability": "async",
> "refresh_interval": "60s"
>   }
> }'
{"acknowledged":true}
  • journal clients started:
root@search1:~/T3398# swh search --config-file journal_client_objects.yml journal-client objects
INFO:elasticsearch:POST http://search-esnode4.internal.softwareheritage.org:9200/origin-v0.9.0-write/_bulk [status:200 request:0.013s]
INFO:elasticsearch:POST http://search-esnode5.internal.softwareheritage.org:9200/origin-v0.9.0-write/_bulk [status:200 request:0.014s]
INFO:elasticsearch:POST http://search-esnode6.internal.softwareheritage.org:9200/origin-v0.9.0-write/_bulk [status:200 request:0.012s]
INFO:elasticsearch:POST http://search-esnode4.internal.softwareheritage.org:9200/origin-v0.9.0-write/_bulk [status:200 request:0.012s]
...
root@search1:~/T3398# swh search --config-file journal_client_indexed.yml journal-client objects
INFO:elasticsearch:POST http://search-esnode4.internal.softwareheritage.org:9200/origin-v0.9.0-write/_bulk [status:200 request:0.758s]
INFO:elasticsearch:POST http://search-esnode5.internal.softwareheritage.org:9200/origin-v0.9.0-write/_bulk [status:200 request:0.023s]
INFO:elasticsearch:POST http://search-esnode6.internal.softwareheritage.org:9200/origin-v0.9.0-write/_bulk [status:200 request:0.024s]
INFO:elasticsearch:POST http://search-esnode4.internal.softwareheritage.org:9200/origin-v0.9.0-write/_bulk [status:200 request:0.023s]
...
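
The two journal client diffs above apply the same change: add an `indexes` section for the versioned index and suffix the consumer `group_id` with the version. A minimal sketch of that derivation follows. The function name and the `search` parent key are assumptions for illustration (the diffs do not show the parent of `indexes`), and this is not part of swh-search itself.

```python
import copy


def versioned_journal_client_config(config, version):
    """Return a copy of a journal client config that targets a versioned index.

    `config` is the parsed YAML as nested dicts. The "search" parent key is an
    assumption; the diffs in the task only show that `indexes` sits next to `hosts`.
    """
    new_config = copy.deepcopy(config)
    index = f"origin-v{version}"
    # Dedicated index and aliases, so the production index is left untouched.
    new_config["search"]["indexes"] = {
        "origin": {
            "index": index,
            "read_alias": f"{index}-read",
            "write_alias": f"{index}-write",
        }
    }
    # A new consumer group has no committed offsets, so it does not resume
    # from the position of the production journal client.
    new_config["journal"]["group_id"] += f"-v{version}"
    return new_config


production = {
    "search": {"hosts": [{"host": "search-esnode6.internal.softwareheritage.org", "port": 9200}]},
    "journal": {"group_id": "swh.search.journal_client", "prefix": "swh.journal.objects"},
}
reindex = versioned_journal_client_config(production, "0.9.0")
print(reindex["journal"]["group_id"])
print(reindex["search"]["indexes"]["origin"]["write_alias"])
```

The input dict is copied rather than modified, so the production configuration stays available for comparison.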

The reindexation should be done by the end of the day.
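
The translog and refresh settings changed earlier trade durability and search freshness for indexing speed, so they are typically reverted once a bulk load is finished. The task does not say whether or how this was done. The sketch below only builds a request body, relying on the Elasticsearch behaviour that a `null` value in the update-settings API restores a setting to its default; the function name is hypothetical.

```python
import json

# Settings changed for the reindexation in the task.
BULK_SETTINGS = ("translog.sync_interval", "translog.durability", "refresh_interval")


def reset_settings_body(names=BULK_SETTINGS):
    """Build a JSON body that resets the given index settings to their defaults."""
    # In the update-settings API, a null value restores the default.
    return json.dumps({"index": {name: None for name in names}}, indent=2)


print(reset_settings_body())
```

The printed body could be sent with the same `curl -XPUT ... ${ES_SERVER}/${INDEX}/_settings` call used to apply the settings.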

About 1 day remains to consume the origin* topics.
The metadata was completely ingested, so the metadata search can be tested on webapp1 once its configuration is updated to use the new index.
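
One way to reason about the remaining work is consumer lag: for each partition, the difference between the latest offset in the topic and the offset committed by the consumer group. The sketch below shows only the arithmetic, with hypothetical offsets; it is not how the estimate in this task was produced. Treating a partition without a committed offset as starting from 0 is a simplifying assumption.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Total number of messages not yet consumed, summed over all partitions.

    Both arguments map a partition number to an offset.
    """
    lag = 0
    for partition, end in end_offsets.items():
        # Assumption: a partition with no committed offset is counted from 0.
        committed = committed_offsets.get(partition, 0)
        lag += max(end - committed, 0)
    return lag


# Hypothetical offsets for a three-partition topic.
end = {0: 1000, 1: 1200, 2: 900}
committed = {0: 1000, 1: 1150}
print(consumer_lag(end, committed))  # 0 + 50 + 900 = 950
```

A lag of 0 on every topic of the group means the new index has caught up with the journal.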

vsellier moved this task from in-progress to done on the System administration board.

The lag on the topics has recovered.
The configuration update of moma will be followed in T3373