
[swh-search] Deploy v0.9.0 on staging and execute a full origin and metadata reindexation
Closed, Resolved, Public

Description

This is the first time we will use the aliases feature to avoid search downtime during a reindexation, so it deserves a task for the record.
This first time, it will be done manually in staging.

Actions:

  • stop puppet
  • update the packages
  • create new journal-client configurations with these properties:
    • index: origin-v0.9.0
    • read alias: origin-v0.9.0-read
    • write alias: origin-v0.9.0-write
  • manually start the indexers
  • when the backfill is done, update the swh-search configuration to use the new index and aliases, so that the webapp uses the new index

This makes a rollback easy if anything goes wrong, as none of these steps is destructive.

  • if it's ok
    • Update puppet to use this new configuration
    • clean the old index/read aliases

Finally, after writing this, I realize the aliases are not useful in this scenario ¯\_(ツ)_/¯
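
The index and alias names in the plan above all derive from the version number, so the configuration fragment can be generated rather than typed by hand. A minimal sketch, assuming the `<base>-<version>` naming used in this task; the helper name `index_config` is hypothetical and not part of swh-search:

```shell
#!/usr/bin/env bash
# Sketch: print the "indexes" YAML fragment for a given base name and version.
# `index_config` is a hypothetical helper, not a swh-search command.
index_config() {
  local base="$1" version="$2"
  local index="${base}-${version}"
  # The indentation matches a fragment nested under the elasticsearch section.
  printf '  indexes:\n'
  printf '    %s:\n' "$base"
  printf '      index: %s\n' "$index"
  printf '      read_alias: %s-read\n' "$index"
  printf '      write_alias: %s-write\n' "$index"
}
```

For example, `index_config origin v0.9.0` prints the five lines to add to a journal-client configuration.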

Event Timeline

vsellier changed the task status from Open to Work in Progress. Jun 17 2021, 6:40 PM
vsellier triaged this task as Normal priority.
vsellier created this task.
vsellier moved this task from Backlog to in-progress on the System administration board.
  • Temporary configuration created:
    • objects:
root@search0:~/T3391# diff -U3 /etc/softwareheritage/search/journal_client_objects.yml journal_client_objects.yml 
--- /etc/softwareheritage/search/journal_client_objects.yml     2020-12-10 11:04:08.460777825 +0000
+++ journal_client_objects.yml  2021-06-17 16:48:56.006110527 +0000
@@ -4,10 +4,15 @@
   hosts:
   - host: search-esnode0.internal.staging.swh.network
     port: 9200
+  indexes:
+    origin:
+      index: origin-v0.9.0
+      read_alias: origin-v0.9.0-read
+      write_alias: origin-v0.9.0-write
 journal:
   brokers:
   - journal0.internal.staging.swh.network
-  group_id: swh.search.journal_client
+  group_id: swh.search.journal_client-v0.9.0
   prefix: swh.journal.objects
   object_types:
   - origin
  • indexed:
root@search0:~/T3391# diff -U3 /etc/softwareheritage/search/journal_client_indexed.yml journal_client_indexed.yml 
--- /etc/softwareheritage/search/journal_client_indexed.yml     2021-02-09 17:48:44.269681575 +0000
+++ journal_client_indexed.yml  2021-06-17 16:49:57.926120227 +0000
@@ -4,10 +4,15 @@
   hosts:
   - host: search-esnode0.internal.staging.swh.network
     port: 9200
+  indexes:
+    origin:
+      index: origin-v0.9.0
+      read_alias: origin-v0.9.0-read
+      write_alias: origin-v0.9.0-write
 journal:
   brokers:
   - journal0.internal.staging.swh.network
-  group_id: swh.search.journal_client.indexed
+  group_id: swh.search.journal_client.indexed-v0.9.0
   prefix: swh.journal.indexed
   object_types:
   - origin_intrinsic_metadata
  • upgrade packages:
root@search0:~/T3391# apt list --upgradable
Listing... Done
libnss-systemd/buster-backports 247.3-5~bpo10+1 amd64 [upgradable from: 247.3-3~bpo10+1]
libpam-systemd/buster-backports 247.3-5~bpo10+1 amd64 [upgradable from: 247.3-3~bpo10+1]
libsystemd0/buster-backports 247.3-5~bpo10+1 amd64 [upgradable from: 247.3-3~bpo10+1]
libudev1/buster-backports 247.3-5~bpo10+1 amd64 [upgradable from: 247.3-3~bpo10+1]
python3-swh.core/unknown 0.14.3-1~swh1~bpo10+1 all [upgradable from: 0.13.0-1~swh1~bpo10+1]
python3-swh.indexer.storage/unknown 0.8.0-1~swh1~bpo10+1 all [upgradable from: 0.7.0-1~swh1~bpo10+1]
python3-swh.indexer/unknown 0.8.0-1~swh1~bpo10+1 all [upgradable from: 0.7.0-1~swh1~bpo10+1]
python3-swh.model/unknown 2.6.1-1~swh1~bpo10+1 all [upgradable from: 2.3.0-1~swh1~bpo10+1]
python3-swh.objstorage/unknown 0.2.3-1~swh1~bpo10+1 all [upgradable from: 0.2.2-1~swh1~bpo10+1]
python3-swh.scheduler/unknown 0.15.0-1~swh1~bpo10+1 all [upgradable from: 0.10.0-1~swh1~bpo10+1]
python3-swh.search/unknown 0.9.0-1~swh1~bpo10+1 all [upgradable from: 0.8.0-1~swh1~bpo10+1]
python3-swh.storage/unknown 0.30.1-1~swh1~bpo10+1 all [upgradable from: 0.27.2-1~swh1~bpo10+1]
systemd-sysv/buster-backports 247.3-5~bpo10+1 amd64 [upgradable from: 247.3-3~bpo10+1]
systemd-timesyncd/buster-backports 247.3-5~bpo10+1 amd64 [upgradable from: 247.3-3~bpo10+1]
systemd/buster-backports 247.3-5~bpo10+1 amd64 [upgradable from: 247.3-3~bpo10+1]
udev/buster-backports 247.3-5~bpo10+1 amd64 [upgradable from: 247.3-3~bpo10+1]
  • Initialize the new index and aliases:
root@search0:~/T3391# swh search --config-file journal_client_objects.yml initialize
INFO:elasticsearch:PUT http://search-esnode0.internal.staging.swh.network:9200/origin-v0.9.0 [status:200 request:5.373s]
INFO:elasticsearch:PUT http://search-esnode0.internal.staging.swh.network:9200/origin-v0.9.0/_alias/origin-v0.9.0-read [status:200 request:0.052s]
INFO:elasticsearch:PUT http://search-esnode0.internal.staging.swh.network:9200/origin-v0.9.0/_alias/origin-v0.9.0-write [status:200 request:0.038s]
INFO:elasticsearch:PUT http://search-esnode0.internal.staging.swh.network:9200/origin-v0.9.0/_mapping [status:200 request:0.086s]
Done.
  • starting the origin* journal client:
root@search0:~/T3391# swh search --config-file journal_client_objects.yml journal-client objects
INFO:elasticsearch:POST http://search-esnode0.internal.staging.swh.network:9200/origin-v0.9.0-write/_bulk [status:200 request:0.661s]
INFO:elasticsearch:POST http://search-esnode0.internal.staging.swh.network:9200/origin-v0.9.0-write/_bulk [status:200 request:1.671s]
INFO:elasticsearch:POST http://search-esnode0.internal.staging.swh.network:9200/origin-v0.9.0-write/_bulk [status:200 request:2.047s]
INFO:elasticsearch:POST http://search-esnode0.internal.staging.swh.network:9200/origin-v0.9.0-write/_bulk [status:200 request:0.179s]
  • starting the indexed metadata journal client:
root@search0:~/T3391# swh search --config-file journal_client_indexed.yml journal-client objects
  • Index status:
root@search0:~/T3391# curl -s http://search-esnode0:9200/_cat/indices\?v
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size                                                                                        
green  open   origin                      HthJj42xT5uO7w3Aoxzppw  80   0    1320217       166798      1.7gb          1.7gb                                                                                        
green  open   origin-v0.9.0               o7FiYJWnTkOViKiAdCXCuA  80   0     263556          400    214.8mb        214.8mb                                                                                        
green  close  origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg  80   0                                                                                                                                          
green  close  origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ  80   0

The origin_visit* topics have not started to be ingested, so the new fields are not yet present in the index.
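
A rough way to follow the backfill is to compare the document count of the new index with the old one in the `_cat/indices` output. The sketch below assumes the column layout shown above (index name in column 3, docs.count in column 7); the helper name `backfill_progress` is hypothetical. It is only an estimate, since the old index also contains documents that may not be reindexed identically.

```shell
#!/usr/bin/env bash
# Sketch: read `_cat/indices?v` output on stdin and print the ratio of
# docs.count between a new index and an old one, as an integer percentage.
# `backfill_progress` is a hypothetical helper name.
backfill_progress() {
  local old="$1" new="$2"
  awk -v old="$old" -v new="$new" '
    $3 == old { old_count = $7 }   # docs.count of the old index
    $3 == new { new_count = $7 }   # docs.count of the new index
    END {
      if (old_count > 0)
        printf "%d%%\n", int(new_count * 100 / old_count)
      else
        print "unknown"            # old index missing or closed
    }'
}
```

It could be used as `curl -s http://search-esnode0:9200/_cat/indices\?v | backfill_progress origin origin-v0.9.0`.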

It seems the swh.journal.indexed.origin_intrinsic_metadata topic was created automatically, so the retention policy was not specified (and there is only one partition (!))

 % /opt/kafka/bin/kafka-topics.sh  --bootstrap-server $SERVER --describe --topic swh.journal.indexed.origin_intrinsic_metadata
Topic: swh.journal.indexed.origin_intrinsic_metadata	PartitionCount: 1	ReplicationFactor: 1	Configs: max.message.bytes=104857600
	Topic: swh.journal.indexed.origin_intrinsic_metadata	Partition: 0	Leader: 1	Replicas: 1	Isr: 1

The messages were removed from the topic by the default expiration strategy.
I will take the opportunity to recreate the topic properly and launch a backfill of its content.
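
Recreating the topic with explicit settings could look like the sketch below. It only prints the commands, so they can be reviewed before being run. The partition count, replication factor and cleanup policy are placeholders passed as arguments, not the values actually used for this task, and `recreate_topic_cmds` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print (do not run) the kafka-topics.sh commands that delete a topic
# and recreate it with explicit settings. All values are caller-supplied
# placeholders; `recreate_topic_cmds` is a hypothetical helper name.
recreate_topic_cmds() {
  local server="$1" topic="$2" partitions="$3" replication="$4" policy="$5"
  local tool=/opt/kafka/bin/kafka-topics.sh
  echo "$tool --bootstrap-server $server --delete --topic $topic"
  echo "$tool --bootstrap-server $server --create --topic $topic" \
       "--partitions $partitions --replication-factor $replication" \
       "--config cleanup.policy=$policy"
}
```

For example, `recreate_topic_cmds "$SERVER" swh.journal.indexed.origin_intrinsic_metadata 64 1 compact` prints the two commands, where 64, 1 and compact are illustrative values only.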

The aliases were switched to point to the new index for both reads and writes:

root@search0:~/T3391# curl -XPOST -H 'Content-Type: application/json' http://search-esnode0:9200/_aliases -d '
> {
>   "actions" : [
>     { "remove" : { "index" : "origin", "alias" : "origin-read" } },
>     { "remove" : { "index" : "origin", "alias" : "origin-write" } },
>     { "add" : { "index" : "origin-v0.9.0", "alias" : "origin-read" } },
>     { "add" : { "index" : "origin-v0.9.0", "alias" : "origin-write" } }
>   ]
> }'

{"acknowledged":true}
root@search0:~/T3391# curl http://search-esnode0:9200/_cat/aliases
origin-read         origin-v0.9.0 - - - -
origin-v0.9.0-read  origin-v0.9.0 - - - -
origin-v0.9.0-write origin-v0.9.0 - - - -
origin-write        origin-v0.9.0 - - - -
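
The request body above can be generated for any pair of indices. A minimal sketch, assuming the `<base>-read` / `<base>-write` alias names used here; `alias_swap_payload` is a hypothetical helper name. All actions sent in one `_aliases` call are applied atomically by Elasticsearch, so readers never see a state where the alias is missing.

```shell
#!/usr/bin/env bash
# Sketch: build the `_aliases` request body that moves the read and write
# aliases from an old index to a new one in a single call.
# `alias_swap_payload` is a hypothetical helper name.
alias_swap_payload() {
  local base="$1" old="$2" new="$3"
  cat <<EOF
{
  "actions": [
    { "remove": { "index": "${old}", "alias": "${base}-read" } },
    { "remove": { "index": "${old}", "alias": "${base}-write" } },
    { "add": { "index": "${new}", "alias": "${base}-read" } },
    { "add": { "index": "${new}", "alias": "${base}-write" } }
  ]
}
EOF
}
```

The output of `alias_swap_payload origin origin origin-v0.9.0` can be passed to `curl -XPOST -H 'Content-Type: application/json' http://search-esnode0:9200/_aliases -d @-`. Swapping the last two arguments gives the rollback request.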

The configuration of the index was changed to increase the reindexation speed:

root@search-esnode0:~# export ES_SERVER=192.168.130.80:9200
root@search-esnode0:~# export INDEX=origin-v0.9.0
root@search-esnode0:~# cat >/tmp/config.json <<EOF
> {
>   "index" : {
> "translog.sync_interval" : "60s",
> "translog.durability": "async",
> "refresh_interval": "60s"
>   }
> }
> EOF
root@search-esnode0:~# curl -s -H "Content-Type: application/json" -XPUT http://${ES_SERVER}/${INDEX}/_settings -d @/tmp/config.json

The default settings will be restored once the journal client has caught up.

vsellier moved this task from in-progress to done on the System administration board.

The backfill is done.

  • Puppet is reactivated
  • The index configuration is reset to the defaults:
root@search0:/etc/softwareheritage/search# export ES_SERVER=192.168.130.80:9200
root@search0:/etc/softwareheritage/search# export INDEX=origin-v0.9.0
root@search0:/etc/softwareheritage/search# cat >/tmp/config.json <<EOF
> {
>   "index" : {
>   "translog.sync_interval" : null,
>   "translog.durability": null,
>   "refresh_interval": null
>   }
> }
> EOF
root@search0:/etc/softwareheritage/search# curl -s -H "Content-Type: application/json" -XPUT http://${ES_SERVER}/${INDEX}/_settings -d @/tmp/config.json
{"acknowledged":true}
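
Both settings changes in this task, the speed-up and the reset, use the same three keys and differ only in their values. A small sketch that produces either body; `index_settings_payload` is a hypothetical helper name. Setting a key to `null` tells Elasticsearch to restore its default value.

```shell
#!/usr/bin/env bash
# Sketch: print the `_settings` body for the bulk-indexing mode used during
# the backfill, or for the reset to defaults afterwards.
# `index_settings_payload` is a hypothetical helper name.
index_settings_payload() {
  local sync durability refresh
  case "$1" in
    bulk)    sync='"60s"'; durability='"async"'; refresh='"60s"' ;;
    default) sync=null;    durability=null;      refresh=null ;;
    *) echo "usage: index_settings_payload bulk|default" >&2; return 1 ;;
  esac
  printf '{"index":{"translog.sync_interval":%s,"translog.durability":%s,"refresh_interval":%s}}\n' \
    "$sync" "$durability" "$refresh"
}
```

For example: `index_settings_payload default | curl -s -H "Content-Type: application/json" -XPUT http://${ES_SERVER}/${INDEX}/_settings -d @-`.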

The search looks to be working correctly.
The last visit date and visit count look good too (not checked in depth). For example, one of the docs with a lot of visits:

{
  "_index": "origin-v0.9.0",
  "_type": "_doc",
  "_id": "9aa7a9f206a6977061b8b38cc1415c1096e97a76",
  "_version": 164,
  "_seq_no": 518889,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "sha1": "9aa7a9f206a6977061b8b38cc1415c1096e97a76",
    "url": "https://github.com/elastic/kibana.git",
    "visit_types": [
      "git"
    ],
    "nb_visits": 162,
    "has_visits": true,
    "last_visit_date": "2021-01-26T01:53:47.254492+00:00"
  }
}
swh=> select * from origin_visit_status where origin='468540' order by visit desc limit 1;
-[ RECORD 1 ]----------------------------------------
origin   | 468540
visit    | 162
date     | 2021-01-26 01:53:47.254492+00
status   | full
metadata | 
snapshot | \x309f6289ed452da880948eaf7a83d3dba0fb3bed
type     | git
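
The two dates above are the same instant written in two formats (ISO 8601 in Elasticsearch, PostgreSQL text output in the database). To compare them in a script, one option is to normalize the PostgreSQL form first. This is a sketch that only handles the two formats shown in this task; `normalize_ts` and `same_ts_text` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: rewrite a PostgreSQL-style timestamp ("... 01:53:47.254492+00")
# into the ISO 8601 form stored in the index ("...T01:53:47.254492+00:00").
# Only these two formats are handled; helper names are hypothetical.
normalize_ts() {
  # Replace the date/time separator, then expand a two-digit UTC offset.
  printf '%s\n' "$1" | sed -E 's/ /T/; s/([+-][0-9]{2})$/\1:00/'
}

# Succeeds when both timestamps are textually equal after normalization.
same_ts_text() {
  [ "$(normalize_ts "$1")" = "$(normalize_ts "$2")" ]
}
```

With the values above, `same_ts_text "2021-01-26T01:53:47.254492+00:00" "2021-01-26 01:53:47.254492+00"` succeeds.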