
Deploy swh-search v0.4.1
Closed, Migrated. Edits Locked.

Description

Deploy the new 0.4.1 version with the fix for the has_visits status.

According to D4818, the basic migration can be done like this:

Migration:

  • curl -H "Content-Type: application/json" -XPOST "http://localhost:9200/origin/_update_by_query" -d '{"script": {"inline": "ctx._source.has_visits = false", "lang": "painless"}}'
  • reset the offset of the swh-search client on the swh.journal.objects.origin_visit_status topic
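For the offset reset, a minimal sketch using the Kafka tooling that appears later in this task (the swh.search.journal_client consumer group name is taken from the deployment notes below); deleting the group makes the client restart from its configured initial offset:

% export SERVER=journal0.internal.staging.swh.network:9092
% ./kafka-consumer-groups.sh --bootstrap-server $SERVER --delete --group swh.search.journal_client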

This is correct, but done this way, search results will not be completely accurate until the backfill of the origin_visit_status topic is done.
This is not critical currently, as swh-search is not really used in production, but it is an opportunity to test a way to perform such a reindexation smoothly.


Event Timeline

vsellier triaged this task as Normal priority. Jan 7 2021, 6:39 PM
vsellier created this task.
ardumont renamed this task from Deploy T2936 in staging to Deploy swh-search v0.4.1 in staging. Jan 11 2021, 10:49 AM
ardumont moved this task from Backlog to Weekly backlog on the System administration board.
vsellier renamed this task from Deploy swh-search v0.4.1 in staging to Deploy swh-search v0.4.1. Jan 25 2021, 3:32 PM
vsellier changed the task status from Open to Work in Progress.
vsellier moved this task from Weekly backlog to in-progress on the System administration board.

Regarding the index rebuilding process, a naive approach using an alias over both the old and the new index [1] returns duplicated results when a search is performed.
Using an alias pointing only to the old index, rebuilding a new index, then switching the alias to the new index [2] can be a first approach, with the drawback that the old index will not be updated until the alias is switched to the new one.
It also requires that the swh-search code be able to use different index names for read and write operations.

[1] (attachment not preserved: alias pointing at both the old and the new index)

[2] (attachment not preserved: alias pointing at the old index only, then switched to the new index)
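A sketch of what the final alias switch could look like with the Elasticsearch _aliases API (the index names origin-v1 and origin-v2 are hypothetical); both actions are applied atomically, so searches never see the two indexes at once:

% curl -s -H "Content-Type: application/json" -XPOST http://$ES_SERVER/_aliases -d '
{
  "actions": [
    { "remove": { "index": "origin-v1", "alias": "origin" } },
    { "add": { "index": "origin-v2", "alias": "origin" } }
  ]
}'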

Staging

We are proceeding with a complete index rebuild:

  • debian packages upgraded on search0.staging
  • journal client stopped
  • 'origin' index removed on elasticsearch and recreated:
% curl -XDELETE http://$ES_SERVER/origin
% sudo swh search --config-file /etc/softwareheritage/search/server.yml initialize
  • journal client offset reset:
% export SERVER=journal0.internal.staging.swh.network:9092
% ./kafka-consumer-groups.sh --bootstrap-server $SERVER --delete --group swh.search.journal_client
Deletion of requested consumer groups ('swh.search.journal_client') was successful.
  • swh-search-journal-client restarted

The filter on visited origins is working correctly on staging, and the has_visits flag looks good, for example for the https://www.npmjs.com/package/@ehmicky/dev-tasks origin.
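The exact query is not recorded here; a hypothetical search filtering on the flag, whose result is shown below:

% curl -s -H "Content-Type: application/json" http://$ES_SERVER/origin/_search -d '
{
  "query": {
    "bool": {
      "must": { "match": { "url": "ehmicky/dev-tasks" } },
      "filter": { "term": { "has_visits": true } }
    }
  }
}'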

{
  "_index" : "origin",
  "_type" : "_doc",
  "_id" : "019bd314416108304165e82dd92e00bc9ea85a53",
  "_score" : 60.56421,
  "_source" : {
    "url" : "https://www.npmjs.com/package/@ehmicky/dev-tasks",
    "sha1" : "019bd314416108304165e82dd92e00bc9ea85a53"
  },
  "sort" : [
    60.56421,
    "019bd314416108304165e82dd92e00bc9ea85a53"
  ]
}
swh=> select * from origin join origin_visit_status on id=origin where id=469380;
   id   |                       url                        | origin | visit |             date              | status  | metadata |                  snapshot                  | type 
--------+--------------------------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------
 469380 | https://www.npmjs.com/package/@ehmicky/dev-tasks | 469380 |     1 | 2021-01-25 13:30:47.221937+00 | created |          |                                            | npm
 469380 | https://www.npmjs.com/package/@ehmicky/dev-tasks | 469380 |     1 | 2021-01-25 13:41:59.435579+00 | partial |          | \xe3f24413d81fd3e9c309686fcfb6c8f5eb549acf | npm

Production

  • puppet disabled
  • Services stopped:
root@search1:~# systemctl stop swh-search-journal-client@objects.service
root@search1:~# systemctl stop gunicorn-swh-search
  • Index deleted and recreated:
% export ES_SERVER=search-esnode1.internal.softwareheritage.org:9200
% curl -s http://$ES_SERVER/_cat/indices\?v 
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin Mq8dnlpuRXO4yYoC6CTuQw  90   1  151716299     38861934    260.8gb          131gb
% curl -XDELETE http://$ES_SERVER/origin
{"acknowledged":true}%    
% swh search --config-file /etc/softwareheritage/search/server.yml initialize
INFO:elasticsearch:PUT http://search-esnode1.internal.softwareheritage.org:9200/origin [status:200 request:2.216s]
INFO:elasticsearch:PUT http://search-esnode3.internal.softwareheritage.org:9200/origin/_mapping [status:200 request:0.151s]
Done.
% curl -s http://$ES_SERVER/_cat/indices\?v                                        
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin yFaqPPCnRFCnc5AA6Ah8lw  90   1          0            0     36.5kb         18.2kb
  • journal client's consumer group deleted:
% export SERVER=kafka1.internal.softwareheritage.org:9092  
% ./kafka-consumer-groups.sh --bootstrap-server ${SERVER} --delete --group swh.search.journal_client
Deletion of requested consumer groups ('swh.search.journal_client') was successful.
  • journal client restarted
  • puppet enabled
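The restart presumably mirrors the shutdown sequence; a sketch (the exact commands are not recorded in this task):

root@search1:~# systemctl start swh-search-journal-client@objects.service
root@search1:~# systemctl start gunicorn-swh-search
root@search1:~# puppet agent --enable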

The journal client catch-up is in progress:

% curl -s http://$ES_SERVER/_cat/indices\?v 
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin yFaqPPCnRFCnc5AA6Ah8lw  90   1      42184            0     45.6mb         42.7mb
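The remaining lag can also be followed on the Kafka side (a sketch, reusing the consumer-group tooling from above):

% ./kafka-consumer-groups.sh --bootstrap-server ${SERVER} --describe --group swh.search.journal_client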

Upgrading the index configuration to speed up the indexing:

% cat >/tmp/config.json <<EOF
{
  "index": {
    "translog.sync_interval": "60s",
    "translog.durability": "async",
    "refresh_interval": "60s"
  }
}
EOF
% export ES_SERVER=192.168.100.81:9200
% export INDEX=origin            
% curl -s -H "Content-Type: application/json" -XPUT http://${ES_SERVER}/${INDEX}/_settings -d @/tmp/config.json 
{"acknowledged":true}%
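Assumption: once the catch-up is over, these settings would be restored to their defaults, which Elasticsearch does when a setting is set to null; a sketch of the revert:

% curl -s -H "Content-Type: application/json" -XPUT http://${ES_SERVER}/${INDEX}/_settings -d '
{
  "index": {
    "translog.sync_interval": null,
    "translog.durability": null,
    "refresh_interval": null
  }
}'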

To reduce the time needed to catch up on the lag, several journal clients were launched in parallel with:

/usr/bin/swh search --config-file /etc/softwareheritage/search/journal_client_objects.yml journal-client objects
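For instance, a sketch launching four of them (the count is arbitrary; the clients share the swh.search.journal_client consumer group, so Kafka balances the topic partitions between them):

for i in 1 2 3 4; do
    /usr/bin/swh search --config-file /etc/softwareheritage/search/journal_client_objects.yml journal-client objects &
done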

The memory of the search1 server had to be increased to avoid the OOM killer: rSPREa7c9c625d98c136ca3d75e649c05bcd4de9aa19b

The journal client has almost ingested the topics [1] it listens to. It took some extra time because a backfill of the origin_visit_status topic was launched for T2993.
It should be done by the end of the day.

[1] https://grafana.softwareheritage.org/goto/EQE43DYMz

The backfill is done.
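A final sanity check could compare the index document count with the ~151M origins seen before the index deletion (a sketch, using the _cat API already used above):

% curl -s http://$ES_SERVER/_cat/count/origin\?v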