
Deploy swh-search v0.4.1
Closed, Resolved · Public

Description

Deploy the new 0.4.1 version with the fix for the has_visit status.

According to D4818, the basic migration can be done like this:

Migration:

  • curl -XPOST "https://localhost:9200/origin/_update_by_query" -H "Content-Type: application/json" -d'{"script": {"inline": "ctx._source.has_visits = false", "lang": "painless"}}'
  • reset the offset of the swh-search client on the swh.journal.objects.origin_visit_status topic (see the sketch after this list)
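
For reference, a hedged sketch of the offset reset using Kafka's stock tooling; resetting the single topic to its earliest offset (rather than deleting the whole consumer group, as was done during the actual deployment below) is an assumption:

# Sketch only: rewind the swh-search consumer group on the
# origin_visit_status topic so it is re-read from the beginning.
# The journal client must be stopped while the group is reset.
export SERVER=journal0.internal.staging.swh.network:9092
./kafka-consumer-groups.sh --bootstrap-server $SERVER \
  --group swh.search.journal_client \
  --topic swh.journal.objects.origin_visit_status \
  --reset-offsets --to-earliest --execute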

This is correct, but done this way, search results will not be completely accurate until the backfill of the origin_visit_status topic is done.
This is not critical for now, as swh-search is not really used in production, but it is an opportunity to test a way to perform such a reindexation smoothly.


Event Timeline

vsellier triaged this task as Normal priority. Jan 7 2021, 6:39 PM
vsellier created this task.
ardumont renamed this task from "Deploy T2936 in staging" to "Deploy swh-search v0.4.1 in staging". Jan 11 2021, 10:49 AM
ardumont moved this task from Backlog to Weekly backlog on the System administration board.
vsellier renamed this task from "Deploy swh-search v0.4.1 in staging" to "Deploy swh-search v0.4.1". Jan 25 2021, 3:32 PM
vsellier changed the task status from Open to Work in Progress.
vsellier moved this task from Weekly backlog to in-progress on the System administration board.

Regarding the index rebuilding process, a naive approach using an alias over both the old and the new index [1] returns duplicated results when a search is performed.
Using an alias pointing only to the old index, rebuilding a new index, and then switching the alias to the new index [2] can be a first approach, with the drawback that the old index will not be updated until the alias is switched to the new one.
It also requires that the swh-search code be able to use different names for the read and write operations.

[1] (attachment not preserved)

[2] (attachment not preserved)
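
A hedged sketch of approach [2], assuming hypothetical versioned index names (origin-v1, origin-v2) behind an "origin" alias; the _aliases endpoint applies both actions atomically, so searches never see the two indexes at once:

# Sketch only: once origin-v2 is fully rebuilt, atomically switch
# the "origin" alias from the old index to the new one.
curl -s -H "Content-Type: application/json" -XPOST http://$ES_SERVER/_aliases -d '
{
  "actions": [
    { "remove": { "index": "origin-v1", "alias": "origin" } },
    { "add":    { "index": "origin-v2", "alias": "origin" } }
  ]
}'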

Staging

We are proceeding with a complete index rebuild:

  • Debian packages upgraded on search0.staging
  • journal client stopped
  • 'origin' index removed on Elasticsearch and recreated:
% curl -XDELETE http://$ES_SERVER/origin
% sudo swh search --config-file /etc/softwareheritage/search/server.yml initialize
  • journal client offset reset:
export SERVER=journal0.internal.staging.swh.network:9092
./kafka-consumer-groups.sh --bootstrap-server $SERVER --delete --group swh.search.journal_client
Deletion of requested consumer groups ('swh.search.journal_client') was successful.
  • swh-search-journal-client restarted (consumption can be checked as sketched below)
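
To confirm the client is consuming again and to follow the remaining lag, a hedged check with the same Kafka tooling (--describe lists per-partition current offset, end offset, and lag):

# Sketch only: show the lag of the swh-search consumer group.
./kafka-consumer-groups.sh --bootstrap-server $SERVER \
  --describe --group swh.search.journal_client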

The filter on visited origins is working correctly on staging, and the has_visits flag looks good.
For example, for the https://www.npmjs.com/package/@ehmicky/dev-tasks origin:

{
  "_index" : "origin",
  "_type" : "_doc",
  "_id" : "019bd314416108304165e82dd92e00bc9ea85a53",
  "_score" : 60.56421,
  "_source" : {
    "url" : "https://www.npmjs.com/package/@ehmicky/dev-tasks",
    "sha1" : "019bd314416108304165e82dd92e00bc9ea85a53"
  },
  "sort" : [
    60.56421,
    "019bd314416108304165e82dd92e00bc9ea85a53"
  ]
}
swh=> select * from origin join origin_visit_status on id=origin where id=469380;
   id   |                       url                        | origin | visit |             date              | status  | metadata |                  snapshot                  | type 
--------+--------------------------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------
 469380 | https://www.npmjs.com/package/@ehmicky/dev-tasks | 469380 |     1 | 2021-01-25 13:30:47.221937+00 | created |          |                                            | npm
 469380 | https://www.npmjs.com/package/@ehmicky/dev-tasks | 469380 |     1 | 2021-01-25 13:41:59.435579+00 | partial |          | \xe3f24413d81fd3e9c309686fcfb6c8f5eb549acf | npm
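
For reference, a hedged sketch of the kind of Elasticsearch query behind this check; the exact query body swh-search generates is an assumption:

# Sketch only: search for the origin while filtering on has_visits.
curl -s -H "Content-Type: application/json" "http://$ES_SERVER/origin/_search?pretty" -d '
{
  "query": {
    "bool": {
      "must": [ { "match": { "url": "ehmicky/dev-tasks" } } ],
      "filter": [ { "term": { "has_visits": true } } ]
    }
  }
}'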

Production

  • puppet disabled
  • Services stopped:
root@search1:~# systemctl stop swh-search-journal-client@objects.service 
root@search1:~# systemctl stop gunicorn-swh-search
  • Index deleted and recreated:
% export ES_SERVER=search-esnode1.internal.softwareheritage.org:9200
% curl -s http://$ES_SERVER/_cat/indices\?v 
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin Mq8dnlpuRXO4yYoC6CTuQw  90   1  151716299     38861934    260.8gb          131gb
% curl -XDELETE http://$ES_SERVER/origin
{"acknowledged":true}%    
% swh search --config-file /etc/softwareheritage/search/server.yml  initialize
INFO:elasticsearch:PUT http://search-esnode1.internal.softwareheritage.org:9200/origin [status:200 request:2.216s]
INFO:elasticsearch:PUT http://search-esnode3.internal.softwareheritage.org:9200/origin/_mapping [status:200 request:0.151s]
Done.
% curl -s http://$ES_SERVER/_cat/indices\?v                                        
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin yFaqPPCnRFCnc5AA6Ah8lw  90   1          0            0     36.5kb         18.2kb
  • journal client's consumer group deleted:
% export SERVER=kafka1.internal.softwareheritage.org:9092  
% ./kafka-consumer-groups.sh --bootstrap-server ${SERVER} --delete --group swh.search.journal_client
Deletion of requested consumer groups ('swh.search.journal_client') was successful.
  • journal client restarted
  • puppet enabled

The journal client catch-up is in progress:

% curl -s http://$ES_SERVER/_cat/indices\?v 
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin yFaqPPCnRFCnc5AA6Ah8lw  90   1      42184            0     45.6mb         42.7mb

Updating the index settings to speed up the indexation:

% cat >/tmp/config.json <<EOF
{
  "index" : {
    "translog.sync_interval" : "60s",
    "translog.durability": "async",
    "refresh_interval": "60s"
  }
}
EOF
% export ES_SERVER=192.168.100.81:9200
% export INDEX=origin            
% curl -s -H "Content-Type: application/json" -XPUT http://${ES_SERVER}/${INDEX}/_settings -d @/tmp/config.json 
{"acknowledged":true}%

To decrease the time needed to catch up on the lag, several journal clients were launched in parallel with:

/usr/bin/swh search --config-file /etc/softwareheritage/search/journal_client_objects.yml journal-client objects
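
A hedged sketch of what "in parallel" can look like: since all instances share the swh.search.journal_client consumer group, Kafka rebalances the topic partitions across them (the instance count below is arbitrary):

# Sketch only: launch three extra consumers in the same group.
for i in 1 2 3; do
    /usr/bin/swh search --config-file /etc/softwareheritage/search/journal_client_objects.yml \
        journal-client objects &
done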

The memory of the search1 server had to be increased to avoid the OOM killer: rSPREa7c9c625d98c136ca3d75e649c05bcd4de9aa19b

The journal client has almost ingested the topics [1] it listens to. It took some more time because a backfill of the origin_visit_status topic was launched for T2993.
It should be done by the end of the day.

[1] https://grafana.softwareheritage.org/goto/EQE43DYMz

The backfill is done.