Software Heritage

Deploy swh-search v0.6.0 in **staging**
Status: Closed (migrated, edits locked)

Description

This new version comes with a mapping change on the metadata, so the following actions must be performed:

  • Stop the journal clients and swh-search
  • Upgrade the packages
  • Delete the origin index
  • Recreate the index with the new mapping
  • Restart the swh-search service
  • Copy back the index backup made in T2780
  • Restore the swh.search.journal_client consumer group offsets to the values saved in P944
  • Reset the swh.search.journal_client.indexed consumer group offsets to the beginning
  • Restart the service and the journal clients
  • Wait for the backfill to complete
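The stop and restart steps in the list above can be sketched as a small dry-run script, to review the order of operations before touching the host. This is only an illustration: `run`, `DRY_RUN` and the function names are hypothetical helpers, not part of swh-search; the unit names are the ones used in the timeline below, and re-enabling puppet at the end is an assumption.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: with DRY_RUN=1 (the default) each command is
# only printed, so the sequence can be reviewed before it is executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Keep puppet from restarting the units, then stop the consumers before the
# HTTP service so nothing writes to the index that is about to be replaced.
stop_search() {
    run puppet agent --disable "swh-search upgrade"
    run systemctl stop swh-search-journal-client@objects.service
    run systemctl stop swh-search-journal-client@indexed.service
    run systemctl stop gunicorn-swh-search.service
}

# Restart the API once the index has been recreated with the new mapping.
start_api() {
    run systemctl start gunicorn-swh-search.service
}

# Restart the journal clients only after the consumer group offsets have been
# restored/reset, then let puppet manage the host again (assumed final step).
start_journal_clients() {
    run systemctl start swh-search-journal-client@objects.service
    run systemctl start swh-search-journal-client@indexed.service
    run puppet agent --enable
}
```

Setting `DRY_RUN=0` would execute the commands for real; the data steps in between (index recreation, restore, offset resets) stay manual.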

Event Timeline

vsellier changed the task status from Open to Work in Progress. Feb 18 2021, 3:41 PM
vsellier triaged this task as Normal priority.
vsellier created this task.
vsellier moved this task from Backlog to in-progress on the System administration board.

Stop the journal clients and swh-search:

root@search0:~# puppet agent --disable "swh-search upgrade"
root@search0:~# systemctl stop swh-search-journal-client@objects.service 
root@search0:~# systemctl stop swh-search-journal-client@indexed.service
root@search0:~# systemctl stop gunicorn-swh-search.service

Update the packages:

root@search0:~# apt update && apt list --upgradable
...
python3-swh.search/unknown 0.6.0-1~swh1~bpo10+1 all [upgradable from: 0.5.0-1~swh1~bpo10+1]
...

root@search0:~# apt dist-upgrade
...
Preparing to unpack .../python3-swh.search_0.6.0-1~swh1~bpo10+1_all.deb ...
Unpacking python3-swh.search (0.6.0-1~swh1~bpo10+1) over (0.5.0-1~swh1~bpo10+1) ...
Setting up python3-swh.search (0.6.0-1~swh1~bpo10+1) ...

Delete the current index

  • Make a backup first
    • the index:
vsellier@search-esnode0 ~ % export NEW_INDEX=origin-v0.5.0
vsellier@search-esnode0 ~ % curl -XPUT http://$ES_SERVER/${NEW_INDEX}
{"acknowledged":true,"shards_acknowledged":true,"index":"origin-v0.5.0"}
vsellier@search-esnode0 ~ % curl http://${ES_SERVER}/origin/_mapping\?pretty | jq '.origin.mappings' > /tmp/mapping.json 
vsellier@search-esnode0 ~ % curl -XPUT -H "Content-Type: application/json" http://${ES_SERVER}/${NEW_INDEX}/_mapping -d @/tmp/mapping.json
{"acknowledged":true}

vsellier@search-esnode0 ~ % cat >reindex-origin.json <<EOF
{
  "source": {
    "index": "origin"
  },
  "dest": {
    "index": "${NEW_INDEX}"
  }
}
EOF
vsellier@search-esnode0 ~ % curl -XPOST -H "Content-Type: application/json" http://${ES_SERVER}/_reindex\?pretty\&refresh=true\&requests_per_second=-1\&wait_for_completion=true -d @reindex-origin.json
{
  "took" : 246426,
  "timed_out" : false,
  "total" : 503339,
  "updated" : 0,
  "created" : 503339,
  "deleted" : 0,
  "batches" : 504,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}
vsellier@search-esnode0 ~ % curl -s http://${ES_SERVER}/_cat/indices\?v
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin                      xBl67YKsQbWAt7V78UeDLA  80   0     868121        82710        1gb            1gb
green  open   origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg  80   0     496619            0    156.6mb        156.6mb
green  open   origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ  80   0     868121            0    987.7mb        987.7mb
  • the Kafka consumer group offsets:
vsellier@journal0 ~ % /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets --all-topics --to-current --dry-run  --export --group swh.search.journal_client 2>&1 > journal_client_offsets.csv

The exported offsets are stored in P955.

  • Delete the index:
vsellier@search-esnode0 ~ % curl -s -XDELETE http://${ES_SERVER}/origin
{"acknowledged":true}
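Before issuing the DELETE, both backups can be sanity-checked. A minimal sketch, assuming the `_cat/indices?v` column layout shown above (docs.count is the 7th column) and an exported offsets CSV with one `topic,partition,offset` line per partition (the layout expected for a single `--group`; treat it as an assumption). `docs_count`, `same_doc_count` and `summarize_offsets` are hypothetical helper names.

```shell
#!/bin/sh
# Pre-deletion sanity checks (hypothetical helpers, not part of swh-search).

# Print docs.count (7th column) of index $1 from `_cat/indices?v` output on
# stdin. Closed indices have no counters, so nothing is printed for them.
docs_count() {
    awk -v idx="$1" 'NR > 1 && $3 == idx { print $7 }'
}

# Succeed only if indices $2 and $3 hold the same, non-empty number of
# documents in the saved `_cat/indices?v` output file $1.
same_doc_count() {
    a=$(docs_count "$2" < "$1")
    b=$(docs_count "$3" < "$1")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Per-topic partition count and summed offsets of a CSV exported with
# `kafka-consumer-groups.sh --export`; lines without 3 fields are skipped.
summarize_offsets() {
    awk -F, 'NF == 3 { parts[$1]++; total[$1] += $3 }
        END {
            for (t in parts)
                printf "%s partitions=%d committed=%.0f\n", t, parts[t], total[t]
        }' "$1" | sort
}
```

For example, `curl -s "http://${ES_SERVER}/_cat/indices?v" > /tmp/indices.txt && same_doc_count /tmp/indices.txt origin origin-v0.5.0` should succeed before the deletion, and `summarize_offsets journal_client_offsets.csv` should list every expected topic.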

Recreate the index with the new mapping

swhstorage@search0:~$ swh search --config-file=/etc/softwareheritage/search/journal_client_objects.yml initialize
INFO:elasticsearch:PUT http://search-esnode0.internal.staging.swh.network:9200/origin [status:200 request:3.136s]
INFO:elasticsearch:PUT http://search-esnode0.internal.staging.swh.network:9200/origin/_mapping [status:200 request:0.036s]
Done.
vsellier@search-esnode0 ~ % curl -s -H "Content-Type: application/json" http://${ES_SERVER}/origin/_mapping\?pretty | grep date
      "date_detection" : false,

Restart the swh-search service:

root@search0:~# systemctl start gunicorn-swh-search.service

Copy back the index backup made in T2780:

vsellier@search-esnode0 ~ % cat >reindex-origin.json <<EOF
{
  "source": {
    "index": "origin-backup-20210209-1736"
  },
  "dest": {
    "index": "origin"
  }
}
EOF

vsellier@search-esnode0 ~ % curl -XPOST -H "Content-Type: application/json" http://${ES_SERVER}/_reindex\?pretty\&refresh=true\&requests_per_second=-1\&wait_for_completion=true -d @reindex-origin.json
{
  "took" : 134042,
  "timed_out" : false,
  "total" : 496619,
  "updated" : 0,
  "created" : 496619,
  "deleted" : 0,
  "batches" : 497,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

vsellier@search-esnode0 ~ % curl -s http://${ES_SERVER}/_cat/indices\?v
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin                      HthJj42xT5uO7w3Aoxzppw  80   0     496619            0    329.5mb        329.5mb
green  open   origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg  80   0     496619            0    156.6mb        156.6mb
green  open   origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ  80   0     868121            0    987.7mb        987.7mb
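The `_reindex` responses in this task are checked by reading them. A minimal sketch of an automated check, assuming the pretty-printed layout shown above (one `"key" : value` pair per line) and a fresh destination index, so that every source document should be counted in `created`. `reindex_ok` is a hypothetical helper; `jq` would be more robust, plain `sed`/`grep` just avoids the extra dependency.

```shell
#!/bin/sh
# Read a pretty-printed `_reindex` response on stdin; succeed only if
# "created" equals "total" and the "failures" array is empty.
reindex_ok() {
    body=$(cat)
    total=$(printf '%s\n' "$body" | sed -n 's/^ *"total" : \([0-9]*\),*$/\1/p')
    created=$(printf '%s\n' "$body" | sed -n 's/^ *"created" : \([0-9]*\),*$/\1/p')
    [ -n "$total" ] && [ "$total" = "$created" ] &&
        printf '%s\n' "$body" | grep -F -q '"failures" : [ ]'
}
```

It can be chained after the reindex call, e.g. `curl ... -d @reindex-origin.json | reindex_ok && echo "restore complete"`.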

Restore the swh.search.journal_client consumer group offsets to the values saved in P944:

vsellier@journal0 ~ % /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets --all-topics  --from-file offsets.csv --group swh.search.journal_client --execute

GROUP                          TOPIC                          PARTITION  NEW-OFFSET     
swh.search.journal_client      swh.journal.objects.origin_visit_status 26         335718         
swh.search.journal_client      swh.journal.objects.origin_visit_status 12         336502         
swh.search.journal_client      swh.journal.objects.origin_visit_status 35         335346         
swh.search.journal_client      swh.journal.objects.origin     54         8082           
swh.search.journal_client      swh.journal.objects.origin_visit 55         169851         
...

Reset the swh.search.journal_client.indexed consumer group offsets to the beginning

vsellier@journal0 ~ % /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets --all-topics --to-earliest --group swh.search.journal_client.indexed --execute

GROUP                          TOPIC                          PARTITION  NEW-OFFSET     
swh.search.journal_client.indexed swh.journal.indexed.origin_intrinsic_metadata 0          13598025       
vsellier@journal0 ~ % /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --describe --group swh.search.journal_client.indexed

Consumer group 'swh.search.journal_client.indexed' has no active members.

GROUP                             TOPIC                                         PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
swh.search.journal_client.indexed swh.journal.indexed.origin_intrinsic_metadata 0          13598025        15051694        1453669         -               -               -
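The LAG column above (LOG-END-OFFSET minus CURRENT-OFFSET, here 15051694 - 13598025 = 1453669) is what the "wait for the backfill completion" step watches. A minimal sketch that sums it over all partitions, assuming the `--describe` layout shown above where LAG is the 6th column after the `GROUP` header line; `total_lag` is a hypothetical helper name, and partitions without a committed offset (LAG shown as `-`) are ignored.

```shell
#!/bin/sh
# Sum the LAG column of `kafka-consumer-groups.sh --describe` output on stdin.
total_lag() {
    awk '$1 == "GROUP" { in_table = 1; next }
         in_table && $6 ~ /^[0-9]+$/ { lag += $6 }
         END { printf "%.0f\n", lag }'
}
```

Piping the `--describe` command above into `total_lag` periodically gives a single number to follow until it reaches 0.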

Restart the service and the journal clients:

root@search0:~# systemctl start swh-search-journal-client@objects.service
root@search0:~# systemctl start swh-search-journal-client@indexed.service

The journal clients have caught up, so the index is up to date.
Let's check some points before closing:

  • The index size looks huge (~10 GB) compared to before the deployment (~1 GB)
  • Some documents do not have origin_visit_type populated as they should:
swh=> select * from origin where url='deb://Debian/packages/node-response-time';
  id   |                   url                    
-------+------------------------------------------
 15552 | deb://Debian/packages/node-response-time
(1 row)

swh=> select * from origin_visit where origin=15552 limit 1;
 origin | visit |             date              | type 
--------+-------+-------------------------------+------
  15552 |     1 | 2020-11-03 06:16:19.962182+00 | deb

The corresponding document in the origin index has no visit type:
{
  "_index": "origin",
  "_type": "_doc",
  "_id": "17e7984da6467e6b56e7c7caff01821a8143bb58",
  "_version": 1,
  "_seq_no": 1783,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "url": "deb://Debian/packages/node-response-time",
    "sha1": "17e7984da6467e6b56e7c7caff01821a8143bb58",
    "has_visits": true
  }
}

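Regarding the index size, it seems to be due to a huge number of deleted documents (8,577,610 deleted for 868,634 live ones), probably caused by the backlog processing: each change updates the document, and Elasticsearch implements an update by marking the previous version as deleted and indexing a new one.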

% curl  -s http://${ES_SERVER}/_cat/indices\?v                                                       
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin                      HthJj42xT5uO7w3Aoxzppw  80   0     868634      8577610     10.5gb         10.5gb
green  close  origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg  80   0                                                  
green  open   origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ  80   0     868121            0    987.7mb        987.7mb
green  open   origin-toremove             PL7WEs3FTJSQy4dgGIwpeQ  80   0     868610            0    987.5mb        987.5mb  <-- a clean copy of the origin index has almost the same size as yesterday
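The share of deleted documents makes the problem easier to spot than the raw sizes. A minimal sketch, assuming the `_cat/indices?v` column layout above (docs.count and docs.deleted are the 7th and 8th columns); `deleted_ratio` is a hypothetical helper name and closed indices are skipped since they report no counters.

```shell
#!/bin/sh
# For each open index in `_cat/indices?v` output on stdin, print
# docs.deleted / (docs.count + docs.deleted) as a percentage.
deleted_ratio() {
    awk 'NR > 1 && $2 == "open" && ($7 + $8) > 0 {
        printf "%s %.1f%%\n", $3, 100 * $8 / ($7 + $8)
    }'
}
```

On the listing above, the origin index comes out at about 90.8% deleted documents, against 0.0% for the clean copy.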

Forcing a merge seems to restore a decent size:

% curl -XPOST -H "Content-Type: application/json" http://${ES_SERVER}/origin/_forcemerge
{"_shards":{"total":80,"successful":80,"failed":0}}
% curl  -s http://${ES_SERVER}/_cat/indices\?v      
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin                      HthJj42xT5uO7w3Aoxzppw  80   0     868684         3454        1gb            1gb
green  close  origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg  80   0                                                  
green  open   origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ  80   0     868121            0    987.7mb        987.7mb
green  open   origin-toremove             PL7WEs3FTJSQy4dgGIwpeQ  80   0     868610            0    987.5mb        987.5mb

This will probably need to be scheduled regularly on the production index if size matters.
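A regularly scheduled merge could be made conditional, so it only runs when deletions pile up. A minimal sketch under stated assumptions: the 20% default threshold and the function names are hypothetical, `ES_SERVER` is the variable used above, and `only_expunge_deletes=true` is the Elasticsearch force merge option that restricts the merge to segments containing deletions.

```shell
#!/bin/sh
# Succeed if deleted documents exceed the threshold.
# $1: docs.count, $2: docs.deleted, $3: threshold in percent (default 20).
needs_merge() {
    awk -v live="$1" -v del="$2" -v max="${3:-20}" \
        'BEGIN { exit !((live + del) > 0 && 100 * del / (live + del) > max) }'
}

# Expunge deleted documents from the origin index (not run by the checks).
merge_origin_index() {
    curl -s -XPOST "http://${ES_SERVER}/origin/_forcemerge?only_expunge_deletes=true"
}
```

The two counters could be fetched with `curl -s "http://${ES_SERVER}/_cat/indices/origin?h=docs.count,docs.deleted"` from a cron job or systemd timer, calling `merge_origin_index` only when `needs_merge` succeeds.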

Regarding the missing visit_type: one of the topics carrying the visit type needs to be consumed again to populate the field for all the origins.
As the index was restored from the backup, the field was only set for the visits made during the last 15 days.
Only the offsets of the origin_visit topic will be reset, to limit the work.

  • Stop the journal client:
root@search0:~# systemctl stop swh-search-journal-client@objects.service 
root@search0:~# puppet agent --disable "stop search journal client to reset offsets"
  • Reset the offsets of the swh.journal.objects.origin_visit topic:
vsellier@journal0 ~ % /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets --topic swh.journal.objects.origin_visit --to-earliest --group swh.search.journal_client --execute

GROUP                          TOPIC                          PARTITION  NEW-OFFSET     
swh.search.journal_client      swh.journal.objects.origin_visit 16         0              
swh.search.journal_client      swh.journal.objects.origin_visit 10         0              
...
  • Restart the journal client:
root@search0:~# puppet agent --enable
root@search0:~# systemctl start swh-search-journal-client@objects.service

The backlog recovery is in progress.

Comment discarded (unrelated to this task) and reported in a dedicated task [1]

[1] T3067#59295

vsellier closed this task as Resolved. Mar 1 2021, 10:55 AM

The backfill is done, and search on metadata seems to work correctly.

The index statistics:

 % curl  -s http://${ES_SERVER}/_cat/indices\?v 
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin                      HthJj42xT5uO7w3Aoxzppw  80   0     907324      1972532        8gb            8gb