Jan 27 2021
Thanks :)
Use an exception to validate a repo page can be accessed
rebase
Restore missing log when the date can't be parsed
rebase
Remove useless variable
- Reorder methods
- Adapt date parsing according to the review
To decrease the time needed to recover the lag, several journal clients were launched in parallel with:
/usr/bin/swh search --config-file /etc/softwareheritage/search/journal_client_objects.yml journal-client objects
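For illustration, one way to launch several clients in parallel from a shell (a sketch; the actual number of clients and the supervision method used are assumptions). Since the clients share the swh.search.journal_client consumer group, Kafka balances the topic partitions among them:

# launch 4 clients in the background, then wait for all of them
for i in $(seq 4); do
  /usr/bin/swh search --config-file /etc/softwareheritage/search/journal_client_objects.yml journal-client objects &
done
wait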
Jan 26 2021
Inline unnecessary indirection
Add missing test coverage
Upgrading the index configuration to speed up indexing:
% cat >/tmp/config.json <<EOF
{
  "index" : {
    "translog.sync_interval" : "60s",
    "translog.durability": "async",
    "refresh_interval": "60s"
  }
}
EOF
% export ES_SERVER=192.168.100.81:9200
% export INDEX=origin
% curl -s -H "Content-Type: application/json" -XPUT http://${ES_SERVER}/${INDEX}/_settings -d @/tmp/config.json
{"acknowledged":true}
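Once the lag is recovered, these settings can presumably be reverted (a sketch; the values below are the stock Elasticsearch defaults, not values recorded here):

% cat >/tmp/config-revert.json <<EOF
{
  "index" : {
    "translog.sync_interval" : "5s",
    "translog.durability": "request",
    "refresh_interval": "1s"
  }
}
EOF
% curl -s -H "Content-Type: application/json" -XPUT http://${ES_SERVER}/${INDEX}/_settings -d @/tmp/config-revert.json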
Production
- puppet disabled
- Services stopped:
root@search1:~# systemctl stop swh-search-journal-client@objects.service
root@search1:~# systemctl stop gunicorn-swh-search
- Index deleted and recreated
% export ES_SERVER=search-esnode1.internal.softwareheritage.org:9200
% curl -s http://$ES_SERVER/_cat/indices\?v
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin Mq8dnlpuRXO4yYoC6CTuQw  90   1  151716299     38861934    260.8gb          131gb
% curl -XDELETE http://$ES_SERVER/origin
{"acknowledged":true}
% swh search --config-file /etc/softwareheritage/search/server.yml initialize
INFO:elasticsearch:PUT http://search-esnode1.internal.softwareheritage.org:9200/origin [status:200 request:2.216s]
INFO:elasticsearch:PUT http://search-esnode3.internal.softwareheritage.org:9200/origin/_mapping [status:200 request:0.151s]
Done.
% curl -s http://$ES_SERVER/_cat/indices\?v
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin yFaqPPCnRFCnc5AA6Ah8lw  90   1          0            0     36.5kb         18.2kb
- Journal client's consumer group deleted:
% export SERVER=kafka1.internal.softwareheritage.org:9092
% ./kafka-consumer-groups.sh --bootstrap-server ${SERVER} --delete --group swh.search.journal_client
Deletion of requested consumer groups ('swh.search.journal_client') was successful.
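The deletion can be double-checked by listing the remaining consumer groups (same tooling):

% ./kafka-consumer-groups.sh --bootstrap-server ${SERVER} --list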
- journal client restarted
- puppet enabled
The filter on visited origins is working correctly on staging. The has_visit flag looks good.
For example, for the https://www.npmjs.com/package/@ehmicky/dev-tasks origin:
{ "_index" : "origin", "_type" : "_doc", "_id" : "019bd314416108304165e82dd92e00bc9ea85a53", "_score" : 60.56421, "_source" : { "url" : "https://www.npmjs.com/package/@ehmicky/dev-tasks", "sha1" : "019bd314416108304165e82dd92e00bc9ea85a53" }, "sort" : [ 60.56421, "019bd314416108304165e82dd92e00bc9ea85a53" ] }
swh=> select * from origin join origin_visit_status on id=origin where id=469380;
   id   |                        url                        | origin | visit |             date              | status  | metadata |                  snapshot                  | type
--------+---------------------------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------
 469380 | https://www.npmjs.com/package/@ehmicky/dev-tasks  | 469380 |     1 | 2021-01-25 13:30:47.221937+00 | created |          |                                            | npm
 469380 | https://www.npmjs.com/package/@ehmicky/dev-tasks  | 469380 |     1 | 2021-01-25 13:41:59.435579+00 | partial |          | \xe3f24413d81fd3e9c309686fcfb6c8f5eb549acf | npm
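For reference, a filter on that flag could be expressed as below (a sketch; it assumes the flag is indexed as a boolean field named has_visit, and the exact query swh-search generates may differ):

% curl -s -H "Content-Type: application/json" http://${ES_SERVER}/origin/_search -d '
{
  "query": {
    "bool": {
      "filter": [ { "term": { "has_visit": true } } ]
    }
  }
}'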
Jan 25 2021
Staging
We are proceeding with a complete index rebuild.
Regarding the index rebuilding process, a naive approach using an alias over both the old and the new index[1] returns duplicated results when a search is performed.
Using an alias pointing only to the old index, rebuilding a new index, then switching the alias to the new index[2] could be a first approach, with the drawback that the old index will not be updated until the alias is switched to the new one.
It also requires that the swh-search code be able to use different index names for read and write operations.
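For reference, such an alias switch can be done atomically with the Elasticsearch _aliases API (a sketch; the origin-old/origin-new index names are hypothetical):

% curl -s -H "Content-Type: application/json" -XPOST http://${ES_SERVER}/_aliases -d '
{
  "actions": [
    { "remove": { "index": "origin-old", "alias": "origin" } },
    { "add":    { "index": "origin-new", "alias": "origin" } }
  ]
}'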
LGTM
- rebase
- update tests according to the review feedback
It seems Redis has a HyperLogLog functionality[1] that can match the requirements (bloom filter / limited deviation / small memory footprint / efficiency).
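For illustration, the HyperLogLog primitives in redis-cli (a sketch; the key name is hypothetical):

% redis-cli PFADD seen:origins https://example.org/repo1 https://example.org/repo2
(integer) 1
% redis-cli PFCOUNT seen:origins
(integer) 2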
Jan 23 2021
Jan 22 2021
Jan 21 2021
rebase
rebase
Test with "several" database upserts as it's more realistic
split the long entrypoint's command line
rebase
With the longer warning threshold, the monitoring is now green.
Jan 20 2021
Backfill launched from storage1 with this script: P927 (10 ranges in parallel); it finished in ~15 min.
All staging workers stopped:
root@pergamon:~# sudo clush -b -w @staging-workers 'puppet agent --disable "Deploy new storage version"; cd /etc/systemd/system/multi-user.target.wants; for unit in swh-worker@*; do systemctl disable $unit; done; systemctl stop swh-worker@*'
It seems it's the scheduler runner that is taking time to schedule the deposit task:
08:37:53 -> task is created
08:43:05 -> the runner is scheduling the task
08:43:24 -> the worker acknowledges the task