Feb 18 2021

vlorentz closed T3058: Metadata search is failing with "failed to parse date field", a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Feb 18 2021, 12:13 PM · Journal, Archive search
vsellier moved T2944: Deploy swh-search v0.4.1 from deployed/landed/monitoring to done on the System administration board.
Feb 18 2021, 9:27 AM · System administration, Journal, Archive search

Feb 17 2021

anlambert closed T3047: Enable to search in origin metadata with swh-search in webapp, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Feb 17 2021, 12:12 PM · Journal, Archive search

Feb 15 2021

vsellier renamed T3040: [production] Enable swh-search's journal-client for indexed objects from Enable swh-search's journal-client for indexed objects to [production] Enable swh-search's journal-client for indexed objects.
Feb 15 2021, 10:02 AM · System administration, Journal, Archive search

Feb 12 2021

ardumont closed T3037: Reschedule origin-intrinsic-metadata tasks for all origins, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Feb 12 2021, 6:32 PM · Journal, Archive search
ardumont closed T3037: Reschedule origin-intrinsic-metadata tasks for all origins as Resolved.
Feb 12 2021, 6:32 PM · System administration, Journal, Archive search

Feb 11 2021

vsellier placed T3040: [production] Enable swh-search's journal-client for indexed objects up for grabs.
Feb 11 2021, 3:32 PM · System administration, Journal, Archive search
ardumont moved T3037: Reschedule origin-intrinsic-metadata tasks for all origins from in-progress to deployed/landed/monitoring on the System administration board.
Feb 11 2021, 3:22 PM · System administration, Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

Done scheduling:

Feb 11 2021, 3:22 PM · System administration, Journal, Archive search
vsellier added a comment to T3040: [production] Enable swh-search's journal-client for indexed objects.

T3041 needs to be done before this one (for the production environment)

Feb 11 2021, 2:21 PM · System administration, Journal, Archive search
vlorentz triaged T3040: [production] Enable swh-search's journal-client for indexed objects as Normal priority.
Feb 11 2021, 1:17 PM · System administration, Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

Running:

swhscheduler@saatchi:~$ /usr/bin/swh scheduler --config-file /etc/softwareheritage/scheduler/backend.yml task schedule_origins --storage-url http://saam.internal.softwareheritage.org:5002 --batch-size 20 index-origin-metadata | tee /tmp/schedule-origins.txt
Feb 11 2021, 11:05 AM · System administration, Journal, Archive search
vlorentz changed the status of T2590: Finish the indexer -> swh-search pipeline from Open to Work in Progress.
Feb 11 2021, 11:01 AM · Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

@ardumont no, OriginMetadataIndexer lacks a filter step.

Feb 11 2021, 10:47 AM · System administration, Journal, Archive search
vlorentz added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

@ardumont no, OriginMetadataIndexer lacks a filter step.

Feb 11 2021, 10:12 AM · System administration, Journal, Archive search
ardumont closed T2780: Enable the journal-writer for the swh-idx-storage in production, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Feb 11 2021, 9:49 AM · Journal, Archive search
ardumont closed T2780: Enable the journal-writer for the swh-idx-storage in production, a subtask of T3037: Reschedule origin-intrinsic-metadata tasks for all origins, as Resolved.
Feb 11 2021, 9:48 AM · System administration, Journal, Archive search
ardumont closed T2780: Enable the journal-writer for the swh-idx-storage in production as Resolved.
Feb 11 2021, 9:48 AM · System administration, Journal
ardumont changed the status of T3037: Reschedule origin-intrinsic-metadata tasks for all origins from Open to Work in Progress.
Feb 11 2021, 9:41 AM · System administration, Journal, Archive search
ardumont changed the status of T3037: Reschedule origin-intrinsic-metadata tasks for all origins, a subtask of T2590: Finish the indexer -> swh-search pipeline, from Open to Work in Progress.
Feb 11 2021, 9:41 AM · Journal, Archive search
ardumont added a project to T3037: Reschedule origin-intrinsic-metadata tasks for all origins: System administration.
Feb 11 2021, 9:41 AM · System administration, Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

Although, now i'm wondering something.
Is that enough to write what's not in the topics?

Feb 11 2021, 9:37 AM · System administration, Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

Ah no! I misused the cli, with the right flags:

Feb 11 2021, 9:32 AM · System administration, Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

This needs storage access, so edit a dedicated configuration file.

Feb 11 2021, 9:25 AM · System administration, Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

That's it! [1]

Feb 11 2021, 8:40 AM · System administration, Journal, Archive search

Feb 10 2021

vlorentz added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

try swh scheduler task schedule_origins

Feb 10 2021, 11:58 PM · System administration, Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

That suggested cli does not show up, but I've only taken a quick glance ¯\_(ツ)_/¯:

Feb 10 2021, 7:17 PM · System administration, Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

Rather than write code to read from the database to kafka (like we did with swh-storage), this can be done simply by re-indexing all the origins, using swh scheduler schedule_origins

Feb 10 2021, 6:23 PM · System administration, Journal, Archive search
vlorentz added a parent task for T2780: Enable the journal-writer for the swh-idx-storage in production: T3037: Reschedule origin-intrinsic-metadata tasks for all origins.
Feb 10 2021, 5:14 PM · System administration, Journal
vlorentz added a subtask for T3037: Reschedule origin-intrinsic-metadata tasks for all origins: T2780: Enable the journal-writer for the swh-idx-storage in production.
Feb 10 2021, 5:14 PM · System administration, Journal, Archive search
vlorentz triaged T3037: Reschedule origin-intrinsic-metadata tasks for all origins as Normal priority.
Feb 10 2021, 5:14 PM · System administration, Journal, Archive search
ardumont added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

The status of the indexer-related topics can be seen in the indexer ingestion status board [1]

Feb 10 2021, 3:39 PM · System administration, Journal
ardumont moved T2780: Enable the journal-writer for the swh-idx-storage in production from in-progress to deployed/landed/monitoring on the System administration board.
Feb 10 2021, 3:35 PM · System administration, Journal
ardumont added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

We'll prepare the topics with the following first and we'll improve later if need be:

Feb 10 2021, 1:15 PM · System administration, Journal
ardumont added a revision to T2780: Enable the journal-writer for the swh-idx-storage in production: D5056: staging: Dedicate an indexer worker.
Feb 10 2021, 12:34 PM · System administration, Journal
ardumont added a revision to T2780: Enable the journal-writer for the swh-idx-storage in production: D5055: staging: Dedicate an indexer worker.
Feb 10 2021, 12:30 PM · System administration, Journal
ardumont added a revision to T2780: Enable the journal-writer for the swh-idx-storage in production: D5054: Enable the journal-writer for the swh-idx-storage in production.
Feb 10 2021, 11:41 AM · System administration, Journal
ardumont added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

I did too much here: I finished the swh-indexer -> swh-search pipeline on staging (so that's good nonetheless)

Feb 10 2021, 11:09 AM · System administration, Journal
ardumont added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

Note that the "docs.count" grew though (from 496619 to 786803), and the reasons are
unclear.

The same index is used to store the metadata out of the indexer with the same origin url
as key [1] and we are computing index metadata on origins already seen (thus already present
in the index afaiui). So I would have expected the docs.count to stay roughly (or even
exactly?) the same as before?

Feb 10 2021, 10:26 AM · System administration, Journal
ardumont added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

swh-search-journal-client@indexed kept up with its topic:

swh.search.journal_client.indexed swh.journal.indexed.origin_intrinsic_metadata 0          13653216        13653216        0               rdkafka-7c45245c-814f-47f1-ba67-041e4f426373 /192.168.130.90 rdkafka
Feb 10 2021, 9:02 AM · System administration, Journal
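
The consumer-group row above can be read mechanically. The following is a minimal sketch, assuming the row follows the usual column order of `kafka-consumer-groups.sh --describe` (GROUP, TOPIC, PARTITION, CURRENT-OFFSET, LOG-END-OFFSET, LAG, ...). The `PartitionLag` type and `parse_lag_row` helper are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class PartitionLag(NamedTuple):
    group: str
    topic: str
    partition: int
    current_offset: int
    log_end_offset: int
    lag: int


def parse_lag_row(row: str) -> PartitionLag:
    """Parse one whitespace-separated row of consumer-group output.

    Assumed column order (not confirmed by the task comment):
    GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG ...
    """
    fields = row.split()
    group, topic = fields[0], fields[1]
    partition, current, end = (int(value) for value in fields[2:5])
    # The lag is recomputed from the offsets rather than trusted from column 6.
    return PartitionLag(group, topic, partition, current, end, end - current)


# Row copied from the comment above.
ROW = (
    "swh.search.journal_client.indexed "
    "swh.journal.indexed.origin_intrinsic_metadata 0          13653216"
    "        13653216        0               "
    "rdkafka-7c45245c-814f-47f1-ba67-041e4f426373 /192.168.130.90 rdkafka"
)

parsed = parse_lag_row(ROW)
print(parsed.topic, "lag:", parsed.lag)
```

A lag of zero on every partition is what "kept up with its topic" means here.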

Feb 9 2021

ardumont added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

tl;dr: deployed on staging and it seems ok.

Feb 9 2021, 7:04 PM · System administration, Journal
ardumont added a revision to T2780: Enable the journal-writer for the swh-idx-storage in production: D5053: staging: Activate swh-search-journal-client@indexed.
Feb 9 2021, 5:59 PM · System administration, Journal
ardumont added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

I just mentioned some on T2912#58067, but it's unclear whether that's actually true or me misremembering things.

Feb 9 2021, 5:53 PM · System administration, Journal
ardumont changed the status of T2780: Enable the journal-writer for the swh-idx-storage in production, a subtask of T2590: Finish the indexer -> swh-search pipeline, from Open to Work in Progress.
Feb 9 2021, 5:51 PM · Journal, Archive search
ardumont changed the status of T2780: Enable the journal-writer for the swh-idx-storage in production from Open to Work in Progress.
Feb 9 2021, 5:51 PM · System administration, Journal
vlorentz assigned T2780: Enable the journal-writer for the swh-idx-storage in production to ardumont.
Feb 9 2021, 3:30 PM · System administration, Journal
vlorentz closed T2936: Update the swh-search journal client to only set "has_visit" on "full" status of the visit, a subtask of T2905: Deploy swh-search for production, as Resolved.
Feb 9 2021, 3:28 PM · System administration, Journal, Archive search
vlorentz closed T2936: Update the swh-search journal client to only set "has_visit" on "full" status of the visit as Resolved.
Feb 9 2021, 3:28 PM · Journal, Archive search

Feb 5 2021

ardumont moved T2780: Enable the journal-writer for the swh-idx-storage in production from Backlog to Weekly backlog on the System administration board.
Feb 5 2021, 7:26 PM · System administration, Journal

Feb 4 2021

vlorentz merged task T3012: Check all objects in the production storage/journal have a correct hash into T75: Check integrity of directories, revisions, and releases.
Feb 4 2021, 6:13 PM · Journal, Storage manager
olasd added a comment to T3012: Check all objects in the production storage/journal have a correct hash.

This is a duplicate of T75, the history of which would probably be useful to take into account (I suspect it can be closed).

Feb 4 2021, 6:11 PM · Journal, Storage manager

Feb 3 2021

ardumont added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

Is there some remaining blocker on this?
(If not i'll attend to it next week)

Feb 3 2021, 4:35 PM · System administration, Journal

Feb 2 2021

seirl triaged T3021: Investigate why reading the journal of the content table takes so long as Normal priority.
Feb 2 2021, 2:00 PM · Journal, Datasets

Feb 1 2021

vlorentz triaged T3012: Check all objects in the production storage/journal have a correct hash as Normal priority.
Feb 1 2021, 12:38 PM · Journal, Storage manager
vsellier closed T2944: Deploy swh-search v0.4.1, a subtask of T2936: Update the swh-search journal client to only set "has_visit" on "full" status of the visit, as Resolved.
Feb 1 2021, 10:06 AM · Journal, Archive search
vsellier closed T2944: Deploy swh-search v0.4.1 as Resolved.

The backfill is done.

Feb 1 2021, 10:06 AM · System administration, Journal, Archive search

Jan 29 2021

vsellier added a comment to T2944: Deploy swh-search v0.4.1.

The journal_client has almost finished ingesting the topics[1] it listens to. It took some more time because a backfill of the origin_visit_status was launched for T2993.
It should be done by the end of the day.

Jan 29 2021, 2:44 PM · System administration, Journal, Archive search
vsellier moved T2905: Deploy swh-search for production from deployed/landed/monitoring to done on the System administration board.
Jan 29 2021, 12:21 PM · System administration, Journal, Archive search

Jan 27 2021

douardda closed T2970: Make swh-journal tests not depend on swh-model any more as Resolved.

Let's consider it as done.

Jan 27 2021, 3:47 PM · Journal
vsellier moved T2944: Deploy swh-search v0.4.1 from in-progress to deployed/landed/monitoring on the System administration board.
Jan 27 2021, 12:44 PM · System administration, Journal, Archive search
vsellier added a comment to T2944: Deploy swh-search v0.4.1.

To decrease the time needed to recover the lag, several journal clients were launched in parallel with:

/usr/bin/swh search --config-file /etc/softwareheritage/search/journal_client_objects.yml journal-client objects
Jan 27 2021, 10:00 AM · System administration, Journal, Archive search
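
Launching several copies of that command could be scripted roughly as below. This is only a sketch: the command line is the one quoted in the comment, but the `build_commands` and `launch` helpers and the worker count are hypothetical, and the dry-run default means nothing is started unless explicitly requested. Kafka consumers that share a consumer group split the topic partitions among themselves, so starting more clients than there are partitions brings no further gain.

```python
import shlex
import subprocess
from typing import List

# Command line quoted in the comment above.
BASE_CMD = (
    "/usr/bin/swh search "
    "--config-file /etc/softwareheritage/search/journal_client_objects.yml "
    "journal-client objects"
)


def build_commands(count: int) -> List[List[str]]:
    """Return `count` identical argv lists, one per journal client."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return [shlex.split(BASE_CMD) for _ in range(count)]


def launch(count: int, dry_run: bool = True) -> List[subprocess.Popen]:
    """Start `count` journal clients; with dry_run=True, start nothing."""
    commands = build_commands(count)
    if dry_run:
        for argv in commands:
            print("would run:", " ".join(argv))
        return []
    return [subprocess.Popen(argv) for argv in commands]


# Hypothetical worker count, shown in dry-run mode only.
processes = launch(4)
```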

Jan 26 2021

douardda added a revision to T2970: Make swh-journal tests not depend on swh-model any more: D4951: Remove tests' journal_data.py in favor of the version in swh-model.
Jan 26 2021, 5:09 PM · Journal
douardda added a revision to T2970: Make swh-journal tests not depend on swh-model any more: D4950: Add swh-journal's model-related test data set in swh-model.
Jan 26 2021, 4:45 PM · Journal
douardda added a comment to T2970: Make swh-journal tests not depend on swh-model any more.

Back on this, the plan is now to make swh-journal not depend on the actual model definition, which is currently mostly due to the presence of the journal_data.py in swh-journal. So the plan is to move this file in swh-model so it's kept up to date with swh-model, even if it's mostly used for testing other packages (like swh-journal).

Jan 26 2021, 4:41 PM · Journal
vsellier added a comment to T2944: Deploy swh-search v0.4.1.

Updating the index configuration to speed up the indexing:

% cat >/tmp/config.json <<EOF
{
  "index" : {
    "translog.sync_interval" : "60s",
    "translog.durability": "async",
    "refresh_interval": "60s"
  }
}
EOF
% export ES_SERVER=192.168.100.81:9200
% export INDEX=origin            
% curl -s -H "Content-Type: application/json" -XPUT http://${ES_SERVER}/${INDEX}/_settings -d @/tmp/config.json 
{"acknowledged":true}%
Jan 26 2021, 10:31 AM · System administration, Journal, Archive search
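
The same settings update can be expressed without a temporary file. The sketch below only builds the PUT request with the standard library and does not send it. The server address, index name and setting values are the ones from the comment above, while the `build_settings_request` helper is a hypothetical name.

```python
import json
import urllib.request

# Settings from the comment above: fewer translog syncs and index refreshes,
# trading durability and search freshness for indexing throughput.
BULK_SETTINGS = {
    "index": {
        "translog.sync_interval": "60s",
        "translog.durability": "async",
        "refresh_interval": "60s",
    }
}


def build_settings_request(server: str, index: str, settings: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request to the index _settings endpoint."""
    return urllib.request.Request(
        url=f"http://{server}/{index}/_settings",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_settings_request("192.168.100.81:9200", "origin", BULK_SETTINGS)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```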
vsellier added a comment to T2944: Deploy swh-search v0.4.1.

Production

  • puppet disabled
  • Services stopped :
root@search1:~# systemctl stop swh-search-journal-client@objects.service 
root@search1:~# systemctl stop gunicorn-swh-search
  • Index deleted and recreated
% export ES_SERVER=search-esnode1.internal.softwareheritage.org:9200
% curl -s http://$ES_SERVER/_cat/indices\?v 
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin Mq8dnlpuRXO4yYoC6CTuQw  90   1  151716299     38861934    260.8gb          131gb
% curl -XDELETE http://$ES_SERVER/origin
{"acknowledged":true}%    
% swh search --config-file /etc/softwareheritage/search/server.yml  initialize
INFO:elasticsearch:PUT http://search-esnode1.internal.softwareheritage.org:9200/origin [status:200 request:2.216s]
INFO:elasticsearch:PUT http://search-esnode3.internal.softwareheritage.org:9200/origin/_mapping [status:200 request:0.151s]
Done.
% curl -s http://$ES_SERVER/_cat/indices\?v                                        
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin yFaqPPCnRFCnc5AA6Ah8lw  90   1          0            0     36.5kb         18.2kb
  • journal client's consumer group delete:
% export SERVER=kafka1.internal.softwareheritage.org:9092  
% ./kafka-consumer-groups.sh --bootstrap-server ${SERVER} --delete --group swh.search.journal_client
Deletion of requested consumer groups ('swh.search.journal_client') was successful.
  • journal client restarted
  • puppet enabled
Jan 26 2021, 9:39 AM · System administration, Journal, Archive search
vsellier added a comment to T2944: Deploy swh-search v0.4.1.

The filter on visited origins is working correctly on staging. The has_visit flag looks good.
For example for the https://www.npmjs.com/package/@ehmicky/dev-tasks origin

{
  "_index" : "origin",
  "_type" : "_doc",
  "_id" : "019bd314416108304165e82dd92e00bc9ea85a53",
  "_score" : 60.56421,
  "_source" : {
    "url" : "https://www.npmjs.com/package/@ehmicky/dev-tasks",
    "sha1" : "019bd314416108304165e82dd92e00bc9ea85a53"
  },
  "sort" : [
    60.56421,
    "019bd314416108304165e82dd92e00bc9ea85a53"
  ]
}
swh=> select * from origin join origin_visit_status on id=origin where id=469380;
   id   |                       url                        | origin | visit |             date              | status  | metadata |                  snapshot                  | type 
--------+--------------------------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------
 469380 | https://www.npmjs.com/package/@ehmicky/dev-tasks | 469380 |     1 | 2021-01-25 13:30:47.221937+00 | created |          |                                            | npm
 469380 | https://www.npmjs.com/package/@ehmicky/dev-tasks | 469380 |     1 | 2021-01-25 13:41:59.435579+00 | partial |          | \xe3f24413d81fd3e9c309686fcfb6c8f5eb549acf | npm
Jan 26 2021, 9:16 AM · System administration, Journal, Archive search
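
The rule being checked here (T2936: only set "has_visit" on a "full" visit status) can be illustrated as follows. This is not the swh-search implementation: `has_full_visit` is a hypothetical helper, and the two statuses are the ones returned by the SQL query above.

```python
from typing import Iterable, Mapping


def has_full_visit(statuses: Iterable[Mapping[str, str]]) -> bool:
    """Return True only if at least one visit status is "full"."""
    return any(status.get("status") == "full" for status in statuses)


# Visit statuses of origin 469380, taken from the query result above.
npm_statuses = [
    {"date": "2021-01-25 13:30:47.221937+00", "status": "created"},
    {"date": "2021-01-25 13:41:59.435579+00", "status": "partial"},
]

print(has_full_visit(npm_statuses))
```

With only "created" and "partial" statuses, the flag stays unset, which matches the document above having no has_visit field in its _source.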

Jan 25 2021

vsellier added a comment to T2944: Deploy swh-search v0.4.1.

Staging

We are proceeding to a complete index rebuilding

Jan 25 2021, 5:44 PM · System administration, Journal, Archive search
vsellier added a comment to T2944: Deploy swh-search v0.4.1.

Regarding the index rebuilding process, a naive approach using an alias that covers both the old and the new index[1] returns duplicated results when a search is done.
Using an alias that points only to the old index, building a new index, and then switching the alias to the new index[2] can be a first approach, with the drawback that the old index will not be updated until the alias is switched to the new index.
It also requires that the swh-search code be able to use different names for the read and write operations.

Jan 25 2021, 4:07 PM · System administration, Journal, Archive search
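
The alias switch described in this comment maps to a single call to the Elasticsearch `_aliases` endpoint, which applies all listed actions atomically. The sketch below only builds the request body. The alias and index names are hypothetical placeholders, not the names used by swh-search.

```python
import json


def alias_swap_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases moving `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: a read alias moved from the old index to the rebuilt one.
swap_body = alias_swap_actions("origin-read", "origin-v1", "origin-v2")
print(json.dumps(swap_body, indent=2))
```

Because both actions are applied in one request, searches through the alias never see both indexes at once, which avoids the duplicated results of the naive approach.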
vsellier changed the status of T2944: Deploy swh-search v0.4.1, a subtask of T2936: Update the swh-search journal client to only set "has_visit" on "full" status of the visit, from Open to Work in Progress.
Jan 25 2021, 3:32 PM · Journal, Archive search
vsellier changed the status of T2944: Deploy swh-search v0.4.1 from Open to Work in Progress.
Jan 25 2021, 3:32 PM · System administration, Journal, Archive search
vsellier renamed T2944: Deploy swh-search v0.4.1 from Deploy swh-search v0.4.1 in staging to Deploy swh-search v0.4.1.
Jan 25 2021, 3:32 PM · System administration, Journal, Archive search

Jan 21 2021

vsellier moved T2905: Deploy swh-search for production from in-progress to deployed/landed/monitoring on the System administration board.
Jan 21 2021, 9:46 AM · System administration, Journal, Archive search

Jan 13 2021

douardda triaged T2970: Make swh-journal tests not depend on swh-model any more as Normal priority.
Jan 13 2021, 4:48 PM · Journal
douardda created T2970: Make swh-journal tests not depend on swh-model any more.
Jan 13 2021, 4:48 PM · Journal
vsellier closed T2905: Deploy swh-search for production, a subtask of T2904: Create a new production webapp using the frozen index on the staging ES, as Resolved.
Jan 13 2021, 9:23 AM · System administration, Journal, Archive search
vsellier closed T2905: Deploy swh-search for production as Resolved.

I am closing this issue as there is no more action to perform at the moment.
Diagnosis and any fixes will be followed up in dedicated issues.

Jan 13 2021, 9:23 AM · System administration, Journal, Archive search

Jan 11 2021

vlorentz reassigned T2944: Deploy swh-search v0.4.1 from vlorentz to vsellier.
Jan 11 2021, 1:48 PM · System administration, Journal, Archive search
ardumont renamed T2944: Deploy swh-search v0.4.1 from Deploy T2936 in staging to Deploy swh-search v0.4.1 in staging.
Jan 11 2021, 10:50 AM · System administration, Journal, Archive search
ardumont moved T2944: Deploy swh-search v0.4.1 from Backlog to Weekly backlog on the System administration board.
Jan 11 2021, 10:49 AM · System administration, Journal, Archive search

Jan 7 2021

vsellier triaged T2944: Deploy swh-search v0.4.1 as Normal priority.
Jan 7 2021, 6:39 PM · System administration, Journal, Archive search
vsellier added a comment to T2936: Update the swh-search journal client to only set "has_visit" on "full" status of the visit.

version v0.4.1 created with the last commit (rDSEA47db624364d4e781f8fa157b2d72d0eb9929b7a0)

Jan 7 2021, 4:16 PM · Journal, Archive search
vlorentz renamed T2936: Update the swh-search journal client to only set "has_visit" on "full" status of the visit from Update the swh-search journal client to use visit statuses to Update the swh-search journal client to only set "has_visit" on "full" status of the visit.
Jan 7 2021, 1:18 PM · Journal, Archive search
vlorentz added a revision to T2936: Update the swh-search journal client to only set "has_visit" on "full" status of the visit: D4818: Do not set 'has_visit' when receiving a visit from the journal.
Jan 7 2021, 1:18 PM · Journal, Archive search
vlorentz added a comment to T2905: Deploy swh-search for production.

Oh right, they were wrongly set to True. I guess we can write a small script to set them all to False before we re-consume statuses

Jan 7 2021, 12:24 PM · System administration, Journal, Archive search
vlorentz added a comment to T2905: Deploy swh-search for production.

how doing it without killing all the search

Jan 7 2021, 12:19 PM · System administration, Journal, Archive search
vsellier added a comment to T2905: Deploy swh-search for production.

It depends on what will be implemented in T2936, but a new reindex will probably have to be done to fix the search. It will be the opportunity to think about how to do it without killing all the search

Jan 7 2021, 11:36 AM · System administration, Journal, Archive search
vlorentz triaged T2936: Update the swh-search journal client to only set "has_visit" on "full" status of the visit as Normal priority.
Jan 7 2021, 10:57 AM · Journal, Archive search
vlorentz added a comment to T2905: Deploy swh-search for production.

Yes indeed. swh-search was written before we had origin visit statuses, and I forgot to update it.

Jan 7 2021, 10:56 AM · System administration, Journal, Archive search
vsellier updated subscribers of T2905: Deploy swh-search for production.

@vlorentz I was checking some differences between swh-search and the current search. Does the journal client have to listen to the origin_visit topic? It seems that `origin_visit_status` should be enough to match the behavior of the search in the webapp.

Jan 7 2021, 10:14 AM · System administration, Journal, Archive search

Jan 6 2021

olasd added a parent task for T2905: Deploy swh-search for production: T2182: Switch production swh-web to use swh-search instead of postgresql search..
Jan 6 2021, 11:15 AM · System administration, Journal, Archive search
vsellier updated the task description for T2905: Deploy swh-search for production.
Jan 6 2021, 11:06 AM · System administration, Journal, Archive search
vsellier added a comment to T2905: Deploy swh-search for production.

webapp1 is now plugged on the real live production index
Let's monitor the behavior with real searches.
First observation: the search retrieves all the documents and is not as progressive as the random search script.
The response times are longer than expected:

Jan 06 09:59:46 search1 python3[813]: 2021-01-06 09:59:46 [813] elasticsearch:INFO GET http://search-esnode1.internal.softwareheritage.org:9200/origin/_search?size=100 [status:200 request:3.399s]
Jan 06 10:06:18 search1 python3[848]: 2021-01-06 10:06:18 [848] elasticsearch:INFO GET http://search-esnode1.internal.softwareheritage.org:9200/origin/_search?size=100 [status:200 request:7.422s]
Jan 06 10:06:21 search1 python3[813]: 2021-01-06 10:06:21 [813] elasticsearch:INFO GET http://search-esnode3.internal.softwareheritage.org:9200/origin/_search?size=100 [status:200 request:5.077s]
Jan 06 10:07:32 search1 python3[813]: 2021-01-06 10:07:32 [813] elasticsearch:INFO GET http://search-esnode2.internal.softwareheritage.org:9200/origin/_search?size=100 [status:200 request:4.819s]
Jan 06 10:08:06 search1 python3[813]: 2021-01-06 10:08:06 [813] elasticsearch:INFO GET http://search-esnode1.internal.softwareheritage.org:9200/origin/_search?size=100 [status:200 request:2.700s]
Jan 06 10:08:15 search1 python3[813]: 2021-01-06 10:08:15 [813] elasticsearch:INFO GET http://search-esnode3.internal.softwareheritage.org:9200/origin/_search?size=100 [status:200 request:2.414s]
Jan 6 2021, 11:01 AM · System administration, Journal, Archive search
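
To put numbers on "longer than expected", the request times can be pulled out of those log lines. This is a small sketch: the regular expression assumes the `request:<seconds>s]` suffix seen in the six lines above, and `request_times` is a hypothetical helper.

```python
import re
import statistics
from typing import Iterable, List

# Matches the trailing "request:3.399s]" part of the elasticsearch log lines.
REQUEST_TIME = re.compile(r"request:(\d+(?:\.\d+)?)s\]")


def request_times(lines: Iterable[str]) -> List[float]:
    """Extract request durations (in seconds) from elasticsearch log lines."""
    times = []
    for line in lines:
        match = REQUEST_TIME.search(line)
        if match:
            times.append(float(match.group(1)))
    return times


# The six log lines quoted in the comment above.
LOG = """\
Jan 06 09:59:46 search1 python3[813]: 2021-01-06 09:59:46 [813] elasticsearch:INFO GET http://search-esnode1.internal.softwareheritage.org:9200/origin/_search?size=100 [status:200 request:3.399s]
Jan 06 10:06:18 search1 python3[848]: 2021-01-06 10:06:18 [848] elasticsearch:INFO GET http://search-esnode1.internal.softwareheritage.org:9200/origin/_search?size=100 [status:200 request:7.422s]
Jan 06 10:06:21 search1 python3[813]: 2021-01-06 10:06:21 [813] elasticsearch:INFO GET http://search-esnode3.internal.softwareheritage.org:9200/origin/_search?size=100 [status:200 request:5.077s]
Jan 06 10:07:32 search1 python3[813]: 2021-01-06 10:07:32 [813] elasticsearch:INFO GET http://search-esnode2.internal.softwareheritage.org:9200/origin/_search?size=100 [status:200 request:4.819s]
Jan 06 10:08:06 search1 python3[813]: 2021-01-06 10:08:06 [813] elasticsearch:INFO GET http://search-esnode1.internal.softwareheritage.org:9200/origin/_search?size=100 [status:200 request:2.700s]
Jan 06 10:08:15 search1 python3[813]: 2021-01-06 10:08:15 [813] elasticsearch:INFO GET http://search-esnode3.internal.softwareheritage.org:9200/origin/_search?size=100 [status:200 request:2.414s]
"""

times = request_times(LOG.splitlines())
print(f"n={len(times)} min={min(times)} max={max(times)} mean={statistics.mean(times):.3f}")
```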
vsellier added a comment to T2905: Deploy swh-search for production.

The performance looks acceptable as is for a small number of parallel searches (~10). Let's now try with real searches; it will also help to adapt the cluster configuration and validate the behavior

Jan 6 2021, 9:59 AM · System administration, Journal, Archive search
vsellier updated the task description for T2905: Deploy swh-search for production.
Jan 6 2021, 9:56 AM · System administration, Journal, Archive search
vsellier added a revision to T2905: Deploy swh-search for production: D4809: Plug webapp1 on the swh-search with live production data.
Jan 6 2021, 9:55 AM · System administration, Journal, Archive search

Jan 5 2021

vsellier moved T2905: Deploy swh-search for production from Backlog to in-progress on the System administration board.
Jan 5 2021, 2:39 PM · System administration, Journal, Archive search
vsellier updated the task description for T2905: Deploy swh-search for production.
Jan 5 2021, 2:37 PM · System administration, Journal, Archive search
vsellier added a comment to T2905: Deploy swh-search for production.

In the new configuration, after some time without searches, the first ones take some time before stabilizing to the old values:

❯ ./random_search.sh                                                                                        12:36:37
Jan 5 2021, 2:16 PM · System administration, Journal, Archive search
vsellier added a comment to T2905: Deploy swh-search for production.

the index configuration was reset to its default :

cat >/tmp/config.json <<EOF
{
  "index" : {
    "translog.sync_interval" : null,
    "translog.durability": null,
    "refresh_interval": null
  }
}
EOF
❯ curl -s http://192.168.100.81:9200/origin/_settings\?pretty
{
  "origin" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "60s",
        "number_of_shards" : "90",
        "translog" : {
          "sync_interval" : "60s",
          "durability" : "async"
        },
        "provided_name" : "origin",
        "creation_date" : "1608761881782",
        "number_of_replicas" : "1",
        "uuid" : "Mq8dnlpuRXO4yYoC6CTuQw",
        "version" : {
          "created" : "7090399"
        }
      }
    }
  }
}
❯ curl -s -H "Content-Type: application/json" -XPUT http://192.168.100.81:9200/origin/_settings\?pretty -d @/tmp/config.json
{
  "acknowledged" : true
}
❯ curl -s http://192.168.100.81:9200/origin/_settings\?pretty
{
  "origin" : {
    "settings" : {
      "index" : {
        "creation_date" : "1608761881782",
        "number_of_shards" : "90",
        "number_of_replicas" : "1",
        "uuid" : "Mq8dnlpuRXO4yYoC6CTuQw",
        "version" : {
          "created" : "7090399"
        },
        "provided_name" : "origin"
      }
    }
  }
}

A *simple* search doesn't look impacted (it's not a real benchmark):

❯ ./random_search.sh
Jan 5 2021, 9:47 AM · System administration, Journal, Archive search
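
The reset above relies on Elasticsearch treating a `null` value in an index settings update as "restore the default", which is why the three keys disappear from the second `_settings` output. A minimal sketch of building such a payload follows. The `reset_payload` helper is a hypothetical name, and the setting names are the ones from the comment.

```python
import json
from typing import Iterable


def reset_payload(setting_names: Iterable[str]) -> dict:
    """Map each index setting to None, which serializes to JSON null."""
    return {"index": {name: None for name in setting_names}}


reset_body = reset_payload(
    [
        "translog.sync_interval",
        "translog.durability",
        "refresh_interval",
    ]
)
print(json.dumps(reset_body, indent=2))
```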