
Mar 5 2021

vsellier added a revision to T3083: Deploy swh-search v0.7.0/v0.7.1: D5198: swh-search: add indexes configuration.
Mar 5 2021, 9:19 AM · System administration, Journal, Archive search

Mar 4 2021

vsellier added a comment to T3083: Deploy swh-search v0.7.0/v0.7.1.

swh-search:v0.7.1 deployed in staging according to the defined plan.
The aliases are correctly created and used by the services:

vsellier@search-esnode0 ~ % curl -XGET -H "Content-Type: application/json" http://192.168.130.80:9200/_cat/indices
green open  origin                      HthJj42xT5uO7w3Aoxzppw 80 0 929692 137147 4gb 4gb
green close origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg 80 0                      
green close origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ 80 0                      
vsellier@search-esnode0 ~ % curl -XGET -H "Content-Type: application/json" http://192.168.130.80:9200/_cat/aliases
origin-read  origin - - - -
origin-write origin - - - -
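
The alias layout above can also be checked programmatically. The following is a minimal sketch, not part of swh-search: `parse_cat_aliases` and `EXPECTED` are hypothetical names, and the sample text is copied from the `_cat/aliases` output above.

```python
def parse_cat_aliases(text):
    """Map each alias name to the set of indices it points to.

    Expects the plain-text output of `_cat/aliases` (no header row),
    where the first column is the alias and the second is the index.
    """
    aliases = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        alias, index = fields[0], fields[1]
        aliases.setdefault(alias, set()).add(index)
    return aliases


# Sample copied from the staging output above.
SAMPLE = """\
origin-read  origin - - - -
origin-write origin - - - -
"""

# Hypothetical expectation: both aliases resolve to the single `origin` index.
EXPECTED = {"origin-read": {"origin"}, "origin-write": {"origin"}}

if __name__ == "__main__":
    print(parse_cat_aliases(SAMPLE) == EXPECTED)
```

Collecting the indices in a set keeps the check valid if an alias ever points to more than one index, for example during a migration.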

Journal clients:

Mar 04 16:22:40 search0 swh[3598137]: INFO:elasticsearch:POST http://search-esnode0.internal.staging.swh.network:9200/origin-write/_bulk [status:200 request:0.013s]
Mar 04 16:22:41 search0 swh[3598137]: INFO:elasticsearch:POST http://search-esnode0.internal.staging.swh.network:9200/origin-write/_bulk [status:200 request:0.012s]

Search:

Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] swh.search.api.server:INFO Initializing indexes with configuration: 
Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] elasticsearch:INFO HEAD http://search-esnode0.internal.staging.swh.network:9200/origin [status:200 request:0.005s]
Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] elasticsearch:INFO HEAD http://search-esnode0.internal.staging.swh.network:9200/origin-read/_alias [status:200 request:0.001s]
Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] elasticsearch:INFO HEAD http://search-esnode0.internal.staging.swh.network:9200/origin-write/_alias [status:200 request:0.001s]
Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] elasticsearch:INFO PUT http://search-esnode0.internal.staging.swh.network:9200/origin/_mapping [status:200 request:0.006s]
Mar 04 16:19:27 search0 python3[3598042]: 2021-03-04 16:19:27 [3598042] elasticsearch:INFO GET http://search-esnode0.internal.staging.swh.network:9200/origin-read/_search?size=100 [status:200 request:0.076s]
Mar 4 2021, 5:24 PM · System administration, Journal, Archive search
vsellier renamed T3083: Deploy swh-search v0.7.0/v0.7.1 from Deploy swh-search v0.7.0 to Deploy swh-search v0.7.0/v0.7.1.
Mar 4 2021, 4:01 PM · System administration, Journal, Archive search
vsellier added a revision to T3076: [swh-search] Improve the index/mapping migration process: D5196: Allow to instantiate the service with default indexes configuration.
Mar 4 2021, 3:21 PM · System administration, Journal, Archive search
vsellier changed the status of T3083: Deploy swh-search v0.7.0/v0.7.1 from Open to Work in Progress.
Mar 4 2021, 12:09 PM · System administration, Journal, Archive search

Mar 3 2021

vsellier added a revision to T3076: [swh-search] Improve the index/mapping migration process: D5193: Ensure the elasticsearch indexes are initialized before the first request.
Mar 3 2021, 6:03 PM · System administration, Journal, Archive search

Mar 2 2021

vsellier added a revision to T3076: [swh-search] Improve the index/mapping migration process: D5184: Add missing server tests.
Mar 2 2021, 5:51 PM · System administration, Journal, Archive search
vsellier added a revision to T3076: [swh-search] Improve the index/mapping migration process: D5179: Use elasticsearch aliases to simplify maintenance operations.
Mar 2 2021, 10:44 AM · System administration, Journal, Archive search

Mar 1 2021

vsellier changed the status of T3076: [swh-search] Improve the index/mapping migration process, a subtask of T2590: Finish the indexer -> swh-search pipeline, from Open to Work in Progress.
Mar 1 2021, 3:54 PM · Journal, Archive search
vsellier changed the status of T3076: [swh-search] Improve the index/mapping migration process from Open to Work in Progress.
Mar 1 2021, 3:54 PM · System administration, Journal, Archive search
vsellier updated the task description for T3076: [swh-search] Improve the index/mapping migration process.
Mar 1 2021, 3:38 PM · System administration, Journal, Archive search
anlambert closed T2934: Hide unvisited origins search option is not honored with elasticsearch backend as Invalid.

The JSON document associated with an origin in ES has a has_visit field, so closing this as invalid.

Mar 1 2021, 2:50 PM · Archive search, Web app
ardumont updated the task description for T3076: [swh-search] Improve the index/mapping migration process.
Mar 1 2021, 2:01 PM · System administration, Journal, Archive search
vsellier renamed T3076: [swh-search] Improve the index/mapping migration process from [swh-search] Improve the migration process to [swh-search] Improve the index/mapping migration process.
Mar 1 2021, 1:01 PM · System administration, Journal, Archive search
vsellier renamed T3076: [swh-search] Improve the index/mapping migration process from [use index aliases] to [swh-search] Improve the migration process.
Mar 1 2021, 1:01 PM · System administration, Journal, Archive search
vsellier triaged T3076: [swh-search] Improve the index/mapping migration process as Normal priority.
Mar 1 2021, 1:00 PM · System administration, Journal, Archive search
vsellier closed T3060: Deploy swh-search v0.6.0 in **staging**, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Mar 1 2021, 10:55 AM · Journal, Archive search
vsellier closed T3060: Deploy swh-search v0.6.0 in **staging**, a subtask of T3058: Metadata search is failing with "failed to parse date field", as Resolved.
Mar 1 2021, 10:55 AM · Archive search
vsellier closed T3060: Deploy swh-search v0.6.0 in **staging** as Resolved.

the backfill is done, the search on metadata seems to work correctly.

Mar 1 2021, 10:55 AM · System administration, Archive search
vsellier added a comment to T3067: elasticsearch cluster disk usage and maintenance.

The backfill / reindexing looks aggressive for the cluster and the search. There are a lot of timeouts on the webapp's search:

  File "/usr/lib/python3/dist-packages/elasticsearch/connection/http_urllib3.py", line 249, in perform_request
    raise ConnectionTimeout("TIMEOUT", str(e), e)
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='search-esnode3.internal.softwareheritage.org', port=9200): Read timed out. (read timeout=10))
Mar 1 2021, 9:59 AM · Archive search, System administration

Feb 26 2021

ardumont closed T3067: elasticsearch cluster disk usage and maintenance as Resolved.
Feb 26 2021, 10:22 AM · Archive search, System administration
ardumont added a comment to T3067: elasticsearch cluster disk usage and maintenance.

As expected after the previous action, the number of documents started growing again.

green  open   origin-production hZfuv0lVRImjOjO_rYgDzg  90   1  152795694       297907    217.6gb        109.2gb
Feb 26 2021, 10:22 AM · Archive search, System administration
ardumont moved T3067: elasticsearch cluster disk usage and maintenance from in-progress to deployed/landed/monitoring on the System administration board.
Feb 26 2021, 10:15 AM · Archive search, System administration
ardumont added a comment to T3067: elasticsearch cluster disk usage and maintenance.

We can now restart the swh-search-journal-client@object service.

Feb 26 2021, 10:14 AM · Archive search, System administration
ardumont added a comment to T3067: elasticsearch cluster disk usage and maintenance.

Install alias "origin" on "origin-production" index:
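
The request itself is not captured in this feed. As an illustration only, this is a sketch of the body that the Elasticsearch `_aliases` endpoint accepts for such an operation. The helper name `build_add_alias_body` is hypothetical; the index and alias names come from the comment above.

```python
import json


def build_add_alias_body(index, alias):
    """Return the JSON body for `POST /_aliases` that adds `alias` to `index`."""
    return {"actions": [{"add": {"index": index, "alias": alias}}]}


if __name__ == "__main__":
    # Index and alias names taken from the comment above.
    body = build_add_alias_body("origin-production", "origin")
    print(json.dumps(body, indent=2))
```

An alias cannot share its name with an existing index, so this presumably only works once no index named `origin` remains on the cluster.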

Feb 26 2021, 10:08 AM · Archive search, System administration

Feb 25 2021

ardumont added a comment to T3067: elasticsearch cluster disk usage and maintenance.

Finally finished:

root@search-esnode1:~# curl -XPOST -H "Content-Type: application/json" ${ES_SERVER}/_reindex\?pretty\&refresh=true\&requests_per_second=-1\&\&wait_for_completion=true -d @reindex-origin.json
{
  "took" : 115296461,
  "timed_out" : false,
  "total" : 152756759,
  "updated" : 0,
  "created" : 152756759,
  "deleted" : 0,
  "batches" : 152757,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}
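
To put the numbers above in perspective, here is a small sketch that derives the duration and throughput from the response. `summarize_reindex` is a hypothetical helper, and the `batch_size` of 1000 is an assumption (it is the Elasticsearch default for `_reindex`, and the actual `reindex-origin.json` is not shown here).

```python
import math


def summarize_reindex(response, batch_size=1000):
    """Derive duration and throughput from a `_reindex` response dict."""
    seconds = response["took"] / 1000  # `took` is reported in milliseconds
    return {
        "hours": seconds / 3600,
        "docs_per_second": response["created"] / seconds,
        # Batch count one would expect for the assumed scroll batch size.
        "expected_batches": math.ceil(response["total"] / batch_size),
    }


# Values copied from the response above.
RESPONSE = {
    "took": 115296461,
    "total": 152756759,
    "created": 152756759,
    "batches": 152757,
}

if __name__ == "__main__":
    summary = summarize_reindex(RESPONSE)
    print(f"{summary['hours']:.1f} h, {summary['docs_per_second']:.0f} docs/s")
    print(summary["expected_batches"] == RESPONSE["batches"])
```

With these values the copy took roughly 32 hours, and the reported batch count matches what a batch size of 1000 would produce.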
Feb 25 2021, 8:58 PM · Archive search, System administration

Feb 24 2021

ardumont added a comment to T3067: elasticsearch cluster disk usage and maintenance.

Status, still in progress:

Every 10.0s: curl -s http://192.168.100.81:9200/_cat/nodes\?v; echo ; curl -s http://192.168.100.81:9200/_cat/indices\?v ; echo ; df -h | grep elastic                                                 search-esnode1: Wed Feb 24 16:14:58 2021
Feb 24 2021, 5:15 PM · Archive search, System administration
ardumont added a comment to T3067: elasticsearch cluster disk usage and maintenance.

It turns out the mapping initialization step was missing. So: clean up, rinse, repeat,
without forgetting the mapping initialization step this time.

Feb 24 2021, 12:58 PM · Archive search, System administration
ardumont added a comment to T3067: elasticsearch cluster disk usage and maintenance.

Copy just finished:

root@search-esnode1:~# curl -XPOST -H "Content-Type: application/json" http://${ES_SERVER}/_reindex\?pretty\&refresh=true\&requests_per_second=-1\&\&wait_for_completion=true -d @reindex-origin.json
{
  "took" : 91121031,
  "timed_out" : false,
  "total" : 152756759,
  "updated" : 0,
  "created" : 152756759,
  "deleted" : 0,
  "batches" : 152757,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}
Feb 24 2021, 11:49 AM · Archive search, System administration

Feb 23 2021

ardumont added a comment to T3067: elasticsearch cluster disk usage and maintenance.

Create new index out of the old
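
The details of this step are not included in the feed. As a hedged sketch, a blocking `_reindex` request could be assembled as follows. `build_reindex_request` is a hypothetical helper, the index names `origin` and `origin-production` are assumptions inferred from other comments on this task, and the query parameters mirror the ones used in the `curl` calls recorded on this task.

```python
import json
from urllib.parse import urlencode


def build_reindex_request(es_server, source_index, dest_index, requests_per_second=-1):
    """Return the (url, body) pair for a blocking `_reindex` call."""
    params = {
        "refresh": "true",
        "requests_per_second": requests_per_second,  # -1 disables throttling
        "wait_for_completion": "true",
    }
    url = f"{es_server}/_reindex?{urlencode(params)}"
    body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    return url, body


if __name__ == "__main__":
    # Hypothetical server address; index names are assumptions.
    url, body = build_reindex_request(
        "http://localhost:9200", "origin", "origin-production"
    )
    print(url)
    print(json.dumps(body, indent=2))
```

`_reindex` does not copy the source index's mapping, so the destination index has to be created with its mapping before the copy starts.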

Feb 23 2021, 10:56 AM · Archive search, System administration
ardumont changed the status of T3067: elasticsearch cluster disk usage and maintenance from Open to Work in Progress.
Feb 23 2021, 10:38 AM · Archive search, System administration
ardumont updated the task description for T3067: elasticsearch cluster disk usage and maintenance.
Feb 23 2021, 10:38 AM · Archive search, System administration
ardumont added a comment to T3067: elasticsearch cluster disk usage and maintenance.

Initially wrongly written in T3060#59291.

Feb 23 2021, 10:36 AM · Archive search, System administration
ardumont triaged T3067: elasticsearch cluster disk usage and maintenance as Normal priority.
Feb 23 2021, 10:34 AM · Archive search, System administration
ardumont added a comment to T3060: Deploy swh-search v0.6.0 in **staging**.

Comment discarded (unrelated to this task) and reported in a dedicated task [1]

Feb 23 2021, 10:07 AM · System administration, Archive search

Feb 19 2021

vsellier added a comment to T3060: Deploy swh-search v0.6.0 in **staging**.
  • stop the journal client
root@search0:~# systemctl stop swh-search-journal-client@objects.service 
root@search0:~# puppet agent --disable "stop search journal client to reset offsets"
  • reset the offset for the swh.journal.objects.origin_visit topic:
vsellier@journal0 ~ % /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets --topic swh.journal.objects.origin_visit --to-earliest --group swh.search.journal_client --execute
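
Since an offset reset is hard to undo, it can help to preview it first. The sketch below assembles the same command line with `--dry-run` by default and only adds `--execute` on request. `reset_offsets_argv` is a hypothetical helper, and the bootstrap server value is a placeholder.

```python
import shlex


def reset_offsets_argv(bootstrap_server, group, topic, execute=False):
    """Build the kafka-consumer-groups.sh argument list for an offset reset."""
    argv = [
        "/opt/kafka/bin/kafka-consumer-groups.sh",
        "--bootstrap-server", bootstrap_server,
        "--reset-offsets",
        "--topic", topic,
        "--to-earliest",
        "--group", group,
    ]
    # With --dry-run the tool only reports the offsets it would set.
    argv.append("--execute" if execute else "--dry-run")
    return argv


if __name__ == "__main__":
    argv = reset_offsets_argv(
        "localhost:9092",  # placeholder bootstrap server
        "swh.search.journal_client",
        "swh.journal.objects.origin_visit",
    )
    print(shlex.join(argv))
```

The consumer group must be inactive for the reset to be accepted, which is why the journal client is stopped first in the steps above.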
Feb 19 2021, 12:28 PM · System administration, Archive search
vsellier added a comment to T3060: Deploy swh-search v0.6.0 in **staging**.

Regarding the missing visit_type, one of the topics carrying the visit_type needs to be consumed again to populate the field for all the origins.
As the index was restored from the backup, the field was only set for the visits done in the last 15 days.
The offset will be reset for the origin_visit topic to limit the work.

Feb 19 2021, 12:02 PM · System administration, Archive search
vsellier added a comment to T3060: Deploy swh-search v0.6.0 in **staging**.

Regarding the index size, it seems it's due to a huge number of deleted documents (probably due to the backlog and an update of the documents at each change)

% curl  -s http://${ES_SERVER}/_cat/indices\?v                                                       
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin                      HthJj42xT5uO7w3Aoxzppw  80   0     868634      8577610     10.5gb         10.5gb
green  close  origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg  80   0                                                  
green  open   origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ  80   0     868121            0    987.7mb        987.7mb
green  open   origin-toremove             PL7WEs3FTJSQy4dgGIwpeQ  80   0     868610            0    987.5mb        987.5mb  <-- A clean copy of the origin index has almost the same size as yesterday

Forcing a merge seems to restore a decent size:

% curl -XPOST -H "Content-Type: application/json" http://${ES_SERVER}/origin/_forcemerge                           
{"_shards":{"total":80,"successful":80,"failed":0}}%
% curl  -s http://${ES_SERVER}/_cat/indices\?v      
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin                      HthJj42xT5uO7w3Aoxzppw  80   0     868684         3454        1gb            1gb
green  close  origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg  80   0                                                  
green  open   origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ  80   0     868121            0    987.7mb        987.7mb
green  open   origin-toremove             PL7WEs3FTJSQy4dgGIwpeQ  80   0     868610            0    987.5mb        987.5mb

It will probably be something to schedule regularly on the production index if size matters.
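
One way to decide when such a merge is worthwhile is to look at the share of deleted documents per index. This is a minimal sketch under that assumption: `parse_cat_indices` and `deleted_ratio` are hypothetical helpers, and the sample rows are copied from the first `_cat/indices?v` output above.

```python
def parse_cat_indices(text):
    """Parse `_cat/indices?v` output into a list of dicts keyed by header name."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # Closed indices report no document or size columns; zip() stops early.
        rows.append(dict(zip(header, line.split())))
    return rows


def deleted_ratio(row):
    """Fraction of documents marked deleted, or None for closed indices."""
    if "docs.deleted" not in row:
        return None
    live = int(row["docs.count"])
    deleted = int(row["docs.deleted"])
    total = live + deleted
    return deleted / total if total else 0.0


# Sample copied from the output captured before the forced merge.
SAMPLE = """\
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin                      HthJj42xT5uO7w3Aoxzppw  80   0     868634      8577610     10.5gb         10.5gb
green  close  origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg  80   0
green  open   origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ  80   0     868121            0    987.7mb        987.7mb
"""

if __name__ == "__main__":
    for row in parse_cat_indices(SAMPLE):
        print(row["index"], deleted_ratio(row))
```

For the `origin` index before the merge, about nine out of ten stored documents were deleted ones, which is consistent with the tenfold size difference observed.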

Feb 19 2021, 10:57 AM · System administration, Archive search
vsellier added a comment to T3060: Deploy swh-search v0.6.0 in **staging**.

The journal clients recovered, so the index is up-to-date.
Let's check some points before closing:

  • The index size looks huge (~10g) compared to before the deployment
  • It seems some documents do not have origin_visit_type populated as they should:
swh=> select * from origin where url='deb://Debian/packages/node-response-time';
  id   |                   url                    
-------+------------------------------------------
 15552 | deb://Debian/packages/node-response-time
(1 row)
Feb 19 2021, 10:34 AM · System administration, Archive search
vsellier updated the task description for T3060: Deploy swh-search v0.6.0 in **staging**.
Feb 19 2021, 9:51 AM · System administration, Archive search

Feb 18 2021

vsellier updated the task description for T3060: Deploy swh-search v0.6.0 in **staging**.
Feb 18 2021, 5:07 PM · System administration, Archive search
vsellier updated the task description for T3060: Deploy swh-search v0.6.0 in **staging**.
Feb 18 2021, 4:57 PM · System administration, Archive search
vsellier added a comment to T3060: Deploy swh-search v0.6.0 in **staging**.
  1. Copy the backup of the index done in T2780
Feb 18 2021, 4:57 PM · System administration, Archive search
vsellier updated the task description for T3060: Deploy swh-search v0.6.0 in **staging**.
Feb 18 2021, 4:36 PM · System administration, Archive search
vsellier added a comment to T3060: Deploy swh-search v0.6.0 in **staging**.
  1. delete current index
Feb 18 2021, 4:36 PM · System administration, Archive search
anlambert closed T2867: Webapp search UI: set focus on search input by default as Resolved by committing rDWAPPSf34eca75210d: templates/origin-search-form: Set autofocus to search input.
Feb 18 2021, 4:35 PM · Web app, Archive search
vsellier updated the task description for T3060: Deploy swh-search v0.6.0 in **staging**.
Feb 18 2021, 4:07 PM · System administration, Archive search
vsellier added a comment to T3060: Deploy swh-search v0.6.0 in **staging**.

stop the journal clients and swh-search

root@search0:~# puppet agent --disable "swh-search upgrade"
root@search0:~# systemctl stop swh-search-journal-client@objects.service 
root@search0:~# systemctl stop swh-search-journal-client@indexed.service
root@search0:~# systemctl stop gunicorn-swh-search.service

update the packages

root@search0:~# apt update && apt list --upgradable
...
python3-swh.search/unknown 0.6.0-1~swh1~bpo10+1 all [upgradable from: 0.5.0-1~swh1~bpo10+1]
...
Feb 18 2021, 4:07 PM · System administration, Archive search
vsellier updated the task description for T3060: Deploy swh-search v0.6.0 in **staging**.
Feb 18 2021, 3:58 PM · System administration, Archive search
vsellier added a parent task for T3060: Deploy swh-search v0.6.0 in **staging**: T3058: Metadata search is failing with "failed to parse date field".
Feb 18 2021, 3:45 PM · System administration, Archive search
vsellier added a subtask for T3058: Metadata search is failing with "failed to parse date field": T3060: Deploy swh-search v0.6.0 in **staging**.
Feb 18 2021, 3:45 PM · Archive search
vsellier moved T3060: Deploy swh-search v0.6.0 in **staging** from Backlog to in-progress on the System administration board.
Feb 18 2021, 3:42 PM · System administration, Archive search
vsellier changed the status of T3060: Deploy swh-search v0.6.0 in **staging** from Open to Work in Progress.
Feb 18 2021, 3:41 PM · System administration, Archive search
anlambert added a revision to T2867: Webapp search UI: set focus on search input by default: D5109: templates/origin-search-form: Set autofocus to search input.
Feb 18 2021, 3:38 PM · Web app, Archive search
vsellier closed T3042: swh-search: add statsd/prometheus metrics as Resolved.
Feb 18 2021, 3:08 PM · System administration, Archive search
vsellier added a comment to T3042: swh-search: add statsd/prometheus metrics.

The dashboard was moved to the system directory: the new url is https://grafana.softwareheritage.org/goto/uBHBojEGz

Feb 18 2021, 3:07 PM · System administration, Archive search
vlorentz closed T3058: Metadata search is failing with "failed to parse date field", a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Feb 18 2021, 12:13 PM · Journal, Archive search
vlorentz closed T3058: Metadata search is failing with "failed to parse date field" as Resolved.
Feb 18 2021, 12:13 PM · Archive search
vlorentz renamed T3058: Metadata search is failing with "failed to parse date field" from Metadata search is failing when swh-search is activated to Metadata search is failing with "failed to parse date field".
Feb 18 2021, 12:13 PM · Archive search
vlorentz added a revision to T3058: Metadata search is failing with "failed to parse date field": D5106: elasticsearch: Disable date_detection in origin mapping.
Feb 18 2021, 12:13 PM · Archive search
vsellier added a comment to T3042: swh-search: add statsd/prometheus metrics.

swh-search:v0.5.0 deployed in all the environments, the metrics are correctly gathered by prometheus.
Let's create a real dashboard now [1]

Feb 18 2021, 12:03 PM · System administration, Archive search
vsellier added a revision to T3042: swh-search: add statsd/prometheus metrics: D5104: swh-search: activate metrics.
Feb 18 2021, 11:35 AM · System administration, Archive search
vsellier added a comment to T3058: Metadata search is failing with "failed to parse date field".

This is the mapping of the origin index with the metadata : P953

Feb 18 2021, 11:14 AM · Archive search
vsellier triaged T3058: Metadata search is failing with "failed to parse date field" as Normal priority.
Feb 18 2021, 10:28 AM · Archive search
vsellier added a revision to T3042: swh-search: add statsd/prometheus metrics: D5103: Add metrics to monitor activity.
Feb 18 2021, 10:12 AM · System administration, Archive search
vsellier claimed T3042: swh-search: add statsd/prometheus metrics.
Feb 18 2021, 10:06 AM · System administration, Archive search
vsellier moved T2944: Deploy swh-search v0.4.1 from deployed/landed/monitoring to done on the System administration board.
Feb 18 2021, 9:27 AM · System administration, Journal, Archive search

Feb 17 2021

anlambert closed T3047: Enable to search in origin metadata with swh-search in webapp, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Feb 17 2021, 12:12 PM · Journal, Archive search
anlambert closed T3047: Enable to search in origin metadata with swh-search in webapp as Resolved by committing rDWAPPS4be7279a7458: archive: Allow to search in origin metadata with swh-search backend.
Feb 17 2021, 12:12 PM · Web app, Archive search

Feb 16 2021

anlambert added a revision to T3047: Enable to search in origin metadata with swh-search in webapp: D5091: docker: Enable swh-search backend in swh-web when using elasticsearch.
Feb 16 2021, 7:32 PM · Web app, Archive search
anlambert added a revision to T3047: Enable to search in origin metadata with swh-search in webapp: D5087: archive: Allow to search in origin metadata with swh-search backend.
Feb 16 2021, 6:35 PM · Web app, Archive search
anlambert added a revision to T3047: Enable to search in origin metadata with swh-search in webapp: D5086: in_memory: Implement origin intrinsic metdata search.
Feb 16 2021, 6:23 PM · Web app, Archive search
vsellier changed the status of T3042: swh-search: add statsd/prometheus metrics from Open to Work in Progress.
Feb 16 2021, 6:22 PM · System administration, Archive search

Feb 15 2021

anlambert renamed T3047: Enable to search in origin metadata with swh-search in webapp from Enable to search metadata with swh-search in webapp to Enable to search in origin metadata with swh-search in webapp.
Feb 15 2021, 12:11 PM · Web app, Archive search
anlambert triaged T3047: Enable to search in origin metadata with swh-search in webapp as Normal priority.
Feb 15 2021, 11:31 AM · Web app, Archive search
vsellier renamed T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata from Provision enough space for the search ES cluster to ingest all intrinsic metadata to [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata.
Feb 15 2021, 10:02 AM · System administration, Archive search
vsellier renamed T3040: [production] Enable swh-search's journal-client for indexed objects from Enable swh-search's journal-client for indexed objects to [production] Enable swh-search's journal-client for indexed objects.
Feb 15 2021, 10:02 AM · System administration, Journal, Archive search

Feb 12 2021

ardumont closed T3037: Reschedule origin-intrinsic-metadata tasks for all origins, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Feb 12 2021, 6:32 PM · Journal, Archive search
ardumont closed T3037: Reschedule origin-intrinsic-metadata tasks for all origins as Resolved.
Feb 12 2021, 6:32 PM · System administration, Journal, Archive search
vsellier added a comment to T3042: swh-search: add statsd/prometheus metrics.

A basic dashboard [1] has been created on Grafana, based on the number of log lines.
It's too limited: the logs cannot be isolated per environment because that information is not available.
It will be added in T3043.

Feb 12 2021, 5:53 PM · System administration, Archive search
vsellier moved T3042: swh-search: add statsd/prometheus metrics from Backlog to Weekly backlog on the System administration board.
Feb 12 2021, 5:44 PM · System administration, Archive search
vsellier triaged T3042: swh-search: add statsd/prometheus metrics as Normal priority.
Feb 12 2021, 12:19 PM · System administration, Archive search

Feb 11 2021

vsellier placed T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata up for grabs.
Feb 11 2021, 3:33 PM · System administration, Archive search
vsellier placed T3040: [production] Enable swh-search's journal-client for indexed objects up for grabs.
Feb 11 2021, 3:32 PM · System administration, Journal, Archive search
ardumont moved T3037: Reschedule origin-intrinsic-metadata tasks for all origins from in-progress to deployed/landed/monitoring on the System administration board.
Feb 11 2021, 3:22 PM · System administration, Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

Done scheduling:

Feb 11 2021, 3:22 PM · System administration, Journal, Archive search
vsellier added a comment to T3040: [production] Enable swh-search's journal-client for indexed objects.

T3041 needs to be done before this one (for the production environment)

Feb 11 2021, 2:21 PM · System administration, Journal, Archive search
vlorentz triaged T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata as Normal priority.
Feb 11 2021, 1:17 PM · System administration, Archive search
vlorentz triaged T3040: [production] Enable swh-search's journal-client for indexed objects as Normal priority.
Feb 11 2021, 1:17 PM · System administration, Journal, Archive search
vsellier added a project to T2182: Switch production swh-web to use swh-search instead of postgresql search.: System administration.
Feb 11 2021, 12:14 PM · System administration, Archive search, Storage manager
vsellier closed T2182: Switch production swh-web to use swh-search instead of postgresql search., a subtask of T1910: Redesign origin search using a dedicated component (swh-search), as Resolved.
Feb 11 2021, 12:10 PM · Archive search, Storage manager
vsellier closed T2182: Switch production swh-web to use swh-search instead of postgresql search. as Resolved.

D5063 is applied, the main webapp is now using swh-search by default.

Feb 11 2021, 12:10 PM · System administration, Archive search, Storage manager
vsellier added a revision to T2182: Switch production swh-web to use swh-search instead of postgresql search.: D5063: webapp: use swh-search as main search engine in production.
Feb 11 2021, 11:27 AM · System administration, Archive search, Storage manager
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

Running:

swhscheduler@saatchi:~$ /usr/bin/swh scheduler --config-file /etc/softwareheritage/scheduler/backend.yml task schedule_origins --storage-url http://saam.internal.softwareheritage.org:5002 --batch-size 20 index-origin-metadata | tee /tmp/schedule-origins.txt
Feb 11 2021, 11:05 AM · System administration, Journal, Archive search
vlorentz changed the status of T2590: Finish the indexer -> swh-search pipeline, a subtask of T2182: Switch production swh-web to use swh-search instead of postgresql search., from Open to Work in Progress.
Feb 11 2021, 11:01 AM · System administration, Archive search, Storage manager
vlorentz changed the status of T2590: Finish the indexer -> swh-search pipeline from Open to Work in Progress.
Feb 11 2021, 11:01 AM · Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

@ardumont no, OriginMetadataIndexer lacks a filter step.

Feb 11 2021, 10:47 AM · System administration, Journal, Archive search
vlorentz added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

@ardumont no, OriginMetadataIndexer lacks a filter step.

Feb 11 2021, 10:12 AM · System administration, Journal, Archive search
ardumont closed T2780: Enable the journal-writer for the swh-idx-storage in production, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Feb 11 2021, 9:49 AM · Journal, Archive search
ardumont closed T2780: Enable the journal-writer for the swh-idx-storage in production, a subtask of T3037: Reschedule origin-intrinsic-metadata tasks for all origins, as Resolved.
Feb 11 2021, 9:48 AM · System administration, Journal, Archive search