Nov 22 2021

olasd closed T2543: Some messages in the (azure) kafka cluster are too large for rdkafka clients to be able to decompress them as Invalid.

There is no azure kafka cluster anymore...

Nov 22 2021, 1:16 PM · System administration, Journal

Sep 8 2021

vlorentz closed T2590: Finish the indexer -> swh-search pipeline as Resolved.
Sep 8 2021, 3:35 PM · Journal, Archive search
vsellier closed T3040: [production] Enable swh-search's journal-client for indexed objects, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Sep 8 2021, 3:24 PM · Journal, Archive search
vsellier closed T3040: [production] Enable swh-search's journal-client for indexed objects as Resolved.

metadata searches are now done in Elasticsearch since the deployment of T3433

Sep 8 2021, 3:24 PM · System administration, Journal, Archive search

Sep 3 2021

vsellier added a revision to T3040: [production] Enable swh-search's journal-client for indexed objects: D6183: swh-search: activate metadata search all ES on the main webapp.
Sep 3 2021, 3:45 PM · System administration, Journal, Archive search

Aug 30 2021

vlorentz assigned T3040: [production] Enable swh-search's journal-client for indexed objects to vsellier.
Aug 30 2021, 10:41 AM · System administration, Journal, Archive search

Aug 26 2021

olasd merged task T1278: swh-journal: the monitoring tool question! into T2128: Monitor journal consumer lag.
Aug 26 2021, 12:30 PM · Journal
vlorentz closed T2823: Write tests for swh/journal/writer/inmemory.py as Resolved.

I think so, thanks

Aug 26 2021, 9:10 AM · Easy hack, Journal
KShivendu added a comment to T2823: Write tests for swh/journal/writer/inmemory.py.

@vlorentz should we close this one?

Aug 26 2021, 5:08 AM · Easy hack, Journal

Aug 25 2021

vsellier added a comment to T3501: Too many open files error on kafka.

status.io incident closed

Aug 25 2021, 11:55 AM · Journal, System administration
vsellier added a comment to T3501: Too many open files error on kafka.

Save code now requests rescheduled:

swh-web=> select * from save_origin_request where loading_task_status='scheduled' limit 100;
...
<output lost due to the psql pager :(
...
softwareheritage-scheduler=> select * from task where id in (398244739, 398244740, 398244742, 398244744, 398244745, 398244748, 398095676, 397470401, 397470402, 397470404, 397470399);

A few minutes later:

swh-web=> select * from save_origin_request where loading_task_status='scheduled' limit 100;
 id | request_date | visit_type | origin_url | status | loading_task_id | visit_date | loading_task_status | visit_status | user_ids 
----+--------------+------------+------------+--------+-----------------+------------+---------------------+--------------+----------
(0 rows)
Aug 25 2021, 11:53 AM · Journal, System administration
vsellier added a comment to T3501: Too many open files error on kafka.
  • all the workers are restarted
  • Several save code now requests look stuck in the scheduled status; currently looking at how to unblock them
Aug 25 2021, 11:37 AM · Journal, System administration
vsellier closed T3501: Too many open files error on kafka as Resolved.

D6130 landed and was applied to one kafka server at a time

Aug 25 2021, 11:18 AM · Journal, System administration
vsellier added a revision to T3501: Too many open files error on kafka: D6130: kafka: increase the open file limit.
Aug 25 2021, 10:25 AM · Journal, System administration
vsellier added a comment to T3501: Too many open files error on kafka.

ok roger that :).
I will increase to 524288 in the diff

Aug 25 2021, 10:21 AM · Journal, System administration
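
The figures quoted in this thread are consistent with each other: the unit file's original LimitNOFILE=65536 doubled gives the 131072 stopgap, and multiplied by the suggested x8 it gives the 524288 proposed for the diff. A minimal Python sketch of that arithmetic, with a way to read the open-file limits of the current process on a Unix host. The helper names are illustrative only and are not part of any Software Heritage tooling.

```python
import resource
from typing import Tuple

BASELINE_NOFILE = 65536  # LimitNOFILE value before the incident


def scaled_limit(baseline: int, factor: int) -> int:
    """Return the open-file limit obtained by multiplying the baseline."""
    return baseline * factor


def current_nofile_limits() -> Tuple[int, int]:
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("x2:", scaled_limit(BASELINE_NOFILE, 2))  # manual stopgap
    print("x8:", scaled_limit(BASELINE_NOFILE, 8))  # value proposed for the diff
    print("this process (soft, hard):", current_nofile_limits())
```

Note that this reads the limits of the Python process itself, not those of a running kafka service.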
olasd added a comment to T3501: Too many open files error on kafka.

The kafka servers are only running kafka and zookeeper, so the limit of open files isn't that critical. I think we can bump the limit more substantially than just x2 (maybe go directly with x8?), as I expect we'll still be adding more topics in the future.

Aug 25 2021, 10:17 AM · Journal, System administration
vsellier added a comment to T3501: Too many open files error on kafka.

All the loaders are restarted on worker01 and workers02; the cluster seems to be OK.

Aug 25 2021, 10:12 AM · Journal, System administration
vsellier added a comment to T3501: Too many open files error on kafka.

The open file limit was manually increased to stabilize the cluster:

# puppet agent --disable T3501
# diff -U3 /tmp/kafka.service kafka.service
--- /tmp/kafka.service	2021-08-25 07:32:28.068928972 +0000
+++ kafka.service	2021-08-25 07:32:31.384955246 +0000
@@ -15,7 +15,7 @@
 Environment='LOG_DIR=/var/log/kafka'
 Type=simple
 ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
-LimitNOFILE=65536
+LimitNOFILE=131072
Aug 25 2021, 9:43 AM · Journal, System administration
vsellier added a comment to T3501: Too many open files error on kafka.
  • Incident created on status.io
  • Loader disabled:
root@pergamon:~# clush -b -w @swh-workers 'puppet agent --disable "Kafka incident T3501"; systemctl stop cron; cd /etc/systemd/system/multi-user.target.wants; for unit in swh-worker@loader_*; do systemctl disable $unit; done; systemctl stop "swh-worker@loader_*"'
Aug 25 2021, 9:15 AM · Journal, System administration
vsellier changed the status of T3501: Too many open files error on kafka from Open to Work in Progress.
Aug 25 2021, 9:04 AM · Journal, System administration

Jun 11 2021

vsellier closed T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata, a subtask of T3040: [production] Enable swh-search's journal-client for indexed objects, as Resolved.
Jun 11 2021, 10:23 AM · System administration, Journal, Archive search

Jun 8 2021

ardumont changed the status of T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata, a subtask of T3040: [production] Enable swh-search's journal-client for indexed objects, from Open to Work in Progress.
Jun 8 2021, 4:48 PM · System administration, Journal, Archive search

May 4 2021

KShivendu added a comment to T3304: Kafka throws flush timeout error.

If you face this issue, try restarting the containers using docker-compose down and docker-compose up.

May 4 2021, 4:00 PM · Docker environment, Journal
vlorentz triaged T3304: Kafka throws flush timeout error as High priority.
May 4 2021, 12:24 PM · Docker environment, Journal
vlorentz edited projects for T3304: Kafka throws flush timeout error, added: Docker environment; removed Core Loader.
May 4 2021, 12:24 PM · Docker environment, Journal
KShivendu added projects to T3304: Kafka throws flush timeout error: Journal, Core Loader.
May 4 2021, 11:53 AM · Docker environment, Journal

Apr 21 2021

douardda added a comment to T3170: Revisions in the journal with out of range dates.

Note that none of their parent revisions can be found in the archive either (I suppose one invalid revision in a set of ingested revisions prevents any of them from being inserted in the database, but by that point they have already been inserted in kafka).

Apr 21 2021, 7:08 PM · Data Model, Journal

Apr 20 2021

vlorentz added a comment to T2823: Write tests for swh/journal/writer/inmemory.py.

If we replaced the whole code with just this:

Apr 20 2021, 10:12 AM · Easy hack, Journal

Apr 19 2021

KShivendu added a comment to T2823: Write tests for swh/journal/writer/inmemory.py.

Do you want some more tests, or can this task be declared resolved?

Apr 19 2021, 7:09 PM · Easy hack, Journal
olasd added a parent task for T2003: Content replayer may try to copy objects before they are available from an objstorage: T1954: Up-to-date objstorage mirror on S3.
Apr 19 2021, 12:07 PM · Journal
olasd closed T2003: Content replayer may try to copy objects before they are available from an objstorage as Resolved.

D5246 landed a while ago. The s3 object copy process has now caught up on some partitions, and I can confirm that the copy of the latest added objects happens without any race condition.

Apr 19 2021, 12:06 PM · Journal
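
The ordering named in the title of D5246 (write to the objstorage before the DB or Kafka) can be illustrated with a toy model. This is a sketch under stated assumptions, not the swh-storage implementation: `ObjStorage`, `content_add` and `replay` are hypothetical stand-ins, and the database and journal are plain lists.

```python
from typing import Dict, List


class ObjStorage:
    """Hypothetical in-memory object storage."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def add(self, obj_id: str, data: bytes) -> None:
        self.blobs[obj_id] = data

    def get(self, obj_id: str) -> bytes:
        # Raises KeyError if the object has not been written yet.
        return self.blobs[obj_id]


def content_add(obj_id: str, data: bytes, objstorage: ObjStorage,
                database: List[str], journal: List[str]) -> None:
    objstorage.add(obj_id, data)  # 1. the blob is stored first
    database.append(obj_id)       # 2. then the database row
    journal.append(obj_id)        # 3. then the journal message


def replay(journal: List[str], source: ObjStorage, destination: ObjStorage) -> None:
    """Copy every object announced in the journal to the destination."""
    for obj_id in journal:
        destination.add(obj_id, source.get(obj_id))
```

Because the journal entry is appended last, any consumer that sees the message can already fetch the blob, which is the property the closing comment reports observing.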

Apr 6 2021

vlorentz added a comment to T2823: Write tests for swh/journal/writer/inmemory.py.

Pass an object without `unique_key` and check it does raise an exception

Apr 6 2021, 12:43 PM · Easy hack, Journal
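
A minimal sketch of the suggested test. `FakeInMemoryWriter` is a hypothetical stand-in written for this example, on the assumption that the writer calls `unique_key()` on each object; it is not the class from swh/journal/writer/inmemory.py, and the exception type raised by the real writer may differ.

```python
from typing import Any, List, Tuple


class FakeInMemoryWriter:
    """Hypothetical stand-in for an in-memory journal writer."""

    def __init__(self) -> None:
        self.objects: List[Tuple[str, Any]] = []

    def write_addition(self, object_type: str, object_: Any) -> None:
        object_.unique_key()  # AttributeError if the method is missing
        self.objects.append((object_type, object_))


class WithKey:
    def unique_key(self) -> bytes:
        return b"\x01"


class WithoutKey:
    pass


def test_write_addition_requires_unique_key() -> None:
    writer = FakeInMemoryWriter()
    writer.write_addition("origin", WithKey())
    try:
        writer.write_addition("origin", WithoutKey())
    except AttributeError:
        pass  # expected: the object has no unique_key method
    else:
        raise AssertionError("expected an exception for a missing unique_key")
    # Only the valid object was recorded.
    assert len(writer.objects) == 1
```

With pytest available, the try/except block could be replaced by `pytest.raises`.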

Apr 4 2021

KShivendu added a comment to T2823: Write tests for swh/journal/writer/inmemory.py.

Hey @vlorentz
How do I check https://forge.softwareheritage.org/source/swh-journal/browse/master/swh/journal/writer/inmemory.py$31? Do I have to pass dummy content, raw_extrinsic_metadata, origin_visit, et cetera as the object_ argument of the write_addition function and, before passing them, verify that they implement the unique_key function?

Apr 4 2021, 10:13 AM · Easy hack, Journal

Apr 1 2021

vsellier added a comment to T3191: journal-client: Add support of max message size configuration.

The journal client supports dynamic configuration via kwargs, so there is no need to improve it.

Apr 1 2021, 12:11 PM · Journal
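
To illustrate what configuration via kwargs can look like, here is a sketch in which extra keyword arguments are merged into a Kafka consumer configuration. `build_consumer_config` is a hypothetical helper, not the swh-journal API, and how the real client forwards its kwargs is an assumption here. `bootstrap.servers`, `group.id` and `message.max.bytes` are librdkafka property names; the broker address, group name and size value are arbitrary.

```python
from typing import Any, Dict, List


def build_consumer_config(brokers: List[str], group_id: str,
                          **kwargs: Any) -> Dict[str, Any]:
    """Merge caller-supplied settings into a base consumer configuration."""
    config: Dict[str, Any] = {
        "bootstrap.servers": ",".join(brokers),
        "group.id": group_id,
    }
    config.update(kwargs)  # extra settings pass through untouched
    return config


# Dotted keys are not valid Python identifiers, so they are passed by unpacking.
config = build_consumer_config(
    ["kafka1.example.org:9092"],      # hypothetical broker
    "example-group",                  # hypothetical consumer group
    **{"message.max.bytes": 100 * 1024 * 1024},  # arbitrary example value
)
```

With this pattern a new setting such as a maximum message size needs no dedicated option in the client.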
vsellier closed T3191: journal-client: Add support of max message size configuration as Invalid.
Apr 1 2021, 12:11 PM · Journal
vsellier changed the status of T3191: journal-client: Add support of max message size configuration from Open to Work in Progress.
Apr 1 2021, 11:52 AM · Journal
vlorentz added a parent task for T2590: Finish the indexer -> swh-search pipeline: T1117: Origin search is *slow* when you look for very common words.
Apr 1 2021, 10:51 AM · Journal, Archive search

Mar 24 2021

seirl updated the task description for T3170: Revisions in the journal with out of range dates.
Mar 24 2021, 6:56 PM · Data Model, Journal
seirl updated the task description for T3170: Revisions in the journal with out of range dates.
Mar 24 2021, 4:11 PM · Data Model, Journal
seirl updated the task description for T3170: Revisions in the journal with out of range dates.
Mar 24 2021, 4:11 PM · Data Model, Journal
seirl updated the task description for T3170: Revisions in the journal with out of range dates.
Mar 24 2021, 4:10 PM · Data Model, Journal
seirl triaged T3170: Revisions in the journal with out of range dates as Normal priority.
Mar 24 2021, 1:13 PM · Data Model, Journal

Mar 15 2021

vlorentz added a revision to T2003: Content replayer may try to copy objects before they are available from an objstorage: D5246: content_add: Write to the objstorage before the DB or Kafka.
Mar 15 2021, 12:54 PM · Journal

Mar 5 2021

vsellier added a comment to T3083: Deploy swh-search v0.7.0/v0.7.1.

I forgot one step: cleaning up the previous alias origin -> origin_production, which is not needed anymore:

vsellier@search-esnode1 ~ % curl -s http://$ES_SERVER/_cat/indices\?v && echo && curl -s http://$ES_SERVER/_cat/aliases\?v && echo && curl -s http://$ES_SERVER/_cat/health\?v  
health status index             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin-production hZfuv0lVRImjOjO_rYgDzg  90   1  153130652     26701625    273.4gb        137.3gb
Mar 5 2021, 10:45 AM · System administration, Journal, Archive search
ardumont added a comment to T3076: [swh-search] Improve the index/mapping migration process.

awesome

Mar 5 2021, 10:36 AM · System administration, Journal, Archive search
vsellier closed T3076: [swh-search] Improve the index/mapping migration process as Resolved.

The new configuration is deployed; swh-search is now using the aliases, which should help with future upgrades.

Mar 5 2021, 10:35 AM · System administration, Journal, Archive search
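
As an illustration of why aliases help with upgrades: clients keep addressing origin-read and origin-write while the aliases are repointed to a new index. The sketch below only builds the request body for Elasticsearch's `_aliases` endpoint (an `actions` list of `remove`/`add` entries). The new index name `origin-production-v2` is hypothetical, and this is not the procedure that was used in the deployment described here.

```python
import json
from typing import Any, Dict, List


def alias_swap_actions(old_index: str, new_index: str,
                       aliases: List[str]) -> Dict[str, Any]:
    """Build an _aliases request body moving each alias to the new index."""
    actions: List[Dict[str, Any]] = []
    for alias in aliases:
        actions.append({"remove": {"index": old_index, "alias": alias}})
        actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    body = alias_swap_actions(
        "origin-production",
        "origin-production-v2",  # hypothetical new index
        ["origin-read", "origin-write"],
    )
    # The body would be sent as JSON in a POST to /_aliases.
    print(json.dumps(body, indent=2))
```

All actions in a single `_aliases` request are applied together, so readers and writers never see an alias without a target.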
vsellier closed T3076: [swh-search] Improve the index/mapping migration process, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Mar 5 2021, 10:35 AM · Journal, Archive search
vsellier closed T3083: Deploy swh-search v0.7.0/v0.7.1, a subtask of T3076: [swh-search] Improve the index/mapping migration process, as Resolved.
Mar 5 2021, 10:34 AM · System administration, Journal, Archive search
vsellier closed T3083: Deploy swh-search v0.7.0/v0.7.1 as Resolved.
Mar 5 2021, 10:34 AM · System administration, Journal, Archive search
vsellier added a comment to T3083: Deploy swh-search v0.7.0/v0.7.1.

Deployment in production:

  • puppet stopped
  • configuration updated to declare the index; this needs to be done so that swh-search initializes the aliases before the journal clients start (not guaranteed with a puppet apply)
  • package updated
  • gunicorn-swh-search service restarted:
Mar 05 09:08:46 search1 python3[1881743]: 2021-03-05 09:08:46 [1881743] gunicorn.error:INFO Starting gunicorn 19.9.0
Mar 05 09:08:46 search1 python3[1881743]: 2021-03-05 09:08:46 [1881743] gunicorn.error:INFO Listening at: unix:/run/gunicorn/swh-search/gunicorn.sock (1881743)
Mar 05 09:08:46 search1 python3[1881743]: 2021-03-05 09:08:46 [1881743] gunicorn.error:INFO Using worker: sync
Mar 05 09:08:46 search1 python3[1881748]: 2021-03-05 09:08:46 [1881748] gunicorn.error:INFO Booting worker with pid: 1881748
Mar 05 09:08:46 search1 python3[1881749]: 2021-03-05 09:08:46 [1881749] gunicorn.error:INFO Booting worker with pid: 1881749
Mar 05 09:08:46 search1 python3[1881750]: 2021-03-05 09:08:46 [1881750] gunicorn.error:INFO Booting worker with pid: 1881750
Mar 05 09:08:46 search1 python3[1881751]: 2021-03-05 09:08:46 [1881751] gunicorn.error:INFO Booting worker with pid: 1881751
Mar 05 09:08:53 search1 python3[1881750]: 2021-03-05 09:08:53 [1881750] swh.search.api.server:INFO Initializing indexes with configuration: 
Mar 05 09:08:53 search1 python3[1881750]: 2021-03-05 09:08:53 [1881750] elasticsearch:INFO HEAD http://search-esnode2.internal.softwareheritage.org:9200/origin-production [status:200 request:0.023s]
Mar 05 09:08:54 search1 python3[1881750]: 2021-03-05 09:08:54 [1881750] elasticsearch:INFO PUT http://search-esnode1.internal.softwareheritage.org:9200/origin-production/_alias/origin-read [status:200 request:0.487s]
Mar 05 09:08:54 search1 python3[1881750]: 2021-03-05 09:08:54 [1881750] elasticsearch:INFO PUT http://search-esnode3.internal.softwareheritage.org:9200/origin-production/_alias/origin-write [status:200 request:0.152s]
Mar 05 09:08:54 search1 python3[1881750]: 2021-03-05 09:08:54 [1881750] elasticsearch:INFO PUT http://search-esnode1.internal.softwareheritage.org:9200/origin-production/_mapping [status:200 request:0.009s]
vsellier@search-esnode1 ~ % curl -s http://$ES_SERVER/_cat/indices\?v && echo && curl -s http://$ES_SERVER/_cat/aliases\?v && echo && curl -s http://$ES_SERVER/_cat/health\?v 
health status index             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin-production hZfuv0lVRImjOjO_rYgDzg  90   1  153097672    144224208    288.1gb          149gb
Mar 5 2021, 10:34 AM · System administration, Journal, Archive search
vsellier added a revision to T3083: Deploy swh-search v0.7.0/v0.7.1: D5198: swh-search: add indexes configuration.
Mar 5 2021, 9:19 AM · System administration, Journal, Archive search

Mar 4 2021

vsellier added a comment to T3083: Deploy swh-search v0.7.0/v0.7.1.

swh-search:v0.7.1 deployed in staging according to the defined plan.
The aliases are correctly created and used by the services.

vsellier@search-esnode0 ~ % curl -XGET -H "Content-Type: application/json" http://192.168.130.80:9200/_cat/indices
green open  origin                      HthJj42xT5uO7w3Aoxzppw 80 0 929692 137147 4gb 4gb
green close origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg 80 0                      
green close origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ 80 0                      
vsellier@search-esnode0 ~ % curl -XGET -H "Content-Type: application/json" http://192.168.130.80:9200/_cat/aliases
origin-read  origin - - - -
origin-write origin - - - -

Journal clients:

Mar 04 16:22:40 search0 swh[3598137]: INFO:elasticsearch:POST http://search-esnode0.internal.staging.swh.network:9200/origin-write/_bulk [status:200 request:0.013s]
Mar 04 16:22:41 search0 swh[3598137]: INFO:elasticsearch:POST http://search-esnode0.internal.staging.swh.network:9200/origin-write/_bulk [status:200 request:0.012s]

Search:

Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] swh.search.api.server:INFO Initializing indexes with configuration: 
Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] elasticsearch:INFO HEAD http://search-esnode0.internal.staging.swh.network:9200/origin [status:200 request:0.005s]
Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] elasticsearch:INFO HEAD http://search-esnode0.internal.staging.swh.network:9200/origin-read/_alias [status:200 request:0.001s]
Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] elasticsearch:INFO HEAD http://search-esnode0.internal.staging.swh.network:9200/origin-write/_alias [status:200 request:0.001s]
Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] elasticsearch:INFO PUT http://search-esnode0.internal.staging.swh.network:9200/origin/_mapping [status:200 request:0.006s]
Mar 04 16:19:27 search0 python3[3598042]: 2021-03-04 16:19:27 [3598042] elasticsearch:INFO GET http://search-esnode0.internal.staging.swh.network:9200/origin-read/_search?size=100 [status:200 request:0.076s]
Mar 4 2021, 5:24 PM · System administration, Journal, Archive search
vsellier renamed T3083: Deploy swh-search v0.7.0/v0.7.1 from Deploy swh-search v0.7.0 to Deploy swh-search v0.7.0/v0.7.1.
Mar 4 2021, 4:01 PM · System administration, Journal, Archive search
vsellier added a revision to T3076: [swh-search] Improve the index/mapping migration process: D5196: Allow to instantiate the service with default indexes configuration.
Mar 4 2021, 3:21 PM · System administration, Journal, Archive search
vsellier changed the status of T3083: Deploy swh-search v0.7.0/v0.7.1 from Open to Work in Progress.
Mar 4 2021, 12:09 PM · System administration, Journal, Archive search

Mar 3 2021

vsellier added a revision to T3076: [swh-search] Improve the index/mapping migration process: D5193: Ensure the elasticsearch indexes are initialized before the first request.
Mar 3 2021, 6:03 PM · System administration, Journal, Archive search

Mar 2 2021

vsellier added a revision to T3076: [swh-search] Improve the index/mapping migration process: D5184: Add missing server tests.
Mar 2 2021, 5:51 PM · System administration, Journal, Archive search
vsellier added a revision to T3076: [swh-search] Improve the index/mapping migration process: D5179: Use elasticsearch aliases to simplify maintenance operations.
Mar 2 2021, 10:44 AM · System administration, Journal, Archive search

Mar 1 2021

vsellier changed the status of T3076: [swh-search] Improve the index/mapping migration process, a subtask of T2590: Finish the indexer -> swh-search pipeline, from Open to Work in Progress.
Mar 1 2021, 3:54 PM · Journal, Archive search
vsellier changed the status of T3076: [swh-search] Improve the index/mapping migration process from Open to Work in Progress.
Mar 1 2021, 3:54 PM · System administration, Journal, Archive search
vsellier updated the task description for T3076: [swh-search] Improve the index/mapping migration process.
Mar 1 2021, 3:38 PM · System administration, Journal, Archive search
ardumont updated the task description for T3076: [swh-search] Improve the index/mapping migration process.
Mar 1 2021, 2:01 PM · System administration, Journal, Archive search
vsellier renamed T3076: [swh-search] Improve the index/mapping migration process from [swh-search] Improve the migration process to [swh-search] Improve the index/mapping migration process.
Mar 1 2021, 1:01 PM · System administration, Journal, Archive search
vsellier renamed T3076: [swh-search] Improve the index/mapping migration process from [use index aliases] to [swh-search] Improve the migration process.
Mar 1 2021, 1:01 PM · System administration, Journal, Archive search
vsellier triaged T3076: [swh-search] Improve the index/mapping migration process as Normal priority.
Mar 1 2021, 1:00 PM · System administration, Journal, Archive search
vsellier closed T3060: Deploy swh-search v0.6.0 in **staging**, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Mar 1 2021, 10:55 AM · Journal, Archive search

Feb 18 2021

vlorentz closed T3058: Metadata search is failing with "failed to parse date field", a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Feb 18 2021, 12:13 PM · Journal, Archive search
vsellier moved T2944: Deploy swh-search v0.4.1 from deployed/landed/monitoring to done on the System administration board.
Feb 18 2021, 9:27 AM · System administration, Journal, Archive search

Feb 17 2021

anlambert closed T3047: Enable to search in origin metadata with swh-search in webapp, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Feb 17 2021, 12:12 PM · Journal, Archive search

Feb 15 2021

vsellier renamed T3040: [production] Enable swh-search's journal-client for indexed objects from Enable swh-search's journal-client for indexed objects to [production] Enable swh-search's journal-client for indexed objects.
Feb 15 2021, 10:02 AM · System administration, Journal, Archive search

Feb 12 2021

ardumont closed T3037: Reschedule origin-intrinsic-metadata tasks for all origins, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Feb 12 2021, 6:32 PM · Journal, Archive search
ardumont closed T3037: Reschedule origin-intrinsic-metadata tasks for all origins as Resolved.
Feb 12 2021, 6:32 PM · System administration, Journal, Archive search

Feb 11 2021

vsellier placed T3040: [production] Enable swh-search's journal-client for indexed objects up for grabs.
Feb 11 2021, 3:32 PM · System administration, Journal, Archive search
ardumont moved T3037: Reschedule origin-intrinsic-metadata tasks for all origins from in-progress to deployed/landed/monitoring on the System administration board.
Feb 11 2021, 3:22 PM · System administration, Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

Done scheduling:

Feb 11 2021, 3:22 PM · System administration, Journal, Archive search
vsellier added a comment to T3040: [production] Enable swh-search's journal-client for indexed objects.

T3041 needs to be done before this one (for the production environment)

Feb 11 2021, 2:21 PM · System administration, Journal, Archive search
vlorentz triaged T3040: [production] Enable swh-search's journal-client for indexed objects as Normal priority.
Feb 11 2021, 1:17 PM · System administration, Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

Running:

swhscheduler@saatchi:~$ /usr/bin/swh scheduler --config-file /etc/softwareheritage/scheduler/backend.yml task schedule_origins --storage-url http://saam.internal.softwareheritage.org:5002 --batch-size 20 index-origin-metadata | tee /tmp/schedule-origins.txt
Feb 11 2021, 11:05 AM · System administration, Journal, Archive search
vlorentz changed the status of T2590: Finish the indexer -> swh-search pipeline from Open to Work in Progress.
Feb 11 2021, 11:01 AM · Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

@ardumont no, OriginMetadataIndexer lacks a filter step.

Feb 11 2021, 10:47 AM · System administration, Journal, Archive search
vlorentz added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

@ardumont no, OriginMetadataIndexer lacks a filter step.

Feb 11 2021, 10:12 AM · System administration, Journal, Archive search
ardumont closed T2780: Enable the journal-writer for the swh-idx-storage in production, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Feb 11 2021, 9:49 AM · Journal, Archive search
ardumont closed T2780: Enable the journal-writer for the swh-idx-storage in production, a subtask of T3037: Reschedule origin-intrinsic-metadata tasks for all origins, as Resolved.
Feb 11 2021, 9:48 AM · System administration, Journal, Archive search
ardumont closed T2780: Enable the journal-writer for the swh-idx-storage in production as Resolved.
Feb 11 2021, 9:48 AM · System administration, Journal
ardumont changed the status of T3037: Reschedule origin-intrinsic-metadata tasks for all origins from Open to Work in Progress.
Feb 11 2021, 9:41 AM · System administration, Journal, Archive search
ardumont changed the status of T3037: Reschedule origin-intrinsic-metadata tasks for all origins, a subtask of T2590: Finish the indexer -> swh-search pipeline, from Open to Work in Progress.
Feb 11 2021, 9:41 AM · Journal, Archive search
ardumont added a project to T3037: Reschedule origin-intrinsic-metadata tasks for all origins: System administration.
Feb 11 2021, 9:41 AM · System administration, Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

Although, now I'm wondering something.
Is that enough to write what's not in the topics?

Feb 11 2021, 9:37 AM · System administration, Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

Ah no! I misused the cli, with the right flags:

Feb 11 2021, 9:32 AM · System administration, Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

This needs a storage access so edit a dedicated configuration file.

Feb 11 2021, 9:25 AM · System administration, Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

That's it! [1]

Feb 11 2021, 8:40 AM · System administration, Journal, Archive search

Feb 10 2021

vlorentz added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

try swh scheduler task schedule_origins

Feb 10 2021, 11:58 PM · System administration, Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

That suggested cli does not show up, but I've only taken a quick glance ¯\_(ツ)_/¯:

Feb 10 2021, 7:17 PM · System administration, Journal, Archive search
ardumont added a comment to T3037: Reschedule origin-intrinsic-metadata tasks for all origins.

Rather than write code to read from the database to kafka (like we did with swh-storage), this can be done simply by re-indexing all the origins, using swh scheduler schedule_origins

Feb 10 2021, 6:23 PM · System administration, Journal, Archive search
vlorentz added a parent task for T2780: Enable the journal-writer for the swh-idx-storage in production: T3037: Reschedule origin-intrinsic-metadata tasks for all origins.
Feb 10 2021, 5:14 PM · System administration, Journal
vlorentz added a subtask for T3037: Reschedule origin-intrinsic-metadata tasks for all origins: T2780: Enable the journal-writer for the swh-idx-storage in production.
Feb 10 2021, 5:14 PM · System administration, Journal, Archive search
vlorentz triaged T3037: Reschedule origin-intrinsic-metadata tasks for all origins as Normal priority.
Feb 10 2021, 5:14 PM · System administration, Journal, Archive search
ardumont added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

The status of the indexer-related topics can be seen in the indexer ingestion status board [1]

Feb 10 2021, 3:39 PM · System administration, Journal
ardumont moved T2780: Enable the journal-writer for the swh-idx-storage in production from in-progress to deployed/landed/monitoring on the System administration board.
Feb 10 2021, 3:35 PM · System administration, Journal
ardumont added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

We'll prepare the topics with the following first and we'll improve later if need be:

Feb 10 2021, 1:15 PM · System administration, Journal