
JournalFolder
Active · Public

Recent Activity

Mon, Nov 22

olasd closed T2543: Some messages in the (azure) kafka cluster are too large for rdkafka clients to be able to decompress them as Invalid.

There is no azure kafka cluster anymore...

Mon, Nov 22, 1:16 PM · System administration, Journal

Sep 8 2021

vlorentz closed T2590: Finish the indexer -> swh-search pipeline as Resolved.
Sep 8 2021, 3:35 PM · Journal, Archive search
vsellier closed T3040: [production] Enable swh-search's journal-client for indexed objects, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Sep 8 2021, 3:24 PM · Journal, Archive search
vsellier closed T3040: [production] Enable swh-search's journal-client for indexed objects as Resolved.

Metadata searches have been done in Elasticsearch since the deployment of T3433.

Sep 8 2021, 3:24 PM · System administration, Journal, Archive search

Sep 3 2021

vsellier added a revision to T3040: [production] Enable swh-search's journal-client for indexed objects: D6183: swh-search: activate metadata search all ES on the main webapp.
Sep 3 2021, 3:45 PM · System administration, Journal, Archive search

Aug 30 2021

vlorentz assigned T3040: [production] Enable swh-search's journal-client for indexed objects to vsellier.
Aug 30 2021, 10:41 AM · System administration, Journal, Archive search

Aug 26 2021

olasd merged task T1278: swh-journal: the monitoring tool question! into T2128: Monitor journal consumer lag.
Aug 26 2021, 12:30 PM · Journal
vlorentz closed T2823: Write tests for swh/journal/writer/inmemory.py as Resolved.

I think so, thanks

Aug 26 2021, 9:10 AM · Easy hack, Journal
KShivendu added a comment to T2823: Write tests for swh/journal/writer/inmemory.py.

@vlorentz should we close this one?

Aug 26 2021, 5:08 AM · Easy hack, Journal

Aug 25 2021

vsellier added a comment to T3501: Too many open files error on kafka.

status.io incident closed

Aug 25 2021, 11:55 AM · Journal, System administration
vsellier added a comment to T3501: Too many open files error on kafka.

Save code now requests rescheduled:

swh-web=> select * from save_origin_request where loading_task_status='scheduled' limit 100;
...
<output lost due to the psql pager :(
...
softwareheritage-scheduler=> select * from task where id in (398244739, 398244740, 398244742, 398244744, 398244745, 398244748, 398095676, 397470401, 397470402, 397470404, 397470399);

A few minutes later:

swh-web=> select * from save_origin_request where loading_task_status='scheduled' limit 100;
 id | request_date | visit_type | origin_url | status | loading_task_id | visit_date | loading_task_status | visit_status | user_ids 
----+--------------+------------+------------+--------+-----------------+------------+---------------------+--------------+----------
(0 rows)
Aug 25 2021, 11:53 AM · Journal, System administration
vsellier added a comment to T3501: Too many open files error on kafka.
  • all the workers are restarted
  • Several save code now requests look stuck in the scheduled status; currently looking at how to unblock them
Aug 25 2021, 11:37 AM · Journal, System administration
vsellier closed T3501: Too many open files error on kafka as Resolved.

D6130 landed and was applied to one kafka server at a time.

Aug 25 2021, 11:18 AM · Journal, System administration
vsellier added a revision to T3501: Too many open files error on kafka: D6130: kafka: increase the open file limit.
Aug 25 2021, 10:25 AM · Journal, System administration
vsellier added a comment to T3501: Too many open files error on kafka.

ok roger that :).
I will increase to 524288 in the diff

Aug 25 2021, 10:21 AM · Journal, System administration
olasd added a comment to T3501: Too many open files error on kafka.

The kafka servers are only running kafka and zookeeper, so the limit of open files isn't that critical. I think we can bump the limit more substantially than just x2 (maybe go directly with x8?), as I expect we'll still be adding more topics in the future.

Aug 25 2021, 10:17 AM · Journal, System administration
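
The arithmetic behind the limits discussed in this thread can be checked with a minimal sketch. The helper names `scaled_nofile_limit` and `process_nofile_limits` are hypothetical and only for illustration; `resource.getrlimit` is the standard-library call (Unix only) that reports the limits of the calling process, not of the kafka service itself.

```python
import resource
from typing import Tuple


def scaled_nofile_limit(current: int, factor: int) -> int:
    """Return the open-file limit obtained by multiplying `current` by `factor`."""
    if current <= 0 or factor <= 0:
        raise ValueError("limit and factor must be positive")
    return current * factor


def process_nofile_limits() -> Tuple[int, int]:
    """Return the (soft, hard) RLIMIT_NOFILE values of the calling process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # x2 is the manual stabilization value, x8 the value chosen for the diff.
    print(scaled_nofile_limit(65536, 2))
    print(scaled_nofile_limit(65536, 8))
    print(process_nofile_limits())
```

Starting from the original LimitNOFILE=65536, a factor of 2 gives the manually applied 131072 and a factor of 8 gives the 524288 announced for the diff.
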
vsellier added a comment to T3501: Too many open files error on kafka.

All the loaders are restarted on worker01 and worker02; the cluster seems OK.

Aug 25 2021, 10:12 AM · Journal, System administration
vsellier added a comment to T3501: Too many open files error on kafka.

The open file limit was manually increased to stabilize the cluster:

# puppet agent --disable T3501
# diff -U3 /tmp/kafka.service kafka.service
--- /tmp/kafka.service	2021-08-25 07:32:28.068928972 +0000
+++ kafka.service	2021-08-25 07:32:31.384955246 +0000
@@ -15,7 +15,7 @@
 Environment='LOG_DIR=/var/log/kafka'
 Type=simple
 ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
-LimitNOFILE=65536
+LimitNOFILE=131072
Aug 25 2021, 9:43 AM · Journal, System administration
vsellier added a comment to T3501: Too many open files error on kafka.
  • Incident created on status.io
  • Loader disabled:
root@pergamon:~# clush -b -w @swh-workers 'puppet agent --disable "Kafka incident T3501"; systemctl stop cron; cd /etc/systemd/system/multi-user.target.wants; for unit in swh-worker@loader_*; do systemctl disable $unit; done; systemctl stop "swh-worker@loader_*"'
Aug 25 2021, 9:15 AM · Journal, System administration
vsellier changed the status of T3501: Too many open files error on kafka from Open to Work in Progress.
Aug 25 2021, 9:04 AM · Journal, System administration

Jun 11 2021

vsellier closed T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata, a subtask of T3040: [production] Enable swh-search's journal-client for indexed objects, as Resolved.
Jun 11 2021, 10:23 AM · System administration, Journal, Archive search

Jun 8 2021

ardumont changed the status of T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata, a subtask of T3040: [production] Enable swh-search's journal-client for indexed objects, from Open to Work in Progress.
Jun 8 2021, 4:48 PM · System administration, Journal, Archive search

May 4 2021

KShivendu added a comment to T3304: Kafka throws flush timeout error.

If you face this issue, try restarting the containers using docker-compose down and docker-compose up.

May 4 2021, 4:00 PM · Docker environment, Journal
vlorentz triaged T3304: Kafka throws flush timeout error as High priority.
May 4 2021, 12:24 PM · Docker environment, Journal
vlorentz edited projects for T3304: Kafka throws flush timeout error, added: Docker environment; removed Core Loader.
May 4 2021, 12:24 PM · Docker environment, Journal
KShivendu added projects to T3304: Kafka throws flush timeout error: Journal, Core Loader.
May 4 2021, 11:53 AM · Docker environment, Journal

Apr 21 2021

douardda added a comment to T3170: Revisions in the journal with out of range dates.

Note that none of their parent revisions can be found in the archive either (I suppose one invalid revision in a set of ingested revisions prevents any of them from being inserted in the database, but they are already inserted in kafka at this point).

Apr 21 2021, 7:08 PM · Data Model, Journal

Apr 20 2021

vlorentz added a comment to T2823: Write tests for swh/journal/writer/inmemory.py.

If we replaced the whole code with just this:

Apr 20 2021, 10:12 AM · Easy hack, Journal

Apr 19 2021

KShivendu added a comment to T2823: Write tests for swh/journal/writer/inmemory.py.

Do you want some more tests, or can this task be declared resolved?

Apr 19 2021, 7:09 PM · Easy hack, Journal
olasd added a parent task for T2003: Content replayer may try to copy objects before they are available from an objstorage: T1954: Up-to-date objstorage mirror on S3.
Apr 19 2021, 12:07 PM · Journal
olasd closed T2003: Content replayer may try to copy objects before they are available from an objstorage as Resolved.

So D5246 has landed a while ago. The s3 object copy process has now caught up on some partitions and I can confirm that the copy of the latest added objects happens without any race condition.

Apr 19 2021, 12:06 PM · Journal
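
The ordering named in the title of D5246 (write to the objstorage before the DB or Kafka) can be illustrated with a toy model. This is a sketch under stated assumptions, not the swh-storage implementation: `ToyStorage`, its attributes, and `replay` are hypothetical names, and the journal is modeled as a plain list.

```python
from typing import Dict, List


class ToyStorage:
    """Hypothetical stand-in for a storage with an objstorage and a journal."""

    def __init__(self) -> None:
        self.objstorage: Dict[bytes, bytes] = {}
        self.journal: List[bytes] = []

    def content_add(self, obj_id: bytes, data: bytes) -> None:
        # 1. Store the bytes first, so the object is readable...
        self.objstorage[obj_id] = data
        # 2. ...before any consumer can learn about it from the journal.
        self.journal.append(obj_id)


def replay(storage: ToyStorage, mirror: Dict[bytes, bytes]) -> None:
    """Copy every object announced in the journal into `mirror`."""
    for obj_id in storage.journal:
        # With the ordering above, this lookup cannot miss.
        mirror[obj_id] = storage.objstorage[obj_id]


if __name__ == "__main__":
    storage = ToyStorage()
    storage.content_add(b"id-1", b"hello")
    mirror: Dict[bytes, bytes] = {}
    replay(storage, mirror)
    print(mirror)
```

If the two statements in `content_add` were swapped, a replayer reading the journal between them would look up an object that is not yet in the objstorage, which is the race described in T2003.
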

Apr 6 2021

vlorentz added a comment to T2823: Write tests for swh/journal/writer/inmemory.py.

Pass an object without `unique_key` and check it does raise an exception

Apr 6 2021, 12:43 PM · Easy hack, Journal
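
A minimal sketch of the suggested test, assuming a writer whose `write_addition` calls `unique_key()` on the object before storing it. `ToyInMemoryWriter`, `WithKey` and `WithoutKey` are hypothetical stand-ins written for this example; they are not the classes from swh/journal/writer/inmemory.py, whose exact behavior may differ.

```python
from typing import Any, List, Tuple


class ToyInMemoryWriter:
    """Hypothetical stand-in for an in-memory journal writer."""

    def __init__(self) -> None:
        self.objects: List[Tuple[str, Any]] = []

    def write_addition(self, object_type: str, object_: Any) -> None:
        # Raises AttributeError when the object has no unique_key method.
        object_.unique_key()
        self.objects.append((object_type, object_))


class WithKey:
    def unique_key(self) -> bytes:
        return b"\x00" * 20


class WithoutKey:
    pass


def rejects_object_without_unique_key() -> bool:
    """Return True if the writer raises and stores nothing."""
    writer = ToyInMemoryWriter()
    try:
        writer.write_addition("content", WithoutKey())
    except AttributeError:
        return not writer.objects
    return False


if __name__ == "__main__":
    print(rejects_object_without_unique_key())
```

In a pytest suite the same check would usually be written with `pytest.raises`; the plain try/except keeps the sketch free of dependencies.
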

Apr 4 2021

KShivendu added a comment to T2823: Write tests for swh/journal/writer/inmemory.py.

Hey @vlorentz
How do I check https://forge.softwareheritage.org/source/swh-journal/browse/master/swh/journal/writer/inmemory.py$31? Do I have to pass dummy content, raw_extrinsic_metadata, origin_visit, etc. as the object_ to the write_addition function and, before passing them, verify that they implement unique_key?

Apr 4 2021, 10:13 AM · Easy hack, Journal

Apr 1 2021

vsellier added a comment to T3191: journal-client: Add support of max message size configuration.

The journal client supports dynamic configuration via kwargs, so there is no need to improve it.

Apr 1 2021, 12:11 PM · Journal
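
The kwargs pattern mentioned above can be sketched as follows. This is an illustration only: `build_consumer_config` is a hypothetical helper, not the swh-journal client, the broker and group names are made up, and `message.max.bytes` is used as an example of a librdkafka-style property with an arbitrary value.

```python
from typing import Any, Dict, List


def build_consumer_config(
    brokers: List[str], group_id: str, **kwargs: Any
) -> Dict[str, Any]:
    """Merge base settings with arbitrary extra consumer properties."""
    config: Dict[str, Any] = {
        "bootstrap.servers": ",".join(brokers),
        "group.id": group_id,
    }
    # Extra keyword arguments are passed through untouched and may
    # override the base settings.
    config.update(kwargs)
    return config


if __name__ == "__main__":
    # Dotted keys are not valid identifiers, so they are passed by
    # unpacking a dict.
    extra = {"message.max.bytes": 524288000}  # arbitrary example value
    print(build_consumer_config(["kafka1:9092"], "example-group", **extra))
```

Because unknown properties are forwarded as-is, a size limit can be set from configuration without adding a dedicated parameter to the client.
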
vsellier closed T3191: journal-client: Add support of max message size configuration as Invalid.
Apr 1 2021, 12:11 PM · Journal
vsellier changed the status of T3191: journal-client: Add support of max message size configuration from Open to Work in Progress.
Apr 1 2021, 11:52 AM · Journal
vlorentz added a parent task for T2590: Finish the indexer -> swh-search pipeline: T1117: Origin search is *slow* when you look for very common words.
Apr 1 2021, 10:51 AM · Journal, Archive search

Mar 24 2021

seirl updated the task description for T3170: Revisions in the journal with out of range dates.
Mar 24 2021, 6:56 PM · Data Model, Journal
seirl updated the task description for T3170: Revisions in the journal with out of range dates.
Mar 24 2021, 4:11 PM · Data Model, Journal
seirl updated the task description for T3170: Revisions in the journal with out of range dates.
Mar 24 2021, 4:11 PM · Data Model, Journal
seirl updated the task description for T3170: Revisions in the journal with out of range dates.
Mar 24 2021, 4:10 PM · Data Model, Journal
seirl triaged T3170: Revisions in the journal with out of range dates as Normal priority.
Mar 24 2021, 1:13 PM · Data Model, Journal

Mar 15 2021

vlorentz added a revision to T2003: Content replayer may try to copy objects before they are available from an objstorage: D5246: content_add: Write to the objstorage before the DB or Kafka.
Mar 15 2021, 12:54 PM · Journal

Mar 5 2021

vsellier added a comment to T3083: Deploy swh-search v0.7.0/v0.7.1.

I forgot one step: cleaning up the previous alias origin -> origin_production, which is not needed anymore:

vsellier@search-esnode1 ~ % curl -s http://$ES_SERVER/_cat/indices\?v && echo && curl -s http://$ES_SERVER/_cat/aliases\?v && echo && curl -s http://$ES_SERVER/_cat/health\?v  
health status index             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin-production hZfuv0lVRImjOjO_rYgDzg  90   1  153130652     26701625    273.4gb        137.3gb
Mar 5 2021, 10:45 AM · System administration, Journal, Archive search
ardumont added a comment to T3076: [swh-search] Improve the index/mapping migration process.

awesome

Mar 5 2021, 10:36 AM · System administration, Journal, Archive search
vsellier closed T3076: [swh-search] Improve the index/mapping migration process as Resolved.

The new configuration is deployed. swh-search is now using the alias, which should help with future upgrades.

Mar 5 2021, 10:35 AM · System administration, Journal, Archive search
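
One way aliases can help with a later migration is by repointing them from the old index to a new one in a single request to the Elasticsearch `_aliases` endpoint. The sketch below only builds the request body; `alias_switch_actions` is a hypothetical helper and `origin-v2` is a made-up index name, while `origin-production`, `origin-read` and `origin-write` come from the deployment log in this feed. It does not describe the procedure actually used by swh-search.

```python
import json
from typing import Any, Dict, List


def alias_switch_actions(
    old_index: str, new_index: str, aliases: List[str]
) -> Dict[str, Any]:
    """Build an _aliases request body moving each alias to new_index."""
    actions: List[Dict[str, Any]] = []
    for alias in aliases:
        actions.append({"remove": {"index": old_index, "alias": alias}})
        actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    body = alias_switch_actions(
        "origin-production",
        "origin-v2",  # hypothetical new index
        ["origin-read", "origin-write"],
    )
    print(json.dumps(body, indent=2))
```

Clients that only know the alias names keep working across the switch, since the index behind the alias changes without any client reconfiguration.
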
vsellier closed T3076: [swh-search] Improve the index/mapping migration process, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Mar 5 2021, 10:35 AM · Journal, Archive search
vsellier closed T3083: Deploy swh-search v0.7.0/v0.7.1, a subtask of T3076: [swh-search] Improve the index/mapping migration process, as Resolved.
Mar 5 2021, 10:34 AM · System administration, Journal, Archive search
vsellier closed T3083: Deploy swh-search v0.7.0/v0.7.1 as Resolved.
Mar 5 2021, 10:34 AM · System administration, Journal, Archive search
vsellier added a comment to T3083: Deploy swh-search v0.7.0/v0.7.1.

Deployment in production:

  • puppet stopped
  • configuration updated to declare the index; this must be done so that swh-search initializes the aliases before the journal clients start (not guaranteed with a puppet apply)
  • package updated
  • gunicorn-swh-search service restarted:
Mar 05 09:08:46 search1 python3[1881743]: 2021-03-05 09:08:46 [1881743] gunicorn.error:INFO Starting gunicorn 19.9.0
Mar 05 09:08:46 search1 python3[1881743]: 2021-03-05 09:08:46 [1881743] gunicorn.error:INFO Listening at: unix:/run/gunicorn/swh-search/gunicorn.sock (1881743)
Mar 05 09:08:46 search1 python3[1881743]: 2021-03-05 09:08:46 [1881743] gunicorn.error:INFO Using worker: sync
Mar 05 09:08:46 search1 python3[1881748]: 2021-03-05 09:08:46 [1881748] gunicorn.error:INFO Booting worker with pid: 1881748
Mar 05 09:08:46 search1 python3[1881749]: 2021-03-05 09:08:46 [1881749] gunicorn.error:INFO Booting worker with pid: 1881749
Mar 05 09:08:46 search1 python3[1881750]: 2021-03-05 09:08:46 [1881750] gunicorn.error:INFO Booting worker with pid: 1881750
Mar 05 09:08:46 search1 python3[1881751]: 2021-03-05 09:08:46 [1881751] gunicorn.error:INFO Booting worker with pid: 1881751
Mar 05 09:08:53 search1 python3[1881750]: 2021-03-05 09:08:53 [1881750] swh.search.api.server:INFO Initializing indexes with configuration: 
Mar 05 09:08:53 search1 python3[1881750]: 2021-03-05 09:08:53 [1881750] elasticsearch:INFO HEAD http://search-esnode2.internal.softwareheritage.org:9200/origin-production [status:200 request:0.023s]
Mar 05 09:08:54 search1 python3[1881750]: 2021-03-05 09:08:54 [1881750] elasticsearch:INFO PUT http://search-esnode1.internal.softwareheritage.org:9200/origin-production/_alias/origin-read [status:200 request:0.487s]
Mar 05 09:08:54 search1 python3[1881750]: 2021-03-05 09:08:54 [1881750] elasticsearch:INFO PUT http://search-esnode3.internal.softwareheritage.org:9200/origin-production/_alias/origin-write [status:200 request:0.152s]
Mar 05 09:08:54 search1 python3[1881750]: 2021-03-05 09:08:54 [1881750] elasticsearch:INFO PUT http://search-esnode1.internal.softwareheritage.org:9200/origin-production/_mapping [status:200 request:0.009s]
vsellier@search-esnode1 ~ % curl -s http://$ES_SERVER/_cat/indices\?v && echo && curl -s http://$ES_SERVER/_cat/aliases\?v && echo && curl -s http://$ES_SERVER/_cat/health\?v 
health status index             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin-production hZfuv0lVRImjOjO_rYgDzg  90   1  153097672    144224208    288.1gb          149gb
Mar 5 2021, 10:34 AM · System administration, Journal, Archive search