Apr 29 2022
Fixed in D7718
Apr 27 2022
No longer happens with a more recent stack
Mar 3 2022
Feb 7 2022
Feb 3 2022
Looks like we are going to keep the status quo in the short term, i.e. a numeric offset for old objects and offset_bytes for new objects, without renaming.
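As an illustration only, a hedged sketch of what consuming both representations side by side could look like (the field names come from the discussion above, but the minutes-based unit of the legacy numeric offset and the conversion logic are my assumptions, not the actual swh.model code):

```python
# Hypothetical sketch, not the actual implementation: normalize the two
# representations kept side by side by the decision above.
def normalized_offset_bytes(d: dict) -> bytes:
    """Return an offset_bytes-style value for both old and new objects."""
    if "offset_bytes" in d:
        # new object: offset already stored as raw bytes, e.g. b"+0200"
        return d["offset_bytes"]
    # old object: numeric offset, assumed here to be expressed in minutes
    offset = d["offset"]
    sign = "+" if offset >= 0 else "-"
    hours, minutes = divmod(abs(offset), 60)
    return f"{sign}{hours:02d}{minutes:02d}".encode()
```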
Jan 27 2022
yeah!
See T3893 instead.
Jan 26 2022
Nov 22 2021
There is no azure kafka cluster anymore...
Sep 8 2021
Metadata searches are done in Elasticsearch since the deployment of T3433.
Sep 3 2021
Aug 30 2021
Aug 26 2021
I think so, thanks
@vlorentz should we close this one?
Aug 25 2021
status.io incident closed
Save code now requests rescheduled:
swh-web=> select * from save_origin_request where loading_task_status='scheduled' limit 100;
... <output lost due to the psql pager :( ...
softwareheritage-scheduler=> select * from task where id in (398244739, 398244740, 398244742, 398244744, 398244745, 398244748, 398095676, 397470401, 397470402, 397470404, 397470399);
A few minutes later:
swh-web=> select * from save_origin_request where loading_task_status='scheduled' limit 100;
 id | request_date | visit_type | origin_url | status | loading_task_id | visit_date | loading_task_status | visit_status | user_ids
----+--------------+------------+------------+--------+-----------------+------------+---------------------+--------------+----------
(0 rows)
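For the record, roughly the same cross-check as the two psql sessions above, sketched in Python (connection strings are placeholders, and the scheduler columns other than id are assumptions):

```python
# Hypothetical sketch: find save-code-now requests still marked 'scheduled'
# in swh-web, then look up the matching scheduler tasks via loading_task_id.
import psycopg2

with psycopg2.connect("service=swh-web") as web_db, web_db.cursor() as cur:
    cur.execute(
        "select id, origin_url, loading_task_id from save_origin_request "
        "where loading_task_status = 'scheduled'"
    )
    stuck = cur.fetchall()

task_ids = [task_id for (_, _, task_id) in stuck if task_id is not None]

if task_ids:
    with psycopg2.connect("service=swh-scheduler") as db, db.cursor() as cur:
        # assumed scheduler schema: the 'task' table has id and status columns
        cur.execute("select id, status from task where id = any(%s)", (task_ids,))
        for row in cur.fetchall():
            print(row)
```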
- all the workers are restarted
- Several save code now requests look stuck in the 'scheduled' status; currently looking into how to unblock them
D6130 landed and applied, one kafka node at a time
ok roger that :).
I will increase it to 524288 in the diff
The kafka servers are only running kafka and zookeeper, so the limit of open files isn't that critical. I think we can bump the limit more substantially than just x2 (maybe go directly with x8?), as I expect we'll still be adding more topics in the future.
All the loaders are restarted on worker01 and worker02; the cluster seems to be ok.
The open file limit was manually increased to stabilize the cluster:
# puppet agent --disable T3501
# diff -U3 /tmp/kafka.service kafka.service
--- /tmp/kafka.service  2021-08-25 07:32:28.068928972 +0000
+++ kafka.service       2021-08-25 07:32:31.384955246 +0000
@@ -15,7 +15,7 @@
 Environment='LOG_DIR=/var/log/kafka'
 Type=simple
 ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
-LimitNOFILE=65536
+LimitNOFILE=131072
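As a side note, a rough sketch of how one could watch how close the broker gets to that limit (this assumes kafka is the only java process on the host and that the check runs as root; it is not part of the incident response above):

```python
# Hypothetical helper: compare the kafka broker's open file descriptors
# to its soft "Max open files" limit, using /proc.
import os
import subprocess

# assumption: kafka is the only java process running on the broker host
pid = int(subprocess.check_output(["pidof", "-s", "java"]).strip())
open_fds = len(os.listdir(f"/proc/{pid}/fd"))

soft_limit = None
with open(f"/proc/{pid}/limits") as limits:
    for line in limits:
        if line.startswith("Max open files"):
            soft_limit = int(line.split()[3])

print(f"kafka pid {pid}: {open_fds} fds open, soft limit {soft_limit}")
```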
- Incident created on status.io
- Loaders disabled:
root@pergamon:~# clush -b -w @swh-workers 'puppet agent --disable "Kafka incident T3501"; systemctl stop cron; cd /etc/systemd/system/multi-user.target.wants; for unit in swh-worker@loader_*; do systemctl disable $unit; done; systemctl stop "swh-worker@loader_*"'
Jun 11 2021
Jun 8 2021
May 4 2021
If you face this issue, try restarting the containers using `docker-compose down` and `docker-compose up`.
Apr 21 2021
Note that none of their parent revisions can be found in the archive either (I suppose one invalid revision in a set of ingested revisions prevents any of them from being inserted in the database, but they have already been inserted in kafka at that point).
Apr 20 2021
If we replaced the whole code with just this:
Apr 19 2021
Do you need to run some more tests, or can this task be declared resolved?
So D5246 landed a while ago. The s3 object copy process has now caught up on some partitions, and I can confirm that the copy of the most recently added objects happens without any race condition.
Apr 6 2021
Pass an object without `unique_key` and check that it raises an exception.
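Something along these lines, as a minimal sketch (the class name, its no-argument constructor, and the exact exception type are assumptions to adjust against the actual code in swh/journal/writer/inmemory.py):

```python
# Hypothetical test sketch: write an object lacking unique_key() and check
# that the in-memory journal writer refuses it. Names and exception types
# are assumptions, not taken from the actual test suite.
import pytest

from swh.journal.writer.inmemory import InMemoryJournalWriter


class MissingUniqueKey:
    """Dummy object that deliberately does not implement unique_key()."""


def test_write_addition_requires_unique_key():
    writer = InMemoryJournalWriter()
    with pytest.raises((AttributeError, TypeError, ValueError)):
        writer.write_addition("content", MissingUniqueKey())
```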
Apr 4 2021
Hey @vlorentz
How do I check https://forge.softwareheritage.org/source/swh-journal/browse/master/swh/journal/writer/inmemory.py$31? Do I have to pass a dummy content, raw_extrinsic_metadata, origin_visit, etc. as the object_ to the write_addition function, and check before passing them that they have a unique_key function implemented?
Apr 1 2021
The journal client supports dynamic configuration via kwargs, so no, there is no need to improve it.
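For reference, a hedged sketch of what that looks like on the caller's side (broker and group names are placeholders; check the current swh.journal.client signature before relying on the exact parameters):

```python
# Hypothetical sketch: extra keyword arguments given to JournalClient are
# forwarded to the underlying Kafka consumer configuration, so tuning it
# should not require any code change. Values below are placeholders.
from swh.journal.client import JournalClient

client = JournalClient(
    brokers=["broker1.example.org:9092"],
    group_id="example-consumer-group",
    object_types=["origin_visit_status"],
    # pass-through consumer settings; dotted names need the **{...} form
    **{"session.timeout.ms": 60000},
)
```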