There is no Azure Kafka cluster anymore...
Nov 22 2021
Sep 8 2021
Metadata searches are now done in Elasticsearch since the deployment of T3433.
Sep 3 2021
Aug 30 2021
Aug 26 2021
I think so, thanks
@vlorentz should we close this one?
Aug 25 2021
status.io incident closed
Save code now requests rescheduled:
```
swh-web=> select * from save_origin_request where loading_task_status='scheduled' limit 100;
... <output lost due to the psql pager :( ...
```
```
softwareheritage-scheduler=> select * from task where id in (398244739, 398244740, 398244742, 398244744, 398244745, 398244748, 398095676, 397470401, 397470402, 397470404, 397470399);
```
A few minutes later:
```
swh-web=> select * from save_origin_request where loading_task_status='scheduled' limit 100;
 id | request_date | visit_type | origin_url | status | loading_task_id | visit_date | loading_task_status | visit_status | user_ids
----+--------------+------------+------------+--------+-----------------+------------+---------------------+--------------+----------
(0 rows)
```
- all the workers are restarted
- Several save code now requests look stuck in the scheduled status; currently looking into how to unblock them
D6130 landed and applied one kafka node at a time
ok roger that :).
I will increase it to 524288 in the diff
The kafka servers are only running kafka and zookeeper, so the limit of open files isn't that critical. I think we can bump the limit more substantially than just x2 (maybe go directly with x8?), as I expect we'll still be adding more topics in the future.
All the loaders are restarted on worker01 and worker02; the cluster seems OK.
The open file limit was manually increased to stabilize the cluster:
```
# puppet agent --disable T3501
# diff -U3 /tmp/kafka.service kafka.service
--- /tmp/kafka.service  2021-08-25 07:32:28.068928972 +0000
+++ kafka.service       2021-08-25 07:32:31.384955246 +0000
@@ -15,7 +15,7 @@
 Environment='LOG_DIR=/var/log/kafka'
 Type=simple
 ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
-LimitNOFILE=65536
+LimitNOFILE=131072
```
- Incident created on status.io
- Loaders disabled:
```
root@pergamon:~# clush -b -w @swh-workers 'puppet agent --disable "Kafka incident T3501"; systemctl stop cron; cd /etc/systemd/system/multi-user.target.wants; for unit in swh-worker@loader_*; do systemctl disable $unit; done; systemctl stop "swh-worker@loader_*"'
```
Jun 11 2021
Jun 8 2021
May 4 2021
If you face this issue, try restarting the containers using `docker-compose down` and `docker-compose up`.
Apr 21 2021
Note that none of their parent revisions can be found in the archive either (I suppose one invalid revision in a set of ingested revisions prevents any of them from being inserted in the database, but they are already inserted in kafka at that point).
Apr 20 2021
If we replaced the whole code with just this:
Apr 19 2021
Do you need some more tests, or can this task be declared resolved?
So D5246 landed a while ago. The s3 object copy process has now caught up on some partitions, and I can confirm that the copy of the latest added objects happens without any race condition.
Apr 6 2021
Pass an object without `unique_key` and check that it raises an exception.
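A minimal sketch of what such a test could look like, assuming the in-memory writer can be constructed without required arguments and that `write_addition(object_type, object_)` ends up calling `unique_key()` on the object (check the linked source for the exact signatures):

```python
import pytest

from swh.journal.writer.inmemory import InMemoryJournalWriter


class MissingUniqueKey:
    """Dummy object that deliberately does not implement unique_key()."""


def test_write_addition_requires_unique_key():
    # Assumption: the constructor takes no required arguments.
    writer = InMemoryJournalWriter()
    # The writer is expected to fail when it tries to call
    # unique_key() on an object that does not provide it.
    with pytest.raises(Exception):
        writer.write_addition("content", MissingUniqueKey())
```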
Apr 4 2021
Hey @vlorentz
How do I check https://forge.softwareheritage.org/source/swh-journal/browse/master/swh/journal/writer/inmemory.py$31? Do I have to pass dummy content, raw_extrinsic_metadata, origin_visit, et cetera as the object_ to the write_addition function, and before passing them verify that they have a unique_key function implemented?
Apr 1 2021
The journal client supports dynamic configuration via kwargs, so no, there is no need to improve it.
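For illustration, a sketch from the caller's side, assuming the `get_journal_client` factory forwards extra kwargs to the underlying Kafka client (the broker and group names below are hypothetical):

```python
from swh.journal.client import get_journal_client

# Extra kwargs flow through to the underlying consumer, so the client
# can be tuned per deployment without changing the journal client code.
# All concrete values below are hypothetical.
client = get_journal_client(
    cls="kafka",
    brokers=["kafka1.internal.example.org:9092"],
    group_id="swh.search.journal_client",
    prefix="swh.journal.objects",
    object_types=["origin", "origin_visit"],
    batch_size=200,
)

# process() calls the worker function with batches of deserialized objects.
client.process(lambda objects: print(objects))
```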
Mar 24 2021
Mar 15 2021
Mar 5 2021
I forgot one step: cleaning up the previous alias origin -> origin_production, which is not needed anymore:
```
vsellier@search-esnode1 ~ % curl -s http://$ES_SERVER/_cat/indices\?v && echo && curl -s http://$ES_SERVER/_cat/aliases\?v && echo && curl -s http://$ES_SERVER/_cat/health\?v
health status index             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin-production hZfuv0lVRImjOjO_rYgDzg 90  1   153130652  26701625     273.4gb    137.3gb
```
awesome
The new configuration is deployed; swh-search is now using the alias, which should help with future upgrades.
Deployment in production:
- puppet stopped
- configuration updated to declare the index; it needs to be done to make swh-search initialize the aliases before the journal clients start (not guaranteed with a puppet apply)
- package updated
- gunicorn-swh-search service restarted:
```
Mar 05 09:08:46 search1 python3[1881743]: 2021-03-05 09:08:46 [1881743] gunicorn.error:INFO Starting gunicorn 19.9.0
Mar 05 09:08:46 search1 python3[1881743]: 2021-03-05 09:08:46 [1881743] gunicorn.error:INFO Listening at: unix:/run/gunicorn/swh-search/gunicorn.sock (1881743)
Mar 05 09:08:46 search1 python3[1881743]: 2021-03-05 09:08:46 [1881743] gunicorn.error:INFO Using worker: sync
Mar 05 09:08:46 search1 python3[1881748]: 2021-03-05 09:08:46 [1881748] gunicorn.error:INFO Booting worker with pid: 1881748
Mar 05 09:08:46 search1 python3[1881749]: 2021-03-05 09:08:46 [1881749] gunicorn.error:INFO Booting worker with pid: 1881749
Mar 05 09:08:46 search1 python3[1881750]: 2021-03-05 09:08:46 [1881750] gunicorn.error:INFO Booting worker with pid: 1881750
Mar 05 09:08:46 search1 python3[1881751]: 2021-03-05 09:08:46 [1881751] gunicorn.error:INFO Booting worker with pid: 1881751
Mar 05 09:08:53 search1 python3[1881750]: 2021-03-05 09:08:53 [1881750] swh.search.api.server:INFO Initializing indexes with configuration:
Mar 05 09:08:53 search1 python3[1881750]: 2021-03-05 09:08:53 [1881750] elasticsearch:INFO HEAD http://search-esnode2.internal.softwareheritage.org:9200/origin-production [status:200 request:0.023s]
Mar 05 09:08:54 search1 python3[1881750]: 2021-03-05 09:08:54 [1881750] elasticsearch:INFO PUT http://search-esnode1.internal.softwareheritage.org:9200/origin-production/_alias/origin-read [status:200 request:0.487s]
Mar 05 09:08:54 search1 python3[1881750]: 2021-03-05 09:08:54 [1881750] elasticsearch:INFO PUT http://search-esnode3.internal.softwareheritage.org:9200/origin-production/_alias/origin-write [status:200 request:0.152s]
Mar 05 09:08:54 search1 python3[1881750]: 2021-03-05 09:08:54 [1881750] elasticsearch:INFO PUT http://search-esnode1.internal.softwareheritage.org:9200/origin-production/_mapping [status:200 request:0.009s]
```
```
vsellier@search-esnode1 ~ % curl -s http://$ES_SERVER/_cat/indices\?v && echo && curl -s http://$ES_SERVER/_cat/aliases\?v && echo && curl -s http://$ES_SERVER/_cat/health\?v
health status index             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin-production hZfuv0lVRImjOjO_rYgDzg 90  1   153097672  144224208    288.1gb    149gb
```
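For reference, the alias setup visible in the logs above boils down to two Elasticsearch API calls; a sketch using the elasticsearch Python client (endpoint taken from the logs):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://search-esnode1.internal.softwareheritage.org:9200"])

# Point the read and write aliases at the concrete index. Clients only
# ever talk to the aliases, so a future reindex can swap the underlying
# index without reconfiguring swh-search or the journal clients.
es.indices.put_alias(index="origin-production", name="origin-read")
es.indices.put_alias(index="origin-production", name="origin-write")
```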
Mar 4 2021
swh-search:v0.7.1 deployed in staging according to the defined plan.
The aliases are correctly created and used by the services:
```
vsellier@search-esnode0 ~ % curl -XGET -H "Content-Type: application/json" http://192.168.130.80:9200/_cat/indices
green open  origin                      HthJj42xT5uO7w3Aoxzppw 80 0 929692 137147 4gb 4gb
green close origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg 80 0
green close origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ 80 0
vsellier@search-esnode0 ~ % curl -XGET -H "Content-Type: application/json" http://192.168.130.80:9200/_cat/aliases
origin-read  origin - - - -
origin-write origin - - - -
```
Journal clients:
```
Mar 04 16:22:40 search0 swh[3598137]: INFO:elasticsearch:POST http://search-esnode0.internal.staging.swh.network:9200/origin-write/_bulk [status:200 request:0.013s]
Mar 04 16:22:41 search0 swh[3598137]: INFO:elasticsearch:POST http://search-esnode0.internal.staging.swh.network:9200/origin-write/_bulk [status:200 request:0.012s]
```
Search:
```
Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] swh.search.api.server:INFO Initializing indexes with configuration:
Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] elasticsearch:INFO HEAD http://search-esnode0.internal.staging.swh.network:9200/origin [status:200 request:0.005s]
Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] elasticsearch:INFO HEAD http://search-esnode0.internal.staging.swh.network:9200/origin-read/_alias [status:200 request:0.001s]
Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] elasticsearch:INFO HEAD http://search-esnode0.internal.staging.swh.network:9200/origin-write/_alias [status:200 request:0.001s]
Mar 04 15:40:20 search0 python3[3598040]: 2021-03-04 15:40:20 [3598040] elasticsearch:INFO PUT http://search-esnode0.internal.staging.swh.network:9200/origin/_mapping [status:200 request:0.006s]
Mar 04 16:19:27 search0 python3[3598042]: 2021-03-04 16:19:27 [3598042] elasticsearch:INFO GET http://search-esnode0.internal.staging.swh.network:9200/origin-read/_search?size=100 [status:200 request:0.076s]
```
Mar 3 2021
Mar 2 2021
Mar 1 2021
Feb 18 2021
Feb 17 2021
Feb 15 2021
Feb 12 2021
Feb 11 2021
Done scheduling:
T3041 needs to be done before this one (for the production environment)
Running:
```
swhscheduler@saatchi:~$ /usr/bin/swh scheduler --config-file /etc/softwareheritage/scheduler/backend.yml task schedule_origins --storage-url http://saam.internal.softwareheritage.org:5002 --batch-size 20 index-origin-metadata | tee /tmp/schedule-origins.txt
```
@ardumont no, OriginMetadataIndexer lacks a filter step.
Although, now I'm wondering something.
Is that enough to write what's not in the topics?
Ah no! I misused the cli, with the right flags:
This needs storage access, so edit a dedicated configuration file.
That's it! [1]
Feb 10 2021
try `swh scheduler task schedule_origins`
That suggested cli does not show up, but I've only taken a quick glance ¯\_(ツ)_/¯:
Rather than write code to read from the database to kafka (like we did with swh-storage), this can be done simply by re-indexing all the origins, using `swh scheduler schedule_origins`
Indexer-related topic status can be seen in the indexer ingestion status board [1]
We'll prepare the topics with the following first and we'll improve later if need be: