thanks
LGTM
Sep 3 2021
- puppet configuration deployed in staging
- read index updated with this script:
#!/bin/bash
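# NOTE: the rest of the script was not preserved in this log. What follows is
# only a hypothetical sketch of the read-index update, based on the aliases
# used in the swh-search configuration (origin-read -> origin-v0.11); it is
# not the script that was actually run.
ES=http://search-esnode0:9200
curl -s -XPOST "$ES/_aliases" -H 'Content-Type: application/json' -d '
{
  "actions": [
    { "remove": { "index": "origin-v0.10.0", "alias": "origin-read" } },
    { "add":    { "index": "origin-v0.11",   "alias": "origin-read" } }
  ]
}'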
The lag recovered in ~12 hours.
The content of the index looks good (I just cherry-picked a couple of origins).
Sep 1 2021
- package python3-swh.search upgraded to version 0.11.4-2; the problem is fixed
- the new index was correctly created:
root@search0:/# curl -s http://search-esnode0:9200/_cat/indices\?v
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   origin-v0.11                HljzsdD9SmKI7-8ekB_q3Q  80   0          0            0      4.2kb          4.2kb
green  close  origin                      HthJj42xT5uO7w3Aoxzppw  80   0
green  close  origin-v0.9.0               o7FiYJWnTkOViKiAdCXCuA  80   0
green  open   origin-v0.10.0              -fvf4hK9QDeN8qYTJBBlxQ  80   0    1981623       559384      2.3gb          2.3gb
green  close  origin-backup-20210209-1736 P1CKjXW0QiWM5zlzX46-fg  80   0
green  close  origin-v0.5.0               SGplSaqPR_O9cPYU4ZsmdQ  80   0
- journal clients enabled and restarted
- the journal clients' lag should recover in less than 12h (a lag-check sketch follows after this list)
- waiting some time to estimate the duration with only one journal client per type
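Not taken from this log, but the lag of these consumer groups can be watched with the standard Kafka CLI (group ids as configured in the Sep 1 diffs below); a sketch:
SERVER=journal0.internal.staging.swh.network:9092
# show partitions, current offsets and lag for each v0.11 journal client group
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --describe --group swh.search.journal_client-v0.11
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --describe --group swh.search.journal_client.indexed-v0.11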
The problem was fixed by rDSEA68347a5604c74150197f691593cbb05bdd34396f
thanks @olasd
Deployment of version v0.11.4 in staging:
On search0:
- puppet stopped
- stop and disable the journal clients and search backend
- update the swh-search configuration to use the origin-v0.11 index
root@search0:/etc/softwareheritage/search# diff -U2 /tmp/server.yml server.yml
--- /tmp/server.yml	2021-09-01 13:42:29.347951302 +0000
+++ server.yml	2021-09-01 13:42:35.739953523 +0000
@@ -7,5 +7,5 @@
 indexes:
   origin:
-    index: origin-v0.10.0
+    index: origin-v0.11
     read_alias: origin-read
     write_alias: origin-write
- update the journal-clients to use a group id swh.search.journal_client.[indexed|object]-v0.11
root@search0:/etc/softwareheritage/search# diff -U3 /tmp/journal_client_objects.yml journal_client_objects.yml
--- /tmp/journal_client_objects.yml	2021-09-01 13:44:49.843999978 +0000
+++ journal_client_objects.yml	2021-09-01 13:45:03.972004852 +0000
@@ -5,7 +5,7 @@
 journal:
   brokers:
   - journal0.internal.staging.swh.network
-  group_id: swh.search.journal_client-v0.10.0
+  group_id: swh.search.journal_client-v0.11
   prefix: swh.journal.objects
   object_types:
   - origin
root@search0:/etc/softwareheritage/search# diff -U3 /tmp/journal_client_indexed.yml journal_client_indexed.yml
--- /tmp/journal_client_indexed.yml	2021-09-01 13:44:44.847998252 +0000
+++ journal_client_indexed.yml	2021-09-01 13:44:57.020002454 +0000
@@ -5,7 +5,7 @@
 journal:
   brokers:
   - journal0.internal.staging.swh.network
-  group_id: swh.search.journal_client.indexed-v0.10.0
+  group_id: swh.search.journal_client.indexed-v0.11
   prefix: swh.journal.indexed
   object_types:
   - origin_intrinsic_metadata
- perform a system upgrade (a reboot was not required)
- enable and start swh-search backend
- An error occurred after the restart:
Sep 01 14:19:12 search0 python3[4066688]: 2021-09-01 14:19:12 [4066688] root:ERROR command 'cc' failed with exit status 1
Traceback (most recent call last):
  File "/usr/lib/python3.7/distutils/unixccompiler.py", line 118, in _compile
    extra_postargs)
  File "/usr/lib/python3.7/distutils/ccompiler.py", line 909, in spawn
    spawn(cmd, dry_run=self.dry_run)
  File "/usr/lib/python3.7/distutils/spawn.py", line 36, in spawn
    _spawn_posix(cmd, search_path, dry_run=dry_run)
  File "/usr/lib/python3.7/distutils/spawn.py", line 159, in _spawn_posix
    % (cmd, exit_status))
distutils.errors.DistutilsExecError: command 'cc' failed with exit status 1
The build is now fixed and version v0.11.4 is ready to be deployed to the environments.
Test with 10 replayers and the 3 kinds of algorithms:
- first interval: one-by-one
- second interval: concurrent
- third interval: batch
LGTM
LGTM
Aug 31 2021
Aug 30 2021
rebase
Add a test showing the failure without the correction
Aug 27 2021
New cluster state after all the reservations are up:
vsellier@gros-50:~$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Tokens  Owns (effective)  Host ID                               Rack
UN  172.16.97.3  1.4 TiB   256     60.1%             a3ae5fa2-c063-4890-87f1-bddfcf293bde  rack1
UN  172.16.97.6  1.4 TiB   256     60.0%             bfe360f1-8fd2-4f4b-a070-8f267eda1e12  rack1
UN  172.16.97.5  1.39 TiB  256     59.9%             478c36f8-5220-4db7-b5c2-f3876c0c264a  rack1
UN  172.16.97.4  1.4 TiB   256     59.9%             b3105348-66b0-4f82-a5bf-31ef28097a41  rack1
UN  172.16.97.2  1.4 TiB   256     60.1%             de866efd-064c-4e27-965c-f5112393dc8f  rack1
- cassandra stopped
vsellier@fnancy:~/cassandra$ seq 50 64 | parallel -t ssh root@gros-{} systemctl stop cassandra
- data cleaned
vsellier@fnancy:~/cassandra$ seq 50 64 | parallel -t ssh root@gros-{} "rm -rf /srv/cassandra/*"
- Cassandra restarted
vsellier@fnancy:~/cassandra$ seq 50 64 | parallel -t ssh root@gros-{} systemctl start cassandra
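Not shown in the log: a quick way to check that every node came back after the restart could be (a sketch, not commands that were actually recorded):
vsellier@fnancy:~/cassandra$ seq 50 64 | parallel -t ssh root@gros-{} systemctl is-active cassandra
vsellier@fnancy:~/cassandra$ ssh root@gros-50 nodetool status   # all nodes should eventually show UN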
Well, after reflection, it will probably be faster to recreate the second DC from scratch now that the configuration is ready.
5 nodes were added to the cluster:
- configuration pushed to g5k, disks reserved for 14 days on the new servers, and a new reservation launched with the new nodes
- each node was started one by one, waiting until its status was UN in the nodetool status output before starting the next one
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load  Tokens  Owns (effective)  Host ID                               Rack
DN  172.16.97.3  ?     256     0.0%              a3ae5fa2-c063-4890-87f1-bddfcf293bde  r1
DN  172.16.97.6  ?     256     0.0%              bfe360f1-8fd2-4f4b-a070-8f267eda1e12  r1
DN  172.16.97.5  ?     256     0.0%              478c36f8-5220-4db7-b5c2-f3876c0c264a  r1
DN  172.16.97.4  ?     256     0.0%              b3105348-66b0-4f82-a5bf-31ef28097a41  r1
DN  172.16.97.2  ?     256     0.0%              de866efd-064c-4e27-965c-f5112393dc8f  r1
10 nodes are not enough; I am adding 5 additional nodes to reduce the volume per node a little.
thanks. I will test that once the monitoring is updated to use the statsd statistics instead of the object_count table content.
The lz4 compression was already activated by default. Changing the algorithm to zstd on the snapshot table was not really significant (initially with lz4: 7 GB, zstd: 12 GB, back to lz4: 9 GB :) )
interesting:
Depending on the data characteristics of the table, compressing its data can result in:
- 25-33% reduction in data size
- 25-35% performance improvement on reads
- 5-10% performance improvement on writes
The replaying is currently stopped as the data disks are now almost full.
I will try to activate the compression on some big tables to see if it can help.
I will probably need to start with small tables to recover some space before being able to compress the biggest tables.
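For reference, switching the compression of a table is a single schema change followed by an SSTable rewrite; a sketch using cqlsh (the table and chunk length are examples, not commands taken from this log):
# example: enable zstd compression on one big table, then rewrite its SSTables
# so the existing data on disk is actually recompressed
cqlsh gros-50 -e "ALTER TABLE swh.directory_entry WITH compression = {'class': 'ZstdCompressor', 'chunk_length_in_kb': 64};"
nodetool upgradesstables -a swh directory_entry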
Aug 26 2021
The patch was tested in a loader and in the replayers.
The difference was not really significant on the loader but I'm not really confident in the tests as the cluster had a pretty high load (running replayers + second datacenter synchronization).
I will retry in a quieter environment to be able to isolate the loader behavior.
These are the steps done to initialize the new cluster [1]:
- add a cassandra-rackdc.properties file on each server with the corresponding DC
gros-50:~$ cat /etc/cassandra/cassandra-rackdc.properties
dc=datacenter2
rack=rack1
- change the value of the endpoint_snitch property from SimpleSnitch to GossipingPropertyFileSnitch [2].
The recommended value for production is GossipingPropertyFileSnitch, so it should have been set to this from the beginning.
- configure the disk_optimization_strategy to ssd on the new datacenter
- update the seed_provider to have one node on each datacenter
- restart the datacenter1 nodes to apply the new configuration
- start the datacenter2 nodes one by one, waiting until the status of each node is UN (Up and Normal) before starting the next one (they can stay in the UJ (joining) state for a couple of minutes)
- when done, update the swh keyspace to declare the replication strategy of the second DC
ALTER KEYSPACE swh WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3, 'datacenter2': 3};
The replication of new changes starts at this point, but the existing table contents still need to be copied.
- rebuild the cluster content:
vsellier@fnancy:~/cassandra$ seq 0 9 | parallel -t ssh gros-5{} nodetool rebuild -ks swh -- datacenter1
The progress can be monitored with the nodetool command:
gros-50:~$ nodetool netstats
Mode: NORMAL
Rebuild e5e64920-0644-11ec-92a6-31a241f39914
    /172.16.97.4
        Receiving 199 files, 147926499702 bytes total. Already received 125 files (62.81%), 57339885570 bytes total (38.76%)
            swh/release-4 1082347/1082347 bytes (100%) received from idx:0/172.16.97.4
            swh/content_by_blake2s256-2 3729362955/3729362955 bytes (100%) received from idx:0/172.16.97.4
            swh/release-3 224510803/224510803 bytes (100%) received from idx:0/172.16.97.4
            swh/content_by_blake2s256-1 240283216/240283216 bytes (100%) received from idx:0/172.16.97.4
            swh/content_by_blake2s256-4 29491504/29491504 bytes (100%) received from idx:0/172.16.97.4
            swh/release-2 6409474/6409474 bytes (100%) received from idx:0/172.16.97.4
            ...
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed   Dropped
Large messages                  n/a         0             23         0
Small messages                  n/a         3      132753939         0
Gossip messages                 n/a         0          43915         0
or to filter only running transfers:
gros-50:~$ nodetool netstats | grep -v 100%
Mode: NORMAL
Rebuild e5e64920-0644-11ec-92a6-31a241f39914
    /172.16.97.4
        Receiving 199 files, 147926499702 bytes total. Already received 125 files (62.81%), 57557961160 bytes total (38.91%)
            swh/directory_entry-7 4819168032/4925484261 bytes (97%) received from idx:0/172.16.97.4
    /172.16.97.2
        Receiving 202 files, 111435975646 bytes total. Already received 139 files (68.81%), 60583670773 bytes total (54.37%)
            swh/directory_entry-12 1631210003/2906113367 bytes (56%) received from idx:0/172.16.97.2
    /172.16.97.6
        Receiving 236 files, 186694443984 bytes total. Already received 142 files (60.17%), 58869656747 bytes total (31.53%)
            swh/snapshot_branch-10 4449235102/7845572885 bytes (56%) received from idx:0/172.16.97.6
    /172.16.97.5
        Receiving 221 files, 143384473640 bytes total. Already received 132 files (59.73%), 58300913015 bytes total (40.66%)
            swh/directory_entry-4 982247023/3492851311 bytes (28%) received from idx:0/172.16.97.5
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed   Dropped
Large messages                  n/a         0             23         0
Small messages                  n/a         2      135087921         0
Gossip messages                 n/a         0          44176         0
The second cassandra cluster is finally up and synchronizing with the first one. The rebuild should be done by the end of the day or tomorrow.
The backfill is also done for production.
It took less than 4h30.
... 2021-08-25T19:25:25 INFO swh.storage.backfill Processing extid range 700000 to 700001
LGTM
Aug 25 2021
It was much faster than expected in staging. The backfilling is already done:
- on production:
vsellier@kafka1 ~ % /opt/kafka/bin/kafka-topics.sh --bootstrap-server $SERVER --describe --topic swh.journal.objects.extid | grep "^Topic"
Topic: swh.journal.objects.extid	PartitionCount: 256	ReplicationFactor: 2	Configs: cleanup.policy=compact,max.message.bytes=104857600
vsellier@kafka1 ~ % /opt/kafka/bin/kafka-configs.sh --bootstrap-server ${SERVER} --alter --add-config 'cleanup.policy=[compact,delete],retention.ms=86400000' --entity-type=topics --entity-name swh.journal.objects.extid
Completed updating config for topic swh.journal.objects.extid.
In the kafka logs:
...
[2021-08-25 14:56:19,495] INFO [Log partition=swh.journal.objects.extid-162, dir=/srv/kafka/logdir] Found deletable segments with base offsets [0] due to retention time 86400000ms breach (kafka.log.Log)
[2021-08-25 14:56:19,495] INFO [Log partition=swh.journal.objects.extid-162, dir=/srv/kafka/logdir] Scheduling segments for deletion LogSegment(baseOffset=0, size=2720767, lastModifiedTime=1629815520833, largestTime=1629815520702) (kafka.log.Log)
[2021-08-25 14:56:19,495] INFO [Log partition=swh.journal.objects.extid-162, dir=/srv/kafka/logdir] Incremented log start offset to 20623 due to segment deletion (kafka.log.Log)
...
vsellier@kafka1 ~ % /opt/kafka/bin/kafka-configs.sh --bootstrap-server ${SERVER} --alter --delete-config 'cleanup.policy' --entity-type=topics --entity-name swh.journal.objects.extid
Completed updating config for topic swh.journal.objects.extid.
vsellier@kafka1 ~ % /opt/kafka/bin/kafka-configs.sh --bootstrap-server ${SERVER} --alter --delete-config 'retention.ms' --entity-type=topics --entity-name swh.journal.objects.extid
Completed updating config for topic swh.journal.objects.extid.
vsellier@kafka1 ~ % /opt/kafka/bin/kafka-configs.sh --bootstrap-server ${SERVER} --alter --add-config 'cleanup.policy=compact' --entity-type=topics --entity-name swh.journal.objects.extid
Completed updating config for topic swh.journal.objects.extid.
vsellier@kafka1 ~ % /opt/kafka/bin/kafka-topics.sh --bootstrap-server $SERVER --describe --topic swh.journal.objects.extid | grep "^Topic"
Topic: swh.journal.objects.extid	PartitionCount: 256	ReplicationFactor: 2	Configs: cleanup.policy=compact,max.message.bytes=104857600
- the retention policy was restored to compact on staging:
vsellier@journal0 ~ % /opt/kafka/bin/kafka-configs.sh --bootstrap-server journal0.internal.staging.swh.network:9092 --alter --delete-config 'cleanup.policy' --entity-type=topics --entity-name swh.journal.objects.extid
% /opt/kafka/bin/kafka-configs.sh --bootstrap-server journal0.internal.staging.swh.network:9092 --alter --add-config 'cleanup.policy=compact' --entity-type=topics --entity-name swh.journal.objects.extid
Completed updating config for topic swh.journal.objects.extid.
% /opt/kafka/bin/kafka-topics.sh --bootstrap-server $SERVER --describe --topic swh.journal.objects.extid | grep "^Topic"
Topic: swh.journal.objects.extid	PartitionCount: 64	ReplicationFactor: 1	Configs: cleanup.policy=compact,max.message.bytes=104857600,min.cleanable.dirty.ratio=0.01
status.io incident closed
Save code now requests rescheduled:
swh-web=> select * from save_origin_request where loading_task_status='scheduled' limit 100;
... <output lost due to the psql pager :( > ...
softwareheritage-scheduler=> select * from task where id in (398244739, 398244740, 398244742, 398244744, 398244745, 398244748, 398095676, 397470401, 397470402, 397470404, 397470399);
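The statement used for the actual rescheduling is not preserved here; assuming the standard swh-scheduler task table (status and next_run columns), a hypothetical sketch could be:
softwareheritage-scheduler=> -- hypothetical sketch, not the statement that was actually run
softwareheritage-scheduler=> update task set status = 'next_run_not_scheduled', next_run = now() where id in (398244739, 398244740, 398244742, 398244744, 398244745, 398244748, 398095676, 397470401, 397470402, 397470404, 397470399);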
A few minutes later:
swh-web=> select * from save_origin_request where loading_task_status='scheduled' limit 100;
 id | request_date | visit_type | origin_url | status | loading_task_id | visit_date | loading_task_status | visit_status | user_ids
----+--------------+------------+------------+--------+-----------------+------------+---------------------+--------------+----------
(0 rows)
- all the workers are restarted
- Several save code now requests look stuck in the scheduled status; currently looking into how to unblock them
D6130 landed and applied on the kafka nodes one at a time
ok roger that :).
I will increase it to 524288 in the diff
All the loaders are restarted on worker01 and worker02; the cluster seems ok.
The open file limit was manually increased to stabilize the cluster:
# puppet agent --disable T3501
# diff -U3 /tmp/kafka.service kafka.service
--- /tmp/kafka.service	2021-08-25 07:32:28.068928972 +0000
+++ kafka.service	2021-08-25 07:32:31.384955246 +0000
@@ -15,7 +15,7 @@
 Environment='LOG_DIR=/var/log/kafka'
 Type=simple
 ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
-LimitNOFILE=65536
+LimitNOFILE=131072
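A daemon-reload and a broker restart are then needed for the new limit to take effect; one way to double-check the applied value (a sketch, not commands taken from this log):
systemctl daemon-reload
systemctl restart kafka            # done one broker at a time
grep "Max open files" /proc/$(systemctl show -p MainPID --value kafka)/limits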
- Incident created on status.io
- Loaders disabled:
root@pergamon:~# clush -b -w @swh-workers 'puppet agent --disable "Kafka incident T3501"; systemctl stop cron; cd /etc/systemd/system/multi-user.target.wants; for unit in swh-worker@loader_*; do systemctl disable $unit; done; systemctl stop "swh-worker@loader_*"'
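The corresponding re-enable step (the loaders were restarted later, as noted above) is not captured in this log; a plausible sketch would be to re-enable puppet and cron and let puppet restore the loader units on its next run:
root@pergamon:~# clush -b -w @swh-workers 'puppet agent --enable; systemctl start cron; puppet agent --test'  # hypothetical: puppet re-enables and restarts the swh-worker@loader_* units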
Aug 24 2021
Some live data from a git loader with a batch size of 1000 for each object type (with D6118 applied):
"object type";"input count";"missing_id duration (s)";"_missing_id count","_add duration(s)" content;1000;0.4928;999;35.3384 content;1000;0.4095;1000;34.1440 content;1000;0.4374;998;35.6249 content;492;0.2960;488;16.7028 directory;1000;0.3978;999;71.2518 directory;1000;0.4484;1000;39.6845 directory;1000;0.4356;1000;54.0077 directory;1000;0.3833;1000;36.1437 directory;1000;0.4319;1000;30.5690 directory;402;0.1718;402;19.2335 revision;1000;0.8671;1000;10.3417 revision;575;0.4639;575;4.0819
The performance is now ok for the read part with a batch size of 1000 for content, directory and revision.
An alert was sent by email on 2021-05-22 at 05:30 AM, so the monitoring correctly detected the issue ;):
This message was generated by the smartd daemon running on:
On hypervisor3 and branly:
- A new LVM volume was created and mounted on /var/lib/vz (40G on hypervisor3, 100G on branly); see the sketch after this list
- the local storage type was activated on Proxmox via the UI (Datacenter / Storage / local, check enable)
- the pushkin and glytotek disks were moved via the UI to the local storage (<vm> / Hardware, click on the disk / Move disk button / target storage 'local')
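As a reference only, the volume creation on one hypervisor could have looked roughly like this (the volume group name vg0 is a made-up example; the size shown is the hypervisor3 one):
# hypothetical sketch -- the actual VG name and options are not in this log
lvcreate -L 40G -n vz vg0            # 100G on branly
mkfs.ext4 /dev/vg0/vz
echo '/dev/vg0/vz /var/lib/vz ext4 defaults 0 2' >> /etc/fstab
mount /var/lib/vz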
LGTM (double checked with @olasd ;) )
Aug 23 2021
It seems the problem is no longer present now (tested with several origins)
root@parasilo-19:~/swh-environment/docker# docker exec -ti docker_swh-loader_1 bash
swh@8e68948366b7:/$ swh loader run git https://github.com/slackhq/nebula
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/slackhq/nebula' with type 'git'
INFO:swh.loader.git.loader.GitLoader:Listed 293 refs for repo https://github.com/slackhq/nebula
{'status': 'uneventful'}
swh@8e68948366b7:/$ swh loader run git https://github.com/slackhq/nebula
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/slackhq/nebula' with type 'git'
INFO:swh.loader.git.loader.GitLoader:Listed 293 refs for repo https://github.com/slackhq/nebula
{'status': 'uneventful'}
swh@8e68948366b7:/$ swh loader run git https://github.com/slackhq/nebula
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/slackhq/nebula' with type 'git'
INFO:swh.loader.git.loader.GitLoader:Listed 293 refs for repo https://github.com/slackhq/nebula
{'status': 'uneventful'}
The origin_visit topic was replayed with your diff during the weekend. Let's now test whether the worker behavior is more deterministic.
Aug 20 2021
Aug 19 2021
The gros cluster at Nancy[1] has a lot of nodes (124) with small reservable SSDs of 960 GB. This makes it a good candidate for the second cluster. It will also allow checking the performance with data (and commit logs) on SSDs.
Based on the main cluster, a minimum of 8 nodes is necessary to handle the volume of data (7.3 TB and growing: 7.3 TB / 0.96 TB per node ≈ 7.6, rounded up to 8). Starting with 10 nodes will leave some headroom.
It seems some more precise information can be logged by activating the full query log, without a big performance impact: https://cassandra.apache.org/doc/latest/cassandra/new/fqllogging.html
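Enabling it is a runtime operation through nodetool; a sketch of what it could look like on one node (the log directory is an example, not a path from this log):
# enable the full query log on this node, check its state, and disable it
# again once enough queries have been captured
nodetool enablefullquerylog --path /srv/cassandra/fql
nodetool getfullquerylog
nodetool disablefullquerylog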
Should be fixed by T3482