All Stories
Aug 27 2021
I agree, it looks great!
Note that the cache invalidation is not completely done though, as the objstorage used is an Azure one. Currently investigating how to clean that up.
interesting:
Depending on the data characteristics of the table, compressing its data can result in:
- 25-33% reduction in data size
- 25-35% performance improvement on reads
- 5-10% performance improvement on writes
The replaying is currently stopped as the data disks are now almost full.
I will try to activate compression on some big tables to see if it can help.
I will probably need to start with small tables to recover some space before being able to compress the biggest ones.
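For reference, activating compression on an existing table boils down to something like the following; the table name and compressor class here are illustrative assumptions, not the exact statements that will be run:
gros-50:~$ cqlsh -e "ALTER TABLE swh.release WITH compression = {'class': 'ZstdCompressor'};"
gros-50:~$ nodetool upgradesstables -a swh release
The upgradesstables pass rewrites the existing SSTables with the new compression settings, which is presumably why the smaller tables have to go first: the rewrite needs free disk space that the cluster currently lacks.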
Note that the cache invalidation is not completely done though, as the objstorage used is an Azure one.
- status.io: Open maintenance ticket to notify of the partial disruption in service
- vangogh: Stop puppet
- vangogh: Stop gunicorn-swh-vault
- vault db: Schema migration [1]
- Upgrade workers and webapp nodes with latest swh.vault and restart cooker service
- Start back gunicorn-swh-vault
- Try a cooking and check result -> ok
- Close maintenance ticket as everything is fine
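For the record, a rough sketch of the shell commands behind these steps, assuming systemd units named after the services above (the actual schema migration command [1] is not reproduced here):
vangogh:~$ sudo puppet agent --disable "swh-vault schema migration"
vangogh:~$ sudo systemctl stop gunicorn-swh-vault
# apply the vault db schema migration [1] and upgrade swh.vault on the workers and webapp nodes
vangogh:~$ sudo systemctl start gunicorn-swh-vault
vangogh:~$ sudo puppet agent --enable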
Aug 26 2021
This looks great!
The patch was tested in a loader and in the replayers.
The difference was not really significant on the loader, but I'm not really confident in the tests as the cluster had a pretty high load (running replayers + second datacenter synchronization).
I will retry in a quieter environment to be able to isolate the loader behavior.
Rework commit message
Use the correct postfix service to check. With this, the check actually detects crashes of
the processes started by the service.
Some progress report on my work of the last few days on the subject.
What's next, as a summary (subsequent subtasks should be created later):
This needs rebase on top of D6147
done, and released as swh-vault v1.0.0 :)
- Stop puppet
- Stop gunicorn-swh-vault
- Schema migration [1]
- Clean up the objstorage [2]
- Start back gunicorn-swh-vault
- Upgrade staging workers and webapp nodes with latest swh.vault
- Try a cooking and check result
In D6140#158833, @ardumont wrote: Looks good to me. Could you add a test for this, or is it too complicated? The code is covered but not the introduced behavior.
Indeed, I'm not entirely sure how to proceed for the test.
But it is definitely working as expected on the patched production swh-scheduler-journal-client (saatchi).
Looks good to me. Could you add a test for this, or is it too complicated? The code is covered but not the introduced behavior.
These are the steps done to initialize the new cluster [1]:
- add a cassandra-rackdc.properties file on the server with the corresponding DC:
gros-50:~$ cat /etc/cassandra/cassandra-rackdc.properties
dc=datacenter2
rack=rack1
- change the value of the endpoint_snitch property from SimpleSnitch to GossipingPropertyFileSnitch [2] (see the cassandra.yaml excerpt after this list).
The recommended value for production is GossipingPropertyFileSnitch, so it should have been set to this from the beginning
- configure the disk_optimization_strategy to ssd on the new datacenter
- update the seed_provider to have one node on each datacenter
- restart the datacenter1 nodes to apply the new configuration
- start the datacenter2 nodes one by one; wait until the status of the node is UN (Up and Normal) before starting another one (they can stay in the UJ (Up and Joining) state for a couple of minutes; see the nodetool status example below)
- when done, update the swh keyspace to declare the replication strategy of the second DC
ALTER KEYSPACE swh WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3, 'datacenter2': 3};
The replication of new changes starts at this point, but the existing table contents still need to be copied
- rebuild the cluster content:
vsellier@fnancy:~/cassandra$ seq 0 9 | parallel -t ssh gros-5{} nodetool rebuild -ks swh -- datacenter1
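For completeness, the configuration changes above (snitch, disk optimization strategy, seeds) roughly correspond to the following cassandra.yaml excerpt; the seed addresses are placeholders, not the real ones:
# /etc/cassandra/cassandra.yaml (excerpt)
endpoint_snitch: GossipingPropertyFileSnitch
disk_optimization_strategy: ssd
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "<one node of datacenter1>,<one node of datacenter2>"
The UN / UJ state mentioned above is the first column of the nodetool status output:
gros-50:~$ nodetool status swh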
The progress can be monitored with the nodetool netstats command:
gros-50:~$ nodetool netstats
Mode: NORMAL
Rebuild e5e64920-0644-11ec-92a6-31a241f39914
    /172.16.97.4
        Receiving 199 files, 147926499702 bytes total. Already received 125 files (62.81%), 57339885570 bytes total (38.76%)
            swh/release-4 1082347/1082347 bytes (100%) received from idx:0/172.16.97.4
            swh/content_by_blake2s256-2 3729362955/3729362955 bytes (100%) received from idx:0/172.16.97.4
            swh/release-3 224510803/224510803 bytes (100%) received from idx:0/172.16.97.4
            swh/content_by_blake2s256-1 240283216/240283216 bytes (100%) received from idx:0/172.16.97.4
            swh/content_by_blake2s256-4 29491504/29491504 bytes (100%) received from idx:0/172.16.97.4
            swh/release-2 6409474/6409474 bytes (100%) received from idx:0/172.16.97.4
            ...
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed   Dropped
Large messages                  n/a         0             23         0
Small messages                  n/a         3      132753939         0
Gossip messages                 n/a         0          43915         0
or to filter only running transfers:
gros-50:~$ nodetool netstats | grep -v 100%
Mode: NORMAL
Rebuild e5e64920-0644-11ec-92a6-31a241f39914
    /172.16.97.4
        Receiving 199 files, 147926499702 bytes total. Already received 125 files (62.81%), 57557961160 bytes total (38.91%)
            swh/directory_entry-7 4819168032/4925484261 bytes (97%) received from idx:0/172.16.97.4
    /172.16.97.2
        Receiving 202 files, 111435975646 bytes total. Already received 139 files (68.81%), 60583670773 bytes total (54.37%)
            swh/directory_entry-12 1631210003/2906113367 bytes (56%) received from idx:0/172.16.97.2
    /172.16.97.6
        Receiving 236 files, 186694443984 bytes total. Already received 142 files (60.17%), 58869656747 bytes total (31.53%)
            swh/snapshot_branch-10 4449235102/7845572885 bytes (56%) received from idx:0/172.16.97.6
    /172.16.97.5
        Receiving 221 files, 143384473640 bytes total. Already received 132 files (59.73%), 58300913015 bytes total (40.66%)
            swh/directory_entry-4 982247023/3492851311 bytes (28%) received from idx:0/172.16.97.5
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed   Dropped
Large messages                  n/a         0             23         0
Small messages                  n/a         2      135087921         0
Gossip messages                 n/a         0          44176         0
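To follow the progress without re-running the command by hand, it can also be wrapped in watch (a convenience, not something that was captured above):
gros-50:~$ watch -n 60 "nodetool netstats | grep -v 100%"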
Build is green
Build is green
Build is green
Re-enable provenance storage tests that were disabled by mistake
rebase
This would have caught T3502 earlier too.
Build is green
fix support for old yarn versions (without -s)
rebase
Looks good to me. Could you add a test for this, or is it too complicated? The code is covered but not the introduced behavior.
The second Cassandra cluster is finally up and synchronizing with the first one. The rebuild should be done by the end of the day or tomorrow.