All Stories
Aug 27 2021
I agree, it looks great!
Note that the cache invalidation is not completely done though, as the objstorage used is an Azure one. Currently investigating how to clean that up.
interesting:
Depending on the data characteristics of the table, compressing its data can result in:
- 25-33% reduction in data size
- 25-35% performance improvement on reads
- 5-10% performance improvement on writes
The replaying is currently stopped as the data disks are now almost full.
I will try to activate compression on some big tables to see if it can help.
I will probably need to start with small tables to recover some space before being able to compress the biggest ones.
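For reference, activating compression on an existing table boils down to something like the following; the table name and compressor class here are illustrative assumptions, not the exact statements that will be run:
gros-50:~$ cqlsh -e "ALTER TABLE swh.release WITH compression = {'class': 'ZstdCompressor'};"
gros-50:~$ nodetool upgradesstables -a swh release
The upgradesstables pass rewrites the existing SSTables with the new compression settings, which is presumably why the smaller tables have to go first: the rewrite needs free disk space that the cluster currently lacks.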
Note that the cache invalidation is not completely done though, as the objstorage used is an Azure one.
- status.io: Open maintenance ticket to notify of the partial disruption in service
- vangogh: Stop puppet
- vangogh: Stop gunicorn-swh-vault
- vault db: Schema migration [1]
- Upgrade workers and webapp nodes with latest swh.vault and restart cooker service
- Start back gunicorn-swh-vault
- Try a cooking and check result -> ok
- Close maintenance ticket as everything is fine
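For the record, a rough sketch of the shell commands behind these steps, assuming systemd units named after the services above (the actual schema migration command [1] is not reproduced here):
vangogh:~$ sudo puppet agent --disable "swh-vault schema migration"
vangogh:~$ sudo systemctl stop gunicorn-swh-vault
# apply the vault db schema migration [1] and upgrade swh.vault on the workers and webapp nodes
vangogh:~$ sudo systemctl start gunicorn-swh-vault
vangogh:~$ sudo puppet agent --enable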
Aug 26 2021
This looks great!
The patch was tested in a loader and in the replayers.
The difference was not really significant on the loader, but I'm not really confident in the tests as the cluster had a pretty high load (running replayers + second datacenter synchronization).
I will retry in a quieter environment to be able to isolate the loader behavior.
Rework commit message
Use the correct postfix service to check. With this, the check actually detects crashes of
the processes started by the service.
Some progress report on my work of the last few days on the subject.
What's next, as a summary (subsequent subtasks should be created later):
This needs rebase on top of D6147
done, and released as swh-vault v1.0.0 :)
- Stop puppet
- Stop gunicorn-swh-vault
- Schema migration [1]
- Clean up the objstorage [2]
- Start back gunicorn-swh-vault
- Upgrade staging workers and webapp nodes with latest swh.vault
- Try a cooking and check result
In D6140#158833, @ardumont wrote: Looks good to me. Could you add a test for this, or is it too complicated? The code is covered but not the introduced behavior.
Indeed, I'm not entirely sure how to proceed for the test.
But it is definitely working as expected on the patched production swh-scheduler-journal-client (saatchi).
Looks good to me. Could you add a test for this, or is it too complicated? The code is covered but not the introduced behavior.
These are the steps done to initialize the new cluster [1]:
- add a cassandra-rackdc.properties file on the server with the corresponding DC:
gros-50:~$ cat /etc/cassandra/cassandra-rackdc.properties
dc=datacenter2
rack=rack1
- change the value of the endpoint_snitch property from SimpleSnitch to GossipingPropertyFileSnitch [2] (see the cassandra.yaml excerpt after this list).
The recommended value for production is GossipingPropertyFileSnitch, so it should have been set to this from the beginning
- configure the disk_optimization_strategy to ssd on the new datacenter
- update the seed_provider to have one node on each datacenter
- restart the datacenter1 nodes to apply the new configuration
- start the datacenter2 nodes one by one; wait until the status of the node is UN (Up and Normal) before starting another one (they can stay in the UJ (Up and Joining) state for a couple of minutes; see the nodetool status example below)
- when done, update the swh keyspace to declare the replication strategy of the second DC
ALTER KEYSPACE swh WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3, 'datacenter2': 3};
The replication of new changes starts at this point, but the existing table contents still need to be copied
- rebuild the cluster content:
vsellier@fnancy:~/cassandra$ seq 0 9 | parallel -t ssh gros-5{} nodetool rebuild -ks swh -- datacenter1
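For completeness, the configuration changes above (snitch, disk optimization strategy, seeds) roughly correspond to the following cassandra.yaml excerpt; the seed addresses are placeholders, not the real ones:
# /etc/cassandra/cassandra.yaml (excerpt)
endpoint_snitch: GossipingPropertyFileSnitch
disk_optimization_strategy: ssd
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "<one node of datacenter1>,<one node of datacenter2>"
The UN / UJ state mentioned above is the first column of the nodetool status output:
gros-50:~$ nodetool status swh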
The progress can be monitored with the nodetool netstats command:
gros-50:~$ nodetool netstats
Mode: NORMAL
Rebuild e5e64920-0644-11ec-92a6-31a241f39914
    /172.16.97.4
        Receiving 199 files, 147926499702 bytes total. Already received 125 files (62.81%), 57339885570 bytes total (38.76%)
            swh/release-4 1082347/1082347 bytes (100%) received from idx:0/172.16.97.4
            swh/content_by_blake2s256-2 3729362955/3729362955 bytes (100%) received from idx:0/172.16.97.4
            swh/release-3 224510803/224510803 bytes (100%) received from idx:0/172.16.97.4
            swh/content_by_blake2s256-1 240283216/240283216 bytes (100%) received from idx:0/172.16.97.4
            swh/content_by_blake2s256-4 29491504/29491504 bytes (100%) received from idx:0/172.16.97.4
            swh/release-2 6409474/6409474 bytes (100%) received from idx:0/172.16.97.4
            ...
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed   Dropped
Large messages                  n/a         0             23         0
Small messages                  n/a         3      132753939         0
Gossip messages                 n/a         0          43915         0
or to filter only running transfers:
gros-50:~$ nodetool netstats | grep -v 100%
Mode: NORMAL
Rebuild e5e64920-0644-11ec-92a6-31a241f39914
    /172.16.97.4
        Receiving 199 files, 147926499702 bytes total. Already received 125 files (62.81%), 57557961160 bytes total (38.91%)
            swh/directory_entry-7 4819168032/4925484261 bytes (97%) received from idx:0/172.16.97.4
    /172.16.97.2
        Receiving 202 files, 111435975646 bytes total. Already received 139 files (68.81%), 60583670773 bytes total (54.37%)
            swh/directory_entry-12 1631210003/2906113367 bytes (56%) received from idx:0/172.16.97.2
    /172.16.97.6
        Receiving 236 files, 186694443984 bytes total. Already received 142 files (60.17%), 58869656747 bytes total (31.53%)
            swh/snapshot_branch-10 4449235102/7845572885 bytes (56%) received from idx:0/172.16.97.6
    /172.16.97.5
        Receiving 221 files, 143384473640 bytes total. Already received 132 files (59.73%), 58300913015 bytes total (40.66%)
            swh/directory_entry-4 982247023/3492851311 bytes (28%) received from idx:0/172.16.97.5
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed   Dropped
Large messages                  n/a         0             23         0
Small messages                  n/a         2      135087921         0
Gossip messages                 n/a         0          44176         0
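To follow the progress without re-running the command by hand, it can also be wrapped in watch (a convenience, not something that was captured above):
gros-50:~$ watch -n 60 "nodetool netstats | grep -v 100%"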
Build is green
Build is green
Build is green
Re-enable provenance storage tests that were disabled by mistake
rebase
This would have caught T3502 earlier too.
Build is green
fix support for old yarn versions (without -s)
rebase
Looks good to me. Could you add a test for this, or is it too complicated? The code is covered but not the introduced behavior.
The second Cassandra cluster is finally up and synchronizing with the first one. The rebuild should be done by the end of the day or tomorrow.