I'm closing this issue because, after @vlorentz's analysis, it seems there isn't much left to improve.
Wed, Sep 14
Tue, Sep 13
These are the results of the tests of the different algorithms for directory_add (with 20 directory replayers):
Mon, Sep 12
Fri, Sep 9
Here is some profiling of a couple of replayers:
swh@storage-replayer-origin-visit-76f6bf9d75-znqfs:~$ time python -m cProfile -o /tmp/origin-visit.pyprof /opt/swh/.local/bin/swh storage replay --stop-after-objects 10000
WARNING:cassandra.cluster:Downgrading core protocol version from 66 to 65 for 192.168.100.181:9042. To avoid this, it is best practice to explicitly set Cluster(protocol_version) to the version supported by your cluster. http://datastax.github.io/python-driver/api/cassandra/cluster.html#cassandra.cluster.Cluster.protocol_version
WARNING:cassandra.cluster:Downgrading core protocol version from 65 to 5 for 192.168.100.181:9042. To avoid this, it is best practice to explicitly set Cluster(protocol_version) to the version supported by your cluster. http://datastax.github.io/python-driver/api/cassandra/cluster.html#cassandra.cluster.Cluster.protocol_version
INFO:cassandra.policies:Using datacenter 'sesi_rocquencourt' for DCAwareRoundRobinPolicy (via host '192.168.100.181:9042'); if incorrect, please specify a local_dc to the constructor, or limit contact points to local cluster nodes
Done.
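The resulting .pyprof file can be inspected offline with the standard pstats module. A minimal sketch (the /tmp/origin-visit.pyprof file above only exists on the pod, so this generates a throwaway profile instead; all paths are illustrative):

```shell
# Generate a small profile the same way cProfile does above, then inspect it
# with pstats, sorted by cumulative time:
python3 -c "
import cProfile, pstats
cProfile.run('sum(range(100000))', '/tmp/demo.pyprof')
pstats.Stats('/tmp/demo.pyprof').sort_stats('cumulative').print_stats(5)
"
```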
Thu, Sep 8
Aug 25 2022
Aug 18 2022
all server reconfigured and cassandra started on them:
/opt/cassandra/bin/nodetool status
Datacenter: sesi_rocquencourt
=============================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.100.184  88.65 KiB  16      34.3%             e0c24d24-6f68-4a26-8561-94e67b58211a  rack1
UN  192.168.100.181  84.71 KiB  16      31.3%             1d9b9e7d-b376-4afe-8f67-482e8412f21b  rack1
UN  192.168.100.186  69.07 KiB  16      34.2%             0dd3426d-9159-47bd-9b4e-065ff0fbb889  rack1
UN  192.168.100.183  69.08 KiB  16      37.1%             78281a92-7fa0-43bd-bc33-c5b419ee8715  rack1
UN  192.168.100.185  69.07 KiB  16      32.2%             abf9b69e-3cec-4ac3-a195-a54481e4d9da  rack1
UN  192.168.100.182  74.05 KiB  16      30.9%             eca5ea5d-8bd5-4301-9a5e-ffa01aa1b7e5  rack1
Recreating the zpool correctly:
# mixeduse
ls /dev/disk/by-id/nvme-MO003200KXAVU* | grep -v part | xargs -t zpool create -o ashift=12 -O mountpoint=none mixeduse
zfs create -o mountpoint=/srv/cassandra/instance1/data mixeduse/cassandra-instance1-data
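For reference, ashift is the base-2 logarithm of the sector size the pool assumes, so ashift=12 matches the 4 KiB physical sectors of these NVMe disks:

```shell
# ashift is log2 of the sector size: the pool uses 2^ashift-byte blocks
echo $((1 << 9))    # ashift=9  -> 512
echo $((1 << 12))   # ashift=12 -> 4096
```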
Testing the performance of the different configurations (on a ZFS pool with only one disk):
- disk block: 512 B / zpool ashift: 9
zpool create -o ashift=9 -O mountpoint=none mixeduse /dev/disk/by-id/nvme-MO003200KXAVU_SJA4N7938I0405A0U
zfs create -o mountpoint=/srv/cassandra/instance1/data -o atime=off -o relatime=on mixeduse/cassandra-data
cd /srv/cassandra/instance1/data
bonnie++ -d . -m cassandra04 -u nobody
Using uid:65534, gid:65534.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
cassandra04 515496M  293k  99  1.0g  99  703m  99  661k  99  1.4g  91 13717 463
Latency             48216us    7316us    8224us   23303us    7928us    1606us
Version  2.00       ------Sequential Create------ --------Random Create--------
cassandra04         -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 16384  98 +++++ +++ 16384   8 +++++ +++ +++++ +++ 16384  99
Latency              2679us    1207us    4851ms    2850us     138us     301us
1.98,2.00,cassandra04,1,1659338044,515496M,,8192,5,293,99,1080974,99,720299,99,661,99,1488832,91,13717,463,16,,,,,28232,98,+++++,+++,2018,8,+++++,+++,+++++,+++,24821,99,48216us,7316us,8224us,23303us,7928us,1606us,2679us,1207us,4851ms,2850us,138us,301us
The nvme format command didn't succeed on the write-intensive disk: it never exits, and the disk becomes unresponsive afterwards.
Aug 17 2022
Aug 12 2022
The Puppet code is ready for review. It was updated to support multi-instance deployments in anticipation of T4375.
Aug 10 2022
For the record, the issues related to the commitlog_directory configuration:
After spending some time to successfully start a 2-node Cassandra cluster with a declarative configuration, these are the observations:
- A Service can't be used to expose the Cassandra ports to the cluster; the pod address must be used, because Cassandra uses the DNS name provided as its listen address
- It should work by setting the listen address to 0.0.0.0, but the documentation strongly recommends against it:
Setting listen_address to 0.0.0.0 is always wrong.
- Using internal pod addresses will prevent multi-DC deployments in the future
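As a sketch of the first point, the per-pod configuration therefore has to carry the pod's own address (the address below is made up; each pod gets its own):

```shell
# Write an illustrative cassandra.yaml fragment: listen_address must be the
# pod's own address, not 0.0.0.0 and not a Service VIP (address is made up):
cat > /tmp/cassandra-listen-fragment.yaml <<'EOF'
listen_address: 10.42.0.17     # this pod's address (illustrative)
# listen_address: 0.0.0.0      # "always wrong" per the Cassandra docs
EOF
grep -c 'listen_address' /tmp/cassandra-listen-fragment.yaml
```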
Aug 9 2022
Aug 8 2022
Aug 5 2022
Aug 4 2022
Deployed both in staging and production:
Jul 13 2022
Unfortunately, the operator test is a failure due to the lack of configuration options:
- non-blocker: the init containers are OOMKilled during startup; this can be worked around by editing the Cassandra StatefulSet created by the operator to extend the limits
- blocker: it's not possible to configure commitlog_directory explicitly; it defaults to /var/lib/cassandra/commitlog
- it's not easy to propagate the host mounts to use the 2 mountpoints /srv/cassandra and /srv/cassandra/commitlog without tweaking the kernel / rancher configuration
- it's not possible to add a second volume to the pod description created by the operator
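For the first (non-blocking) point, the workaround can be sketched as a JSON patch against the operator-created StatefulSet; the container index, the 512Mi value, and the StatefulSet name in the comment are assumptions, not the actual values used:

```shell
# Hypothetical patch raising the first init container's memory limit
# (index and value are illustrative):
cat > /tmp/init-limits-patch.json <<'EOF'
[
  {"op": "replace",
   "path": "/spec/template/spec/initContainers/0/resources/limits/memory",
   "value": "512Mi"}
]
EOF
# then: kubectl patch statefulset <name> --type=json --patch-file /tmp/init-limits-patch.json
python3 -m json.tool /tmp/init-limits-patch.json
```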
Jul 12 2022
Jul 11 2022
Finally, the cluster is up.
I'm not sure what unstuck the node registration, but I suspect a node with all the roles is needed to bootstrap the cluster.
I tried this initially and it didn't work, but I'm not sure which state the cluster was in.
Jul 7 2022
The management nodes were correctly created, but it seems Rancher is having some issues registering them in the cluster.
Jul 5 2022
Jul 4 2022
Jul 1 2022
Jun 23 2022
The Git loader now exports a swh_loader_filtered_objects_total metric. We should generalize this to other loaders eventually, using one of the options above.
Jun 10 2022
Jun 2 2022
May 31 2022
May 4 2022
Apr 19 2022
Feature has been implemented and deployed, closing this.
Apr 12 2022
A new fix is needed, so another round for v1.3.1.
Apr 11 2022
Mar 30 2022
Thanks to olasd for restarting the service following this documentation: https://docs.gunicorn.org/en/stable/signals.html#upgrading-to-a-new-binary-on-the-fly
First, replace the old binary with a new one, then send a USR2 signal to the current master process. It executes a new binary whose PID file is postfixed with .2 (e.g. /var/run/gunicorn.pid.2), which in turn starts a new master process and new worker processes:
At this point, two instances of Gunicorn are running, handling the incoming requests together. To phase the old instance out, you have to send a WINCH signal to the old master process, and its worker processes will start to gracefully shut down.
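The two-step signal sequence described above can be exercised against a dummy process that traps the same signals; gunicorn itself is not needed for the sketch, and all paths are illustrative:

```shell
# Dummy "master" that logs the signals gunicorn would receive:
cat > /tmp/fake-master.sh <<'EOF'
#!/bin/sh
trap 'echo USR2 >> /tmp/fake-master.log' USR2
trap 'echo WINCH >> /tmp/fake-master.log' WINCH
echo $$ > /tmp/fake-master.pid
sleep 3
EOF
: > /tmp/fake-master.log
sh /tmp/fake-master.sh &
sleep 1
kill -USR2  "$(cat /tmp/fake-master.pid)"  # step 1: exec the new binary
kill -WINCH "$(cat /tmp/fake-master.pid)"  # step 2: wind down the old workers
wait
cat /tmp/fake-master.log
```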