After writing 1TB across 40 databases (40 * 25GB), the WAL is ~200GB, i.e. ~20% overhead:
$ ansible-playbook -i inventory tests-run.yml && ssh -t $runner direnv exec bench python bench/bench.py --file-count-ro 500 --rw-workers 40 --ro-workers 40 --file-size 50000 --no-warmup
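The WAL overhead figure can be sanity-checked with a quick calculation (the numbers are taken from the measurement above):

```python
# 40 databases * 25GB each = 1TB written, with ~200GB of WAL left over.
total_written_gb = 40 * 25   # total payload written
wal_gb = 200                 # observed WAL size
overhead = wal_gb / total_written_gb
print(f"written: {total_written_gb}GB, WAL overhead: {overhead:.0%}")
```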
Mon, May 3
While this is very creative, there is no benefit in storing small objects in git for the Software Heritage workload.
There is no need to use Ceph for the Write Storage: PostgreSQL performs well and there is no scaling problem. The size of the Write Storage is limited by design.
This was discussed during the Ceph Developer Summit 2021, and the conclusion was that RADOS is not the place to implement immutable optimizations; RGW is a better fit.
- Group the two PostgreSQL NVMe drives in a single logical volume to get more storage: 30 write workers using 100GB shards require 3TB of PostgreSQL storage
- Set up a second PostgreSQL server as a standby replica of the master: it may negatively impact the performance of the master cluster and should be included in the benchmark
$ bench.py --file-count-ro 200 --rw-workers 20 --ro-workers 80 --file-size 50000 --no-warmup
...
WARNING:root:Objects write 5.8K/s
WARNING:root:Bytes write 117.9MB/s
WARNING:root:Objects read 1.3K/s
WARNING:root:Bytes read 100.4MB/s
Sun, May 2
$ bench.py --file-count-ro 200 --rw-workers 20 --ro-workers 80 --file-size 50000 --rand-ratio 10
...
WARNING:root:Objects write 5.8K/s
WARNING:root:Bytes write 118.4MB/s
WARNING:root:Objects read 12.3K/s
WARNING:root:Bytes read 850.3MB/s
Sat, May 1
Fixed a race condition that caused PostgreSQL database drops to fail.
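The journal does not show the fix itself; a common way to make a racy teardown step robust is to retry it until the competing connections are gone. A minimal sketch with a generic retry helper and a simulated flaky drop (both the helper and the failure message are hypothetical, not the actual bench code):

```python
import time

def retry(fn, attempts=5, delay=0.1):
    """Call fn until it succeeds or attempts are exhausted."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(delay)

# Simulated flaky drop: fails twice (connections still open), then succeeds.
state = {"calls": 0}
def drop_database():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("database is being accessed by other users")
    return "dropped"

print(retry(drop_database, delay=0))  # → dropped
```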
Tue, Apr 27
The rewrite to use processes was trivial and preliminary tests yielded the expected results. Most of the time was spent on two problems:
Tue, Apr 20
Struggled most of the day because there is a bottleneck when using threads and PostgreSQL from a single client. However, when running 4 processes, it performs as expected. The benchmark should be rewritten to use a process pool instead of the thread pool, which should not be too complicated. I tried to add a warmup phase so that all concurrent threads/processes do not start at the same time, but it does not make any visible difference.
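The thread-pool-to-process-pool rewrite is small because concurrent.futures gives both pools the same API. A minimal sketch (the worker function is illustrative, not the actual bench.py code):

```python
from concurrent.futures import ProcessPoolExecutor

def worker(n):
    # Stand-in for a client-side-bound task that does not scale with
    # threads because of the single-client bottleneck observed above.
    return sum(i * i for i in range(n))

def run(jobs, workers=4):
    # Swapping ThreadPoolExecutor for ProcessPoolExecutor is the whole change.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worker, jobs))

if __name__ == "__main__":
    print(run([10_000] * 4))
```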
Mon, Apr 19
Completed the tests for the rewrite; it is working. Time to run it on Grid5000.
Some partitions have reached the tail of the journal and everything is still running smoothly, yay.
Sun, Apr 18
Ran rbd bench on the created images.
Sat, Apr 17
There is a ~3% space overhead on the RBD data pool: 6TB data + 3TB parity = 9TB raw; actual usage is 9.3TB, i.e. ~+3%.
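The overhead figure follows directly from the pool numbers above:

```python
# Erasure-coded pool: 6TB data + 3TB parity = 9TB raw, 9.3TB actually used.
data_tb, parity_tb, actual_tb = 6, 3, 9.3
raw_tb = data_tb + parity_tb
overhead = (actual_tb - raw_tb) / raw_tb
print(f"space overhead: {overhead:.1%}")  # ~3%
```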
https://www.grid5000.fr/w/Grenoble:Network shows the network topology
Complete rewrite to:
Fri, Apr 16
Wed, Apr 14
- Add reader to continuously read from images to simulate a read workload
- Randomize the payload instead of using easily compressible data (postgres does a good job compressing them and this does not reflect the reality)
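The point of the second item can be seen by compressing the two kinds of payload: repetitive data compresses almost entirely away (so PostgreSQL's compression makes the benchmark unrealistically cheap), while random bytes do not. A small demonstration with zlib standing in for the server-side compression:

```python
import os
import zlib

size = 64 * 1024
compressible = b"a" * size        # easily compressible payload
random_payload = os.urandom(size)  # randomized payload

print(len(zlib.compress(compressible)))    # tiny
print(len(zlib.compress(random_payload)))  # roughly the original size
```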
Mon, Apr 12
The process has been restarted and is well underway (800 million objects are left to copy at around 500 ops/s, so the ETA for reaching the tail of the log is now around 3 weeks).
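The ETA checks out:

```python
# 800 million objects left, copied at ~500 objects/s.
remaining = 800_000_000
rate_per_s = 500
days = remaining / rate_per_s / 86_400
print(f"{days:.1f} days")  # ~18.5 days, i.e. roughly 3 weeks
```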
- bench.py --file-count-ro 20 --rw-workers 20 --packer-workers 20 --file-size 1024 --fake-ro yields WARNING:root:Objects write 17.7K/s
- bench.py --file-count-ro 40 --rw-workers 40 --packer-workers 20 --file-size 1024 --fake-ro yields WARNING:root:Objects write 13.8K/s
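Comparing the two runs shows the write rate does not scale with the worker count; the per-worker rate actually drops when the workers double:

```python
# Figures from the two runs above.
rate_20 = 17_700 / 20  # objects/s per worker with 20 rw-workers
rate_40 = 13_800 / 40  # objects/s per worker with 40 rw-workers
print(rate_20, rate_40)  # 885.0 vs 345.0
```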
Apr 7 2021
The benchmark was moved to a temporary repository for convenience (easier than uploading here every time). https://git.easter-eggs.org/biceps/biceps
Apr 6 2021
Takeaways from the session:
@KShivendu The linked script is a start. As it is, it requires direct access to the DB, so you need to create abstractions for it in swh-storage and swh-web.
Apr 5 2021
Hi guys. Any pointers on where to start?