The object storage is a collection of RBD images, each containing a sequence of objects (SHA256 + SWHID + content). To allow fast random access, a RocksDB index is updated whenever an object is added: the key is the object's SHA256, and the value is the identifier of the image that contains it together with the offset at which it is found.
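
To make the layout concrete, here is a minimal sketch of the append and lookup paths. It is not the actual implementation: an in-memory `io.BytesIO` stands in for an RBD image, a plain `dict` stands in for the index, and the record header (SHA256, SWHID length, content length) is a hypothetical layout chosen for illustration. The real layout is the one described in T3048.

```python
import hashlib
import io
import struct

# Hypothetical record header (assumption, not the T3048 format):
# sha256 (32 bytes) | SWHID length (u16) | content length (u64)
HEADER = struct.Struct(">32sHQ")


def append_object(image, image_id, index, swhid, content):
    """Append one record to the image and register it in the index."""
    sha256 = hashlib.sha256(content).digest()
    swhid_bytes = swhid.encode("ascii")
    # Objects are only ever appended, so the record starts at the current end.
    offset = image.seek(0, io.SEEK_END)
    image.write(HEADER.pack(sha256, len(swhid_bytes), len(content)))
    image.write(swhid_bytes)
    image.write(content)
    # Index entry: SHA256 -> (image identifier, offset in that image).
    index[sha256] = (image_id, offset)
    return sha256


def read_object(images, index, sha256):
    """Random access: one index lookup, then one seek and read in the image."""
    image_id, offset = index[sha256]
    image = images[image_id]
    image.seek(offset)
    stored_sha256, swhid_len, content_len = HEADER.unpack(image.read(HEADER.size))
    if stored_sha256 != sha256:
        raise ValueError("index points at a record with a different SHA256")
    swhid = image.read(swhid_len).decode("ascii")
    content = image.read(content_len)
    return swhid, content
```

A lookup costs one index read plus one positioned read in a single image, which is the access pattern the random read benchmark would exercise.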
The benchmark is based on these assumptions:
* RBD images are ~1TB in size
* The content of the image is as described in https://forge.softwareheritage.org/T3048
The expected result would be:
* A 1TB RBD image can be populated at a rate > 100MB/s by a single writer that:
  * Appends to the RBD image
  * Updates a global index containing 5 billion entries
* Sequential reads of the 1TB RBD image run at > 100MB/s
* Random reads of the 1TB RBD image run at > 100MB/s
* The RBD space overhead is less than 20% (see [2]), i.e. storing 100 bytes does not require more than 120 bytes
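
The targets above can be written down as a small acceptance check. This is only a sketch: it assumes decimal units (1MB = 10^6 bytes, 1TB = 10^12 bytes), and the function names are made up for illustration. At exactly 100MB/s, filling a 1TB image takes 10,000 seconds, a little under three hours, which gives a sense of how long one benchmark run lasts.

```python
MB = 10**6   # assumption: decimal units
TB = 10**12

TARGET_RATE = 100 * MB        # bytes per second, from the expected results
MAX_OVERHEAD_PERCENT = 20     # space overhead bound, from the expected results


def fill_time_seconds(image_bytes, rate):
    """Time needed to write (or read) a whole image at a constant rate."""
    return image_bytes / rate


def within_overhead(stored_bytes, used_bytes, max_percent=MAX_OVERHEAD_PERCENT):
    """True if storing stored_bytes consumed at most (100 + max_percent)% of it.

    Integer arithmetic avoids rounding at the boundary: 100 bytes stored
    using 120 bytes is accepted, 121 bytes is not.
    """
    return used_bytes * 100 <= stored_bytes * (100 + max_percent)


def meets_targets(write_rate, seq_read_rate, random_read_rate,
                  stored_bytes, used_bytes):
    """Check measured rates (bytes per second) and space usage against the targets."""
    rates = (write_rate, seq_read_rate, random_read_rate)
    rates_ok = all(rate > TARGET_RATE for rate in rates)
    return rates_ok and within_overhead(stored_bytes, used_bytes)
```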
The methodology would be similar to what David did for S3 https://forge.softwareheritage.org/T2706, i.e.:
* Write a script to
  * Pre-populate a global index with 5 billion entries
* Set up a Ceph cluster with 20TB of storage and a 4+2 RBD pool
* Create a workload to populate
  * 10 1TB images
* Interpret results
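
As a sanity check on the sizing, the arithmetic can be sketched as follows. Several things here are assumptions rather than facts from this task: that "4+2" denotes an erasure-coded pool with 4 data chunks and 2 coding chunks, that the 20TB figure is raw capacity, and that an index value is 16 bytes (for example an 8-byte image identifier and an 8-byte offset). The estimate ignores any overhead or compression in the index store itself.

```python
TB = 10**12  # assumption: decimal units
GB = 10**9


def raw_bytes_needed(usable_bytes, k=4, m=2):
    """Raw capacity consumed by an erasure-coded pool with k data and m coding chunks."""
    return usable_bytes * (k + m) // k


def index_payload_bytes(entries, key_bytes=32, value_bytes=16):
    """Key plus value bytes only; value_bytes is a hypothetical size."""
    return entries * (key_bytes + value_bytes)


raw = raw_bytes_needed(10 * TB)
payload = index_payload_bytes(5 * 10**9)
print(f"raw space for 10 x 1TB images: {raw / TB:.1f} TB")
print(f"index payload for 5 billion entries: {payload / GB:.0f} GB")
```

Under these assumptions, ten 1TB images need 15TB of raw space, which fits in a 20TB cluster, and the index holds about 240GB of keys and values.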