The object storage is a collection of RBD images, each containing a sequence of objects (SHA256 + SWHID + content). To allow fast random access, a RocksDB index is updated whenever an object is added: the object's SHA256 is the key, and the value holds the identifier of the image that contains the object and the offset at which it is found.
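The write path described above can be sketched as follows. This is only an illustration: an in-memory buffer stands in for an RBD image and a plain dict stands in for the RocksDB index, and the record layout (SHA256 + SWHID + content concatenated) is simplified relative to the format described in T3048.

```python
import hashlib
import io

image_id = 0
image = io.BytesIO()   # stands in for one RBD image
index = {}             # stands in for the global RocksDB index

def add_object(swhid: bytes, content: bytes) -> bytes:
    """Append an object record to the current image and index it."""
    sha256 = hashlib.sha256(content).digest()
    offset = image.tell()
    # append the record: SHA256 + SWHID + content
    image.write(sha256 + swhid + content)
    # index entry: SHA256 -> (image identifier, offset of the record)
    index[sha256] = (image_id, offset)
    return sha256

key = add_object(b"swh:1:cnt:" + b"0" * 40, b"hello world")
```

A reader then resolves a SHA256 through the index to `(image_id, offset)` and reads the record back from the corresponding image.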
The benchmark is based on these assumptions:
* RBD images are ~1TB big
* The content of the image is as described in https://forge.softwareheritage.org/T3048
The expected results would be:
* A 1TB RBD image can be populated at a rate > 100MB/s by a single writer that:
  * Appends to the RBD image
  * Updates a global index containing 5 billion entries
* Sequential reads of the 1TB RBD image run at a rate of at least 100MB/s
* Random reads of the 1TB RBD image run at a rate of at least 100MB/s
* The RBD space overhead is less than 20% (see https://forge.softwareheritage.org/T3014#57836), i.e. storing 100 bytes does not require more than 120 bytes
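Some back-of-the-envelope figures fall out of the targets above (assuming decimal units, 1TB = 10^12 bytes, and the 10-image workload described below):

```python
# Figures implied by the benchmark targets; decimal units assumed.
image_size = 10**12                    # 1 TB per RBD image
rate = 100 * 10**6                     # 100 MB/s sustained write rate
seconds_per_image = image_size / rate  # ~2.8 hours to fill one image

total_entries = 5 * 10**9              # global index size
total_bytes = 10 * image_size          # 10 x 1 TB images
avg_object = total_bytes / total_entries  # ~2 KB average object size
```

In other words, filling one image takes roughly 2.8 hours at the target rate, and 5 billion entries over 10TB implies an average object size around 2KB.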
The methodology would be similar to the one David used for S3 (https://forge.softwareheritage.org/T2706), i.e.:
* Write a script to:
  * Pre-populate a global index with 5 billion entries
* Set up a Ceph cluster with 20TB of storage and a 4+2 RBD pool
* Create a workload to populate 10 1TB images
* Interpret the results
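The random-read measurement could be sketched as below. This is a hedged skeleton, not the actual benchmark: it reads fixed-size records at random offsets from a plain local file, whereas the real run would go through librbd against the 1TB images; the `RECORD` size and record count are arbitrary placeholders.

```python
import os
import random
import time

RECORD = 4096   # placeholder fixed record size
N = 1000        # placeholder record count

def populate(path, n=N):
    """Write n random records sequentially, returning their offsets."""
    offsets = []
    with open(path, "wb") as f:
        for _ in range(n):
            offsets.append(f.tell())
            f.write(os.urandom(RECORD))
    return offsets

def random_read_rate(path, offsets, reads=N):
    """Read records at random offsets and return the rate in bytes/s."""
    with open(path, "rb") as f:
        start = time.monotonic()
        total = 0
        for off in random.choices(offsets, k=reads):
            f.seek(off)
            total += len(f.read(RECORD))
        elapsed = time.monotonic() - start
    return total / elapsed
```

The sequential-read case is the same loop without the seeks; comparing the two rates against the 100MB/s target is the "interpret the results" step.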