
Benchmarking an RBD based object container
Closed, Invalid, Public

Description

The object storage is a collection of RBD images, each containing a sequence of objects (SWHID + content). To allow fast random reads, a RocksDB index is updated whenever an object is added: the key is the signature (hash) part of the SWHID, and the value is the identifier of the image that contains the object plus the offset at which it can be found.
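A minimal sketch of the write and lookup paths, under stated assumptions: a Python dict stands in for the RocksDB index, an in-memory bytearray per image stands in for an RBD image, and each object is stored as a length-prefixed record. The class `ObjectStore`, its methods and the record and index-value layouts are hypothetical and only illustrate the "key → (image id, offset)" idea.

```python
import struct

# Hypothetical record layout: 4-byte big-endian length prefix, then the content.
RECORD_HEADER = struct.Struct(">I")
# Hypothetical index value: (image id, offset) packed as two unsigned 64-bit ints.
INDEX_VALUE = struct.Struct(">QQ")


class ObjectStore:
    """Toy model: bytearrays stand in for RBD images, a dict for RocksDB."""

    def __init__(self, image_size: int):
        self.image_size = image_size
        self.images: list[bytearray] = [bytearray()]
        self.index: dict[bytes, bytes] = {}  # SWHID hash -> packed (image id, offset)

    def add(self, swhid_hash: bytes, content: bytes) -> None:
        record = RECORD_HEADER.pack(len(content)) + content
        # Start a new image when a non-empty current image would overflow.
        current = self.images[-1]
        if current and len(current) + len(record) > self.image_size:
            self.images.append(bytearray())
        image_id = len(self.images) - 1
        offset = len(self.images[image_id])
        self.images[image_id] += record  # append-only write
        self.index[swhid_hash] = INDEX_VALUE.pack(image_id, offset)

    def get(self, swhid_hash: bytes) -> bytes:
        # One index lookup, then one read at a known offset: no scan of the image.
        image_id, offset = INDEX_VALUE.unpack(self.index[swhid_hash])
        image = self.images[image_id]
        (length,) = RECORD_HEADER.unpack_from(image, offset)
        start = offset + RECORD_HEADER.size
        return bytes(image[start:start + length])
```

With a real deployment the dict would be replaced by RocksDB and the bytearrays by RBD images, but the lookup stays a single index read followed by a single positioned read.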

Benchmark based on these assumptions:

  • RBD images are ~1TB big
  • The content of the image is as described in T3048

The expected result would be:

  • A 1TB RBD image can be populated at a rate > 100MB/s by a single writer that:
    • Appends to the RBD image
    • Updates a global index containing 5 billion entries
  • Sequential reads of the 1TB RBD image are > 100MB/s
  • Random reads of the 1TB RBD image are > 100MB/s
  • The RBD space overhead is less than 20% (see T3014#57836), i.e. storing 100 bytes does not require more than 120 bytes
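The throughput and space targets above reduce to simple arithmetic. A sketch, assuming decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and that the space actually used is measured on the cluster (for instance from `ceph df`); which bytes count as overhead (e.g. whether erasure-coding parity is included) is an assumption left to T3014#57836. The helper names are hypothetical.

```python
TB = 10**12  # decimal units assumed
MB = 10**6


def fill_time_seconds(image_bytes: int, rate_bytes_per_s: float) -> float:
    """Seconds for a single writer to fill an image at a constant rate."""
    return image_bytes / rate_bytes_per_s


def space_overhead(payload_bytes: int, raw_bytes_used: int) -> float:
    """Extra space used beyond the payload, as a fraction of the payload."""
    return (raw_bytes_used - payload_bytes) / payload_bytes


def meets_overhead_target(payload_bytes: int, raw_bytes_used: int,
                          max_overhead: float = 0.20) -> bool:
    """True if storing `payload_bytes` used at most (1 + max_overhead) times as much space."""
    return space_overhead(payload_bytes, raw_bytes_used) <= max_overhead
```

For example, `fill_time_seconds(TB, 100 * MB)` gives 10000 seconds, about 2.8 hours per 1TB image, so a single writer needs roughly 28 hours for ten images at exactly 100MB/s. `meets_overhead_target(100, 120)` is true and `meets_overhead_target(100, 121)` is false, matching the "100 bytes in at most 120 bytes" rule.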

The methodology would be similar to what David did for S3 in T2706, i.e.:

  • Write a script to
    • Pre-populate a global index with 5 billion entries
    • Set up a Ceph cluster with 20TB of storage and a 4+2 RBD pool
  • Create a workload to populate
    • 10 1TB images
  • Interpret results
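The "populate" workload can be sketched, under assumptions, as a single writer that appends objects to a binary stream and records each object's offset in an index. Here `io.BytesIO` stands in for an RBD image and a dict for the global index; object sizes come from a placeholder uniform distribution, not the one described in T3048, and `populate`, `object_sizes` and `MAX_OBJECT_SIZE` are hypothetical names. Against a real cluster, the `rbd` Python bindings provide `rbd.Image.write(data, offset)`, which takes an explicit offset and could replace `target.write`.

```python
import io
import os
import random
import time
from typing import BinaryIO, Dict, Iterator, Tuple

# Hypothetical maximum object size; the real distribution is described in T3048.
MAX_OBJECT_SIZE = 64 * 1024


def object_sizes(count: int, seed: int = 0) -> Iterator[int]:
    """Yield `count` object sizes from a placeholder uniform distribution."""
    rng = random.Random(seed)
    for _ in range(count):
        yield rng.randint(1, MAX_OBJECT_SIZE)


def populate(target: BinaryIO, count: int,
             index: Dict[bytes, int]) -> Tuple[int, float]:
    """Append `count` objects to `target`, recording each object's offset in `index`.

    Returns (bytes written, elapsed seconds).
    """
    # Random data is generated once so that data generation is not timed per object.
    payload = memoryview(os.urandom(MAX_OBJECT_SIZE))
    written = 0
    start = time.perf_counter()
    for size in object_sizes(count):
        key = os.urandom(20)  # random 20-byte key standing in for a SWHID hash
        index[key] = written  # offset of the object within the image
        target.write(payload[:size])
        written += size
    return written, time.perf_counter() - start


if __name__ == "__main__":
    image = io.BytesIO()  # stands in for an RBD image opened for writing
    idx: Dict[bytes, int] = {}
    n, seconds = populate(image, 1000, idx)
    print(f"{n / MB if False else n / 1e6:.1f} MB in {seconds:.3f}s"
          f" -> {n / 1e6 / seconds:.1f} MB/s")
```

The printed rate only measures the in-memory path; for the benchmark itself, the same loop would run against an RBD image after the index has been pre-populated, so that index updates happen at their full 5-billion-entry size.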

Event Timeline

dachary changed the task status from Open to Work in Progress. Feb 15 2021, 6:16 PM
dachary triaged this task as Normal priority.
dachary created this task.
dachary created this object in space S1 Public.

The object storage is a collection of RBD images containing a sequence of objects (SHA256 + SWHID + content).

Why do you want to have both a SHA256 and a SWHID (version 1, hence a Git-style SHA1) here?

  • If you don't want to trust SHA1 against collisions, then you can drop the SWHIDv1 altogether, as the index will be what records where a SWHID is to be found.
  • If on the other hand you're OK trusting SHA1s, then you can drop the SHA256.

What am I missing?

For context, we will be migrating to SHA256 for SWHID version 2, which is meant to happen in 1-2 years. For a migration as big as an object storage solution, we should take that upcoming transition into account too.
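To make the trade-off concrete, a sketch of the two candidate keys for a content object. A SWHID v1 content identifier uses the Git blob hash: SHA1 over a `blob <length>` header terminated by a NUL byte, followed by the raw bytes. It is shown next to a plain SHA256 of the same bytes. The function names are hypothetical, and the SHA256 digest here is not the SWHID v2 format, which this discussion does not specify.

```python
import hashlib


def swhid_v1_content(data: bytes) -> str:
    """SWHID v1 for a content object: the Git-style blob SHA1 of the bytes."""
    header = b"blob %d\x00" % len(data)  # same header Git uses for blob objects
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def sha256_content(data: bytes) -> str:
    """Plain SHA256 of the raw bytes, the alternative key discussed above."""
    return hashlib.sha256(data).hexdigest()
```

For the empty content, `swhid_v1_content(b"")` returns `swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the same hash `git hash-object` reports for an empty file.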

Thanks for the comment. Let's keep just the SWHID then.

vlorentz renamed this task from Bencharmking an RBD based object container to Benchmarking an RBD based object container. Feb 16 2021, 9:42 AM

The T3054 design evolved, so this benchmark is no longer needed.