Ambry has been a great source of inspiration and the best fit for the Software Heritage use case. Including the partition UUID in the object identifier takes advantage of the immutability of the objects and allows all readers to use the object storage in a scale-out fashion.
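A minimal sketch (all names here are illustrative, not taken from Ambry or the T3054 design) of what embedding the partition UUID in the object identifier buys: any reader can route a read from the identifier alone, without consulting a central index, which is what lets reads scale out.

```
import hashlib
import uuid

# Hypothetical in-memory stand-ins for the partitions' key/value stores.
partitions = {uuid.uuid4(): {} for _ in range(4)}

def make_object_id(partition: uuid.UUID, payload: bytes) -> str:
    # The identifier embeds the partition UUID; because objects are
    # immutable they never move, so the identifier stays valid forever.
    return f"{partition}:{hashlib.sha256(payload).hexdigest()}"

def write(payload: bytes) -> str:
    partition = next(iter(partitions))  # placement policy is out of scope
    object_id = make_object_id(partition, payload)
    partitions[partition][object_id] = payload
    return object_id

def read(object_id: str) -> bytes:
    # Any reader extracts the partition from the identifier itself:
    # no central lookup, hence no bottleneck shared by all readers.
    partition = uuid.UUID(object_id.split(":", 1)[0])
    return partitions[partition][object_id]

assert read(write(b"hello")) == b"hello"
```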
Feb 22 2021
It turns out there are a number of suitable formats (SST from RocksDB, for one), so there is no need to re-invent this wheel.
In the T3054 proposed design, objects are packed into larger files, so there is no reason to continue in the direction of storing objects individually. There seems to be a consensus that tens of billions of individual objects are problematic: they take very long to enumerate, for one thing. And no one is doing that, which is not a great sign.
The T3054 design evolved and this benchmark won't be needed.
Feb 21 2021
Readonly partitions are stored in Sorted String Table format.
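As a toy illustration (not real SST internals: blocks, bloom filters and compression are omitted) of why this layout suits readonly partitions: keys are sorted once when the partition is sealed, and every lookup afterwards is a lock-free binary search that any number of concurrent readers can perform.

```
import bisect

class ToySST:
    """Readonly table: sorted once at build time, never written again."""

    def __init__(self, items: dict):
        # Build step: sort once; the table is immutable afterwards,
        # which matches a sealed readonly partition.
        self._keys = sorted(items)
        self._values = [items[key] for key in self._keys]

    def get(self, key):
        # Lookup is a binary search over the sorted keys: O(log n),
        # no write path, no locking needed for concurrent readers.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        raise KeyError(key)

table = ToySST({b"swh:2": b"two", b"swh:1": b"one"})
assert table.get(b"swh:1") == b"one"
```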
"Open sourcing DataHub: LinkedIn's metadata search and discovery platform" explains how developers work on DataHub and the relationship between the code internal to LinkedIn and what is published as Free Software. It is not about Ambry, and the Ambry team may behave completely differently. A similar article about Ambry is dated 2016:
Feb 20 2021
QuarkDB is now used for the namespace. It stores 2.5 billion objects.
Feb 17 2021
Let's leave it open: although T3050 is a better fit, it is not ready yet and an interim solution may be required.
T3050 is a better fit as it does not require any specification or development.
Although it is not a good fit for storing all objects, it is a better fit than RBD + a custom format for storing 1TB worth of objects, provided support for multiple concurrent readers is added.
In the following, "small objects" are < 4KB and "object storage software" refers to the list of software from the description for which there are no blockers.
We'd want a reader to try the mirrored pool first, and then fall back to the erasure coded pool when the object is not there (i.e. it was larger than the cutoff). Is the increased latency when getting large objects worth the space savings? I don't know.
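A sketch of that read path with the librados Python bindings; the pool names, the ceph.conf path and the single-call read are placeholders, not part of any agreed design.

```
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

def read_object(name: str) -> bytes:
    # Try the mirrored small-object pool first; fall back to the
    # erasure coded pool, paying the extra round trip only for
    # objects that were too large for the mirrored pool.
    for pool in ("small-objects-replicated", "large-objects-ec"):  # hypothetical names
        ioctx = cluster.open_ioctx(pool)
        try:
            return ioctx.read(name, length=2**24)  # simplified: one read, no loop
        except rados.ObjectNotFound:
            continue
        finally:
            ioctx.close()
    raise KeyError(name)
```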
In T3052#58878, @dachary wrote:
> If the size of the object was known to the reader of the object store, it would be a great way to develop storage strategies depending on the object size. So far I assumed the reader does not have that information and is therefore unable to figure out which object storage to use, but maybe I missed something?
The bench script and full results are in the tarball.
In T3054#58874, @olasd wrote:
> @zack, very good point about having a target for the "time to first byte when reading an object".
> I don't know what would be a "good" target for that metric; my gut says that staying within 100ms for any given object would be acceptable, as long as the number of parallel readers doesn't impact the amount too much (of course, within the IOPS of the underlying media, etc.).
If the size of the object was known to the reader of the object store, it would be a great way to develop storage strategies depending on the object size. So far I assumed the reader does not have that information and is therefore unable to figure out which object storage to use, but maybe I missed something?
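For the time to first byte target quoted above, a minimal measurement sketch; read_first_byte is a placeholder for whatever backend is under test, and the parallel reader count is an arbitrary choice.

```
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def ttfb_ms(read_first_byte, object_ids, parallel_readers=16):
    # read_first_byte(object_id) must fetch just the first byte of an
    # object from the backend under test (hypothetical callable).
    def one(object_id):
        start = time.monotonic()
        read_first_byte(object_id)
        return (time.monotonic() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=parallel_readers) as pool:
        samples = list(pool.map(one, object_ids))
    # The target is about the tail while parallel readers compete for
    # the same IOPS, so report the maximum alongside the median.
    return statistics.median(samples), max(samples)
```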
Feb 16 2021
Maybe it would make sense to consider putting the very small objects (e.g. those <= the min alloc size) into a 3- or 4-way mirrored pool instead of an erasure coded pool.
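The write side of that idea could look like the following librados sketch; the 4KB threshold stands in for the min alloc size and the pool names are invented.

```
import rados

MIN_ALLOC_SIZE = 4096  # stand-in for the bluestore min alloc size

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

def write_object(name: str, payload: bytes) -> None:
    # Below the min alloc size an erasure coded object wastes a full
    # allocation unit per chunk, so the mirrored pool is cheaper there.
    pool = ("small-objects-replicated" if len(payload) <= MIN_ALLOC_SIZE
            else "large-objects-ec")  # hypothetical pool names
    ioctx = cluster.open_ioctx(pool)
    try:
        ioctx.write_full(name, payload)
    finally:
        ioctx.close()
```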
@zack, very good point about having a target for the "time to first byte when reading an object".