For the record, the following discussions and explorations took place while looking for off-the-shelf solutions that would match the Software Heritage workload.
Explorations
- Scale out data and metadata
- T3064 ambry
- T3052 RADOS space benchmark (requires development to reduce the space overhead while maintaining performance)
- ??? RGW
- Object packing
- T3066 RocksDB SST
- ambry partition format (append only)
- T3068 Sorted String Table (read only)
- T3050 libcephsqlite or SQLite on top of RBD (read write)
- T3046 Using xz-file-format for 1TB archive
- T3045 Using pixz for 1TB archives
- T3048 Using a custom format for 1TB archive
- T3069 Using MZ as a file format
- Scale out data and scale up metadata. The metadata is kept in a database (RocksDB, etc.) that must be queried to find out where the data is stored, as described in the paper "Finding a needle in Haystack: Facebook’s photo storage".
- T3049 Distributed database + RBD space benchmark (requires development on top of these building blocks)
- Storage systems with blockers
- T3051 EOS is too complex (uses RBD + Paxos + QuarkDB for namespace)
- T3057 SeaweedFS is not yet mature (uses large files to pack objects + Paxos + internal database for metadata)
- https://github.com/open-io: replication is a proprietary feature (see https://docs.openio.io/latest/source/admin-guide/configuration_replicator.html)
- https://ipfs.io/ does not provide replication or self-healing. Performance and space overhead are probably the same as with the current Software Heritage storage system.
- https://www.rozosystems.com/about claims a software patent on the implementation
- http://www.orangefs.org/ and http://beegfs.io/ focus on high-performance computing
- https://www.lustre.org/ and https://moosefs.com/ are distributed file systems, not object / block storage
- min.io stores each object in an individual file on a file system, which results in a space overhead identical to that of the current Software Heritage storage system.
- Swift stores each object in an individual file on a file system, which results in a space overhead identical to that of the current Software Heritage storage system.
- Inspiration
- T3065 git partial clone (in part because it does packing, in part because it is source code related)
- Hardware
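
The "scale out data and scale up metadata" approach listed above can be illustrated with a minimal sketch. It is not taken from any of the explored systems: the `PackedStore` class, its method names, the in-memory `BytesIO` pack and the `dict` index are all hypothetical stand-ins for a large pack file (or RBD image) and a metadata database such as RocksDB. The sketch only shows the lookup pattern: objects are appended to one large pack, and a separate index must be consulted to find where each object lives.

```python
import hashlib
import io


class PackedStore:
    """Toy Haystack-style store (illustrative assumption, not a real API).

    Objects are appended to a single pack; a separate index maps each
    key to the (offset, length) of the object inside the pack.
    """

    def __init__(self, pack=None):
        # Hypothetical stand-ins: BytesIO for the large pack file or RBD image,
        # dict for the metadata database (RocksDB, etc.).
        self.pack = pack if pack is not None else io.BytesIO()
        self.index = {}

    def put(self, data: bytes) -> str:
        # Objects are immutable, so the content hash can serve as the key.
        key = hashlib.sha256(data).hexdigest()
        if key in self.index:
            return key  # already stored, nothing to append
        offset = self.pack.seek(0, io.SEEK_END)  # append-only write position
        self.pack.write(data)
        self.index[key] = (offset, len(data))
        return key

    def get(self, key: str) -> bytes:
        # The metadata lookup comes first, then a single ranged read.
        offset, length = self.index[key]
        self.pack.seek(offset)
        return self.pack.read(length)


store = PackedStore()
key = store.put(b"small object")
assert store.get(key) == b"small object"
```

Packing avoids the per-file overhead of storing each small object individually, at the cost of an extra index lookup on every read.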
Discussions
- Redis as a K/V store for billions of objects
- Looking for hardware to benchmark the object storage design
- Scale out object storage design (take 1)
- Hardware for object storage
- Storing 20 billions of immutable objects in Ceph, 75% <16KB
- Small RGW objects and RADOS 64KB minimum size
- Using RBD to pack billions of small files
- Benchmarking RBD to store artifacts
- Durable self healing distributed append only storage