
Using libcephsqlite to store objects
Closed, Migrated

Description

https://github.com/ceph/ceph/pull/39191/files

I don't know enough about the internals of SQLite to figure out whether this could work. Having a single writer with EXCLUSIVE + WAL to minimize latency is not too much of a concern (the target is writing at least 100 MB/s). But having hundreds of concurrent readers is a requirement: there are many daemons searching the content of the archive. The "yet" in the documentation[0] makes me think support for concurrent readers is a next step and hopefully not too difficult.
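As a rough illustration of the single-writer setup mentioned above, here is a minimal sketch using Python's standard sqlite3 module against a local file. It does not load libcephsqlite or use the Ceph VFS. The `objects` table and the `open_writer` / `add_object` helpers are hypothetical names made up for this example.

```python
import hashlib
import sqlite3


def open_writer(path):
    """Open the single writer connection with EXCLUSIVE locking and WAL journaling."""
    conn = sqlite3.connect(path)
    # Request exclusive locking before the first WAL access, so the writer
    # holds on to the file lock instead of re-acquiring it per transaction.
    conn.execute("PRAGMA locking_mode=EXCLUSIVE")
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode != "wal":
        raise RuntimeError(f"could not switch to WAL, got {mode!r}")
    # Hypothetical schema: one row per object, keyed by its SHA256 digest.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        "sha256 BLOB PRIMARY KEY, content BLOB NOT NULL)"
    )
    return conn


def add_object(conn, content):
    """Store content under its SHA256 digest and return the digest."""
    digest = hashlib.sha256(content).digest()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO objects (sha256, content) VALUES (?, ?)",
            (digest, content),
        )
    return digest
```

With exclusive locking held by the writer, other connections cannot read the same file, which is the limitation the concurrent-reader requirement runs into.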

Storing objects this way is conceptually similar to a collection of RBD images, each containing a sequence of objects (SHA256 + content). To allow for fast random reads, a rocksdb index located elsewhere (not necessarily in Ceph, since it can be rebuilt if lost) is updated when an object is added: the key is the SHA256 and the value is the identifier of the image that contains the object together with the offset at which it is to be found.
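The image + index layout can be sketched in a few lines. This is only an in-memory model: a `bytearray` stands in for each RBD image and a `dict` stands in for the rocksdb index. The record header (32-byte digest followed by an 8-byte length) is an assumption, since the description only says "SHA256 + content", and the `ObjectStore` class is a hypothetical name.

```python
import hashlib
import struct

# Assumed record header: 32-byte SHA256 digest + 8-byte big-endian content length.
HEADER = struct.Struct(">32sQ")


class ObjectStore:
    def __init__(self):
        self.images = {}  # image id -> bytearray standing in for an RBD image
        self.index = {}   # SHA256 digest -> (image id, offset), stands in for rocksdb

    def add(self, image_id, content):
        """Append an object to an image and record its location in the index."""
        digest = hashlib.sha256(content).digest()
        if digest in self.index:
            return digest  # already stored
        image = self.images.setdefault(image_id, bytearray())
        offset = len(image)
        image += HEADER.pack(digest, len(content)) + content
        self.index[digest] = (image_id, offset)
        return digest

    def get(self, digest):
        """Random read: one index lookup, then one read at the recorded offset."""
        image_id, offset = self.index[digest]
        image = self.images[image_id]
        stored, length = HEADER.unpack_from(image, offset)
        if stored != digest:
            raise ValueError("index points at a different object")
        start = offset + HEADER.size
        return bytes(image[start:start + length])

    def rebuild_index(self):
        """Recreate a lost index by scanning every image sequentially."""
        self.index = {}
        for image_id, image in self.images.items():
            offset = 0
            while offset < len(image):
                digest, length = HEADER.unpack_from(image, offset)
                self.index[digest] = (image_id, offset)
                offset += HEADER.size + length
```

The `rebuild_index` method shows why the index does not need the same durability as the images: everything in it can be derived from the image contents.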

[0] https://github.com/ceph/ceph/pull/39191/files#diff-22aa44fff401f8d885a88967e51b0cea71ee87bdfc956ff3bd878056660a9d5eR197-R202

Event Timeline

dachary changed the task status from Open to Work in Progress. Feb 15 2021, 9:02 PM
dachary triaged this task as Normal priority.
dachary created this task.
dachary created this object in space S1 Public.

A mail was sent to Patrick Donnelly to ask for his opinion on the matter.

There is a hard limit on the size of a SQLite database (~280 TB), so a single database could not hold all objects, even if perfectly optimized.

https://www.sqlite.org/limits.html

Maximum Database Size

Every database consists of one or more "pages". Within a single database, every page is the same size, but different databases can have page sizes that are powers of two between 512 and 65536, inclusive. The maximum size of a database file is 4294967294 pages. At the maximum page size of 65536 bytes, this translates into a maximum database size of approximately 2.8e+14 bytes (281 terabytes, or 256 tebibytes, or 281474 gigabytes or 262,144 gibibytes).

This particular upper bound is untested since the developers do not have access to hardware capable of reaching this limit. However, tests do verify that SQLite behaves correctly and sanely when a database reaches the maximum file size of the underlying filesystem (which is usually much less than the maximum theoretical database size) and when a database is unable to grow due to disk space exhaustion.
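The size limit follows directly from the two numbers quoted above. A quick check of the arithmetic (the constant names are just labels for this example):

```python
MAX_PAGES = 4294967294   # maximum number of pages in a database file
MAX_PAGE_SIZE = 65536    # maximum page size in bytes

max_bytes = MAX_PAGES * MAX_PAGE_SIZE

print(max_bytes)            # 281474976579584
print(max_bytes / 10**12)   # size in terabytes, about 281
print(max_bytes / 2**40)    # size in tebibytes, just under 256
```

The product is about 2.8e14 bytes, in line with the 281 terabyte and 256 tebibyte figures.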

dachary reopened this task as Work in Progress. Feb 17 2021, 11:18 PM

Although SQLite is not a good fit to store all objects, it is a better fit than RBD + a custom format to store 1TB worth of objects, provided support for multiple concurrent readers is added.

dachary changed the task status from Work in Progress to Open. Feb 22 2021, 12:25 AM

There is no need to use Ceph for the Write Storage: PostgreSQL performs well and there is no scaling problem. The size of the Write Storage is limited, by design.