
Using RocksDB SST as a file format
Closed, Migrated

Description

https://github.com/facebook/rocksdb/wiki/A-Tutorial-of-RocksDB-SST-formats#block-based-table

In a block-based table, data is chunked into (almost) fixed-size blocks (the default block size is 4 KB). Each block, in turn, holds a number of entries.

When storing data, we can compress and/or encode data efficiently within a block, which often results in a much smaller size than the raw data.
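To make the chunking and compression idea concrete, here is a minimal Python sketch. It is not the RocksDB on-disk format: the length-prefixed encoding, the use of zlib, and the names `build_blocks` and `encode_block` are all assumptions chosen for illustration.

```python
import zlib

BLOCK_SIZE = 4096  # bytes; mirrors the default block size quoted above


def build_blocks(entries, block_size=BLOCK_SIZE):
    """Group sorted (key, value) byte pairs into blocks of roughly block_size bytes."""
    blocks, current, current_size = [], [], 0
    for key, value in entries:
        current.append((key, value))
        current_size += len(key) + len(value)
        # Close the block once it reaches the target size, so blocks are
        # only "almost" fixed-size.
        if current_size >= block_size:
            blocks.append(current)
            current, current_size = [], 0
    if current:
        blocks.append(current)
    return blocks


def encode_block(block):
    """Serialize a block with 4-byte length prefixes (hypothetical layout), then compress it."""
    raw = b"".join(
        len(k).to_bytes(4, "big") + k + len(v).to_bytes(4, "big") + v
        for k, v in block
    )
    return raw, zlib.compress(raw)


# Illustrative data: 1000 sorted keys with similar-looking values.
entries = [(b"key%06d" % i, b"value-%06d" % i) for i in range(1000)]
blocks = build_blocks(entries)
raw_size = sum(len(encode_block(b)[0]) for b in blocks)
compressed_size = sum(len(encode_block(b)[1]) for b in blocks)
print(len(blocks), raw_size, compressed_size)
```

Because neighbouring keys in a sorted block share long prefixes, compressing block by block tends to work well while still allowing a single block to be read and decoded on its own.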

For record retrieval, we first locate the block where the target record may reside, then read the block into memory, and finally search for the record within the block. To avoid frequent reads of the same block, we introduced the block cache, which keeps loaded blocks in memory.
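The retrieval steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the in-memory `BLOCKS` list stands in for blocks on disk, the index holds the last key of each block, and `functools.lru_cache` plays the role of the block cache. None of these names come from RocksDB.

```python
from bisect import bisect_left
from functools import lru_cache

# Hypothetical stand-in for on-disk blocks: each block is a sorted list of
# (key, value) pairs, and blocks are sorted relative to each other.
BLOCKS = [
    [(b"apple", b"1"), (b"banana", b"2")],
    [(b"cherry", b"3"), (b"grape", b"4")],
    [(b"melon", b"5"), (b"peach", b"6")],
]

# Index: the last key stored in each block.
INDEX = [block[-1][0] for block in BLOCKS]


@lru_cache(maxsize=2)
def load_block(block_no):
    """Simulate reading a block into memory; cached so repeat reads are skipped."""
    return tuple(BLOCKS[block_no])


def get(key):
    # Step 1: locate the only block that may contain the key.
    block_no = bisect_left(INDEX, key)
    if block_no == len(INDEX):
        return None  # key is larger than every stored key
    # Step 2: read the block (served from the cache when possible).
    block = load_block(block_no)
    # Step 3: binary search inside the block; (key,) sorts just before (key, value).
    pos = bisect_left(block, (key,))
    if pos < len(block) and block[pos][0] == key:
        return block[pos][1]
    return None
```

Calling `get(b"cherry")` and then `get(b"grape")` loads the second block only once; `load_block.cache_info()` reports the hits and misses.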

https://github.com/facebook/rocksdb/wiki/Rocksdb-BlockBasedTable-Format

RocksDB provides APIs for creating SST files that can be ingested later. This is useful when the data must be loaded quickly but can be prepared offline.

The Python package for RocksDB does not support reading or writing SST files.

Event Timeline

dachary changed the task status from Open to Work in Progress. Feb 23 2021, 9:30 AM
dachary created this task.
dachary created this object in space S1 Public.
dachary updated the task description.
dachary updated the task description.

There is not enough tooling to use SST files independently of RocksDB. Maybe RocksDB can be configured so that it only uses a single SST file?

vlorentz triaged this task as Normal priority. Feb 26 2021, 6:58 PM

It is not worth the effort; using a hash table is a better option.