A python module implemented in C that ingests key/value pairs, writes them on file and creates a hash table to be used as an index, in the same file. The file is a Shard in the Read Storage.
Format
- Format version
- An index which is a hash table
- The content of the objects
Let's assume the shard stores the following objects:
- foo (len=3, sha256=2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae)
- bar (len=3, sha256=fcde2b2edba56bf408601fb721fe9b5c338d10ee429ea04fae5511b68fbf8fb9)
- baz (len=3, sha256=baa5a0964d3320fbc0c6a922140453c8513ea24ab8fd0577034804a967248096)
- quux (len=4, sha256=053057fda9a935f2d4fa8c7bc62a411a26926e00b491c07c1b2ec1909078a0a2)
- (the empty object, len=0, sha256=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855)
The content of the file would look like this:
header | 00 | +----+ 0000 | 01 | +----+ index | sha256 | offset | size | | 00 01 .. 30 31 | 32 .. 38 39 | 40 .. 46 47 | +----+----+----+----+----+----+----+----+----+----+----+----+----+ 0001 | 2c | 26 | .. | e7 | ae | 00 | .. | 00 | f1 | 00 | .. | 00 | 03 | 0031 | fc | de | .. | 8f | b9 | 00 | .. | 00 | f4 | 00 | .. | 00 | 03 | 0061 | ba | a5 | .. | 09 | 96 | 00 | .. | 00 | f7 | 00 | .. | 00 | 03 | 0091 | 05 | 30 | .. | a0 | a2 | 00 | .. | 00 | fa | 00 | .. | 00 | 04 | 00c1 | e3 | b0 | .. | b8 | 55 | 00 | .. | 00 | 00 | 00 | .. | 00 | 00 | +----+----+----+----+----+----+----+----+----+----+----+----+----+ payload +----+----+----+ 00f1 | 66 | 6f | 6f | 00f4 | 62 | 61 | 72 | 00f7 | 62 | 61 | 7a | 00fa | 71 | 75 | 75 | 78 | +----+----+----+----+
Writing
It is assumed writing is done in batch, sequentially
Reading
- HASH(SHA256) in the index
- Seek to the object content to stream it to the caller in chunks of a given size