Aug 12 2021
Throttling writes to 120MB/s to reduce the pressure:
The number of slow random reads reaches ~3.5%, presumably because of excessive write pressure (the throttling of writes was removed).
The benchmarks were modified to (i) use a fixed number of random / sequential readers instead of a random choice, for better predictability, and (ii) introduce throttling to cap the sequential read speed to approximately 200MB/s. A read-only run was then started:
The run terminated August 11th @ 15:21 because of what appears to be a rare race condition, but it was mostly finished. The results show an unexpected degradation in read performance that keeps getting worse over time and deserves further investigation. The write performance is however stable, which suggests the benchmark code itself may be responsible for the degradation: previous benchmark results showed a correlation between read and write performance, so if the Ceph cluster were globally slowing down, both would degrade.
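For illustration, one simple way to cap read throughput is to sleep whenever the transfer runs ahead of the target rate. This is a sketch only; the class name and structure are assumptions, not the actual benchmark code:

```python
import time

class Throttle:
    """Cap throughput to max_bytes_per_sec by sleeping when ahead of schedule.

    A minimal sketch, not the actual benchmark implementation.
    """
    def __init__(self, max_bytes_per_sec):
        self.max_bytes_per_sec = max_bytes_per_sec
        self.start = time.monotonic()
        self.total = 0

    def add(self, nbytes):
        """Account for nbytes just transferred; sleep if we are too fast."""
        self.total += nbytes
        expected = self.total / self.max_bytes_per_sec  # seconds this should have taken
        elapsed = time.monotonic() - self.start
        if expected > elapsed:
            time.sleep(expected - elapsed)

# usage: cap sequential reads at ~200MB/s
throttle = Throttle(200 * 1024 * 1024)
# for chunk in reader: throttle.add(len(chunk))
```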
Aug 2 2021
Improve the readability of the graphs
Rehearse the run and make minor updates to make sure it runs right away this Friday.
Jul 20 2021
In the global read index, I would consider storing, for each object, the length and offset of the object alongside the shard id (these are comparatively cheap to store).
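For illustration (the field names and widths below are assumptions, not the actual index format), a fixed-width entry could pack the shard id together with the object's offset and length, adding only a handful of bytes per object:

```python
import struct

# Hypothetical fixed-width index entry:
# 2-byte shard id, 8-byte offset, 8-byte length = 18 bytes per object.
ENTRY = struct.Struct("<HQQ")

def pack_entry(shard_id, offset, length):
    return ENTRY.pack(shard_id, offset, length)

def unpack_entry(data):
    return ENTRY.unpack(data)

entry = pack_entry(42, 1 << 30, 4096)
assert unpack_entry(entry) == (42, 1 << 30, 4096)
```

With a layout like this, a single index lookup yields everything needed to read the object directly, without consulting a per-shard index first.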
Jul 19 2021
A 100GB file can contain ~25M objects (4KB median size). If a perfect hash function requires 4 bits per entry, that means reading ~12MB for every lookup.
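The arithmetic behind that estimate:

```python
objects = 100 * 10**9 // 4096      # ~25M objects of 4KB median size in a 100GB file
index_bits = objects * 4           # perfect hash function: 4 bits per entry
index_mb = index_bits / 8 / 2**20  # size of the index that a cold lookup must read
print(round(index_mb, 1))          # ≈ 11.6 MB
```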
I just realized that since a perfect hash function needs parameters that may require additional sequential reads at the beginning of the file, it would actually make more sense to use a regular hash function with a format that allows for collisions. Even if collisions are relatively frequent, the colliding entries may be stored adjacent to each other and will not require an additional read: they are likely to be in the same block most of the time. That would save the trouble of implementing a perfect hash function.
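A sketch of the idea (the names and the probing scheme are assumptions, not the actual on-disk format): an open-addressed table where a colliding entry goes into the next free slot, so the entries for one hash bucket sit contiguously and a single block read usually covers the whole collision run:

```python
# Minimal open-addressing sketch: colliding entries land in adjacent slots,
# so one contiguous read usually covers them all. Names/format are assumed.

NSLOTS = 8
EMPTY = None

def insert(table, key, value):
    """Place (key, value) at the hashed slot or the next free one after it."""
    i = hash(key) % NSLOTS
    for probe in range(NSLOTS):
        slot = (i + probe) % NSLOTS
        if table[slot] is EMPTY:
            table[slot] = (key, value)
            return slot
    raise RuntimeError("table full")

def lookup(table, key):
    """Scan forward from the hashed slot until the key or an empty slot."""
    i = hash(key) % NSLOTS
    for probe in range(NSLOTS):
        slot = (i + probe) % NSLOTS
        if table[slot] is EMPTY:
            return None
        if table[slot][0] == key:
            return table[slot][1]
    return None

table = [EMPTY] * NSLOTS
insert(table, "a", 1)
insert(table, "b", 2)
assert lookup(table, "a") == 1
```

On disk the same principle applies with fixed-width entries: the forward scan stays within one block almost always, so a collision rarely costs an extra read.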
What are "the parameters to the perfect hash function"? What are the possible formats?
The content of a file:
On the topic of throttling, the following discussion happened on IRC:
I misrepresented @olasd's suggestions; here is the chat log on the matter.