- User Since
- Jan 8 2021, 11:21 PM (13 w, 16 h)
Wed, Apr 7
The benchmark was moved to a temporary repository for convenience (easier than uploading here every time). https://git.easter-eggs.org/biceps/biceps
Tue, Apr 6
Takeaways from the session:
Tue, Mar 30
Fri, Mar 26
Thu, Mar 25
Wed, Mar 24
Refactored the cluster provisioning to use all available disks instead of the existing file system (using cephadm instead of a hand-made Ceph cluster).
Tue, Mar 23
The benchmark runs and it's not too complicated, which is a relief. I'll clean up the mess I made and move forward to finish writing the software.
The benchmarks are not fully functional but they produce a write load that matches the object storage design. They run (README.txt) via libvirt and are being tested on Grid5000 to ensure all the pieces are in place (i.e. does it actually work to reserve machines + provision them + run) before moving forward.
Wed, Mar 17
Mail thread with Chris Lu on SeaweedFS use cases with 100+ billion objects.
First draft for layer 0.
Mon, Mar 15
Mar 10 2021
With a little help from the Mattermost channel and after approval of the account, it was possible to boot a physical machine with Debian GNU/Linux installed from scratch and get root access to it.
Thanks for helping with the labelling @rdicosmo 👍
Added a section about TCO in the design document.
Mar 9 2021
There is a Mattermost channel dedicated to Grid5000, but one has to be invited to join: it is not open to the public.
Additional NVMe drives for yeti should be something similar to https://www.samsung.com/semiconductor/ssd/enterprise-ssd/ but confirmation is needed that the machines actually have the SFF-8639 connectors required to plug them in.
The account request was approved, I'll proceed with a minimal reservation to figure out how it is done.
Thanks for the feedback. https://www.grid5000.fr/w/Grenoble:Hardware#yeti has 1.6TB NVMe, which seems better. It would be better to have a total of 4TB NVMe available to get closer to the target global index size (i.e. 40 bytes × 100 billion entries = 4TB). I'm told it is possible to donate hardware to Grid5000: if testing with the current configuration is not convincing enough, 4 more NVMe PCIe drives could be donated and they would be installed in the machines. No idea how much delay to expect, but it's good to know it is possible.
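The sizing arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes decimal units (1TB = 10^12 bytes), and the helper names `index_size_bytes` and `drives_needed` are made up for illustration.

```python
ENTRY_BYTES = 40            # size of one global index entry, from the estimate above
ENTRIES = 100 * 10**9       # 100 billion entries
DRIVE_BYTES = 16 * 10**11   # one 1.6TB NVMe drive, decimal units (assumption)


def index_size_bytes(entry_bytes: int, entries: int) -> int:
    """Total size of the index in bytes."""
    return entry_bytes * entries


def drives_needed(total_bytes: int, drive_bytes: int) -> int:
    """Smallest number of drives whose combined capacity covers total_bytes."""
    return -(-total_bytes // drive_bytes)  # integer ceiling division


total = index_size_bytes(ENTRY_BYTES, ENTRIES)
print(f"index size: {total / 10**12} TB")
print(f"1.6TB drives needed: {drives_needed(total, DRIVE_BYTES)}")
```

The drive count ignores file system overhead and any space needed for the objects themselves, so it is a lower bound rather than a provisioning plan.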
Looking at the available hardware, here is what could be used:
Followed the instructions at https://www.grid5000.fr/w/Grid5000:Get_an_account to get an account. Waiting for approval.
Mar 8 2021
https://en.wikipedia.org/wiki/Perfect_hash_function could be used to get O(1) lookups instead of O(log(N)).
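To make the O(1) claim concrete, here is a minimal sketch of one way to build a perfect hash function over a fixed set of keys, using a simple "hash and displace" construction. It is not the design under discussion, only an illustration: the function names (`build`, `lookup`), the bucket sizing and the use of BLAKE2b as the seeded hash are all assumptions, and the keys are assumed to be distinct and non-empty in number.

```python
import hashlib


def _h(seed: int, key: bytes, modulus: int) -> int:
    """Seeded hash of key, reduced to the range [0, modulus)."""
    digest = hashlib.blake2b(
        key, digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % modulus


def build(keys):
    """Return (seeds, slots) mapping each of the n distinct keys to a unique slot in [0, n)."""
    n = len(keys)
    n_buckets = max(1, n // 4)
    buckets = [[] for _ in range(n_buckets)]
    for key in keys:
        buckets[_h(0, key, n_buckets)].append(key)

    seeds = [0] * n_buckets
    slots = [None] * n
    # Place the largest buckets first, while most slots are still free.
    for b in sorted(range(n_buckets), key=lambda i: len(buckets[i]), reverse=True):
        bucket = buckets[b]
        if not bucket:
            continue
        seed = 1
        while True:
            candidate = [_h(seed, k, n) for k in bucket]
            no_internal_clash = len(set(candidate)) == len(bucket)
            if no_internal_clash and all(slots[c] is None for c in candidate):
                break
            seed += 1  # try the next seed until the whole bucket fits
        seeds[b] = seed
        for k, c in zip(bucket, candidate):
            slots[c] = k
    return seeds, slots


def lookup(seeds, slots, key):
    """Two hash evaluations and one comparison, independent of the number of keys."""
    seed = seeds[_h(0, key, len(seeds))]
    slot = _h(seed, key, len(slots))
    return slot if slots[slot] == key else None


keys = [f"obj-{i}".encode() for i in range(200)]
seeds, slots = build(keys)
print(lookup(seeds, slots, b"obj-42"))
```

The build step is expensive and the key set must be known in advance, which only fits an index that is written once and then read. A sorted index with binary search needs no such build step but pays O(log(N)) per lookup.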
Very interesting to see how this problem was presented & solved in the Hadoop ecosystem, thanks for the links.
Mar 5 2021
Mar 4 2021
Mar 3 2021
Mar 2 2021
Mar 1 2021
Feb 24 2021
For the record, here is the half-baked benchmark script for the proposed design that I worked on today. To be continued!
Feb 23 2021
Reopening for benchmarking purposes because there does not seem to be anything ready to use T3068.
jumpDB is 100% Python and therefore less than ideal for CPU performance, but for the purpose of benchmarking I/O and space usage it is conveniently ready to use.