Page MenuHomeSoftware Heritage

Winery backend: basic implementation
Started, Work in Progress, NormalPublic

Description

This implementation contains all the essential components with the desired architecture (Read Storage, Write Storage and global index). It is intended to be merged and functional. The next stages will be optimization (as shown to be required by benchmarking, including I/O throttling) and testing of border cases and race conditions.

Test plan

Part of the tests require a live Ceph cluster to run otherwise they will be skipped. Follow the instructions in winery-test-environment/README.md to setup this cluster and run the tests. It should produce the following coverage:

.tox/py3/lib/python3.9/site-packages/swh/objstorage/backends/winery/__init__.py                     1      0      0      0   100%
.tox/py3/lib/python3.9/site-packages/swh/objstorage/backends/winery/database.py                    37      5      8      1    87%   63->72, 67-71
.tox/py3/lib/python3.9/site-packages/swh/objstorage/backends/winery/objstorage.py                  92      4     22      2    95%   67, 82, 106, 112
.tox/py3/lib/python3.9/site-packages/swh/objstorage/backends/winery/roshard.py                     78      5     10      1    93%   28, 37, 65-67
.tox/py3/lib/python3.9/site-packages/swh/objstorage/backends/winery/rwshard.py                     58      1     10      1    97%   74
.tox/py3/lib/python3.9/site-packages/swh/objstorage/backends/winery/sharedbase.py                 100      2     16      0    98%   88-89

Event Timeline

dachary triaged this task as Normal priority.Aug 29 2021, 2:41 PM
dachary created this task.
dachary created this object in space S1 Public.
dachary added a parent task: T3432: Add winery backend.

D6796 is a draft implementation that is not ready for review but can be reviewed anyways in case someone is curious.

  • It only contains the Write Storage side of the logic
  • The logic is assumed to run either the objstorage or the storage workers
  • When a Write Shard is full, it is packed in a separate process running in the background
  • A given objstorage/storage worker will always write into the same Write Shard and will only switch to a new Shard when it is full and scheduled for packing
  • The objstorage/storage worker are assumed to run in an environment that allows them to have write access to the Postgres/Ceph clusters, in the same way they have write permissions to the current Postgres/filesystem
dachary renamed this task from Winery backend: implementation to Winery backend: basic implementation.Dec 11 2021, 10:20 PM
dachary changed the task status from Open to Work in Progress.
dachary updated the task description. (Show Details)

I think D6796 is ready for review and should be merged. It is not final and won't run in production but it is a working implementation of the object storage. From there I intend to setup a benchmarking environment on grid5000. Optimizations will be implemented to improve the benchmark results where it matters the most. The benchmarks that were run during the summer already provided insights which motivated the implementation of the shard packer in C.

  • This commit is already much bigger than I intended. I expect the following patches to be smaller and easier to review.
  • Knowing that this code will run within the objstorage and objectstorage workers, I'm particularly interested in criticism regarding blockers in this context
  • It is not documented nor typed because it is possible that early reviews will require some amount of rearchitecture that would require to throw them away. However I intend to add documentation and types before this commit is merged.

Thanks in advance for your precious time reviewing this rather large commit.

@douardda will you have time for reviewing this week? I stand ready to address your comments. I'll stop working on that for now so that it is not a moving target.

@olasd @vlorentz @vsellier @ardumont I'd be happy to start polishing the implementation after your first wave of review (thanks a lot for the time you spent on this, it is much appreciated 👍 ). I'm however concerned that it might make your work difficult because of the magnitude of the patch. An alternative would be to merge this first implementation, as long as you feel confident it won't break anything when deployed. It should not because it won't be instantiated. That would allow me to propose patches that are smaller and easier to review for:

  • documentation
  • commenting the code
  • testing border cases
  • type annotations

What do you think?

I updated D6796 to add a high level description of the files, classes and logic. I also split the WineryObjStorage class into four classes in an attempt to clarify the separation between the read and the write logic. I will now refrain from doing anything else: it can be difficult on the reviewers to follow a moving target :-D

Happy new year ! Or, in French, "Bonne année deux mille vingt d'oeufs" (pun on words courtesy Easter-Eggs).

(semi-subtle way to ping, valid on once a year 😊 )

Nevermind. I just realized that https://forge.softwareheritage.org/D6796 was in the "Accepted" state and has been for almost two weeks. :facepalm: