Related to: T3533
Details
- Reviewers
vlorentz - Group Reviewers
Reviewers - Maniphest Tasks
- T3533: Winery backend: basic implementation
- Commits
- rDOBJS9a88cafbc72c: winery: first implementation
Part of the tests require a live Ceph cluster to run otherwise they will be skipped. Follow the instructions in winery-test-environment/README.md to setup this cluster and run the tests. It should produce the following coverage:
.tox/py3/lib/python3.9/site-packages/swh/objstorage/backends/winery/__init__.py 1 0 0 0 100% .tox/py3/lib/python3.9/site-packages/swh/objstorage/backends/winery/database.py 37 5 8 1 87% 63->72, 67-71 .tox/py3/lib/python3.9/site-packages/swh/objstorage/backends/winery/objstorage.py 92 4 22 2 95% 67, 82, 106, 112 .tox/py3/lib/python3.9/site-packages/swh/objstorage/backends/winery/roshard.py 78 5 10 1 93% 28, 37, 65-67 .tox/py3/lib/python3.9/site-packages/swh/objstorage/backends/winery/rwshard.py 58 1 10 1 97% 74 .tox/py3/lib/python3.9/site-packages/swh/objstorage/backends/winery/sharedbase.py 100 2 16 0 98% 88-89
Diff Detail
- Repository
- rDOBJS Object storage
- Branch
- master
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 25594 Build 40021: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 40020: arc lint + arc unit
Event Timeline
No, they are to be run right before pytest to provision the ceph cluster (either via libvirt or via fed4fire)
Hmm... alright, let's keep it in this repo. But move it to the root (ie. something like swh-objstorage/winery-benchmarks/ instead of swh-objstorage/swh/objstorage/tests/winery/), as the current location implies it is used from the Python code.
Sounds good?
Build is green
Patch application report for D6796 (id=24766)
Rebasing onto 8f992e8a20...
Current branch diff-target is up to date.
Changes applied before test
commit 546cf88ecf824bfe35b0bbeaec47d4b0751a1880 Author: Loïc Dachary <loic@dachary.org> Date: Fri Dec 10 17:58:49 2021 +0100 winery: basic implementation Relates to: T3533
See https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/95/ for more details.
Build is green
Patch application report for D6796 (id=24787)
Rebasing onto 8f992e8a20...
Current branch diff-target is up to date.
Changes applied before test
commit 52743ddc48096fdd39578544cc01f8d20cdff80f Author: Loïc Dachary <loic@dachary.org> Date: Fri Dec 10 17:58:49 2021 +0100 winery: basic implementation Relates to: T3533
See https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/96/ for more details.
Build is green
Patch application report for D6796 (id=24859)
Rebasing onto 4238095a6c...
First, rewinding head to replay your work on top of it... Applying: winery: first implementation
Changes applied before test
commit 13dd82948a6d4093a9ab314cf0c8067b332d0eab Author: Loïc Dachary <loic@dachary.org> Date: Fri Dec 10 17:58:49 2021 +0100 winery: first implementation Relates to: T3533
See https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/97/ for more details.
Looks like solid ground work, thanks!
On the code, I've left a few inline comments. I would appreciate having a high-level documentation of the architecture of the code, understanding what class does what. For instance, having a high level idea of the lifetime of shards and what bits of the code handle them would be useful, even if for now I think reading the code is fine.
One thing that's not explicit at all, is that the WineryObjStorage class handles write operations on a single shard (and that "multi-shard parallelism" is extrinsic to that class), but it handles read operations on any shard. This could be an operational concern
Generally, we don't capitalize the module-level logger object (compare mr grep LOGGER vs. mr grep logger in swh-environment).
(I haven't gone through reading the tests code yet)
I've tried to run the integration tests (using a local libvirt), and I've had a couple of issues, which I've eventually worked around.
I think the shell scripts need some hardening (set -e and only dropping machines if they're defined), but the most important issue is being able to set the QEMU connection string. For some reason my "qemu:///session" is unable to create bridges, but I have privileges to create bridges in "qemu:///system"; I've had to set -c in virsh calls, and --connection in virt-create.
The scripts depend on being run in the winery-test-environment dir, but I'm not sure that's explicit. maybe just add a cd $(dirname $0) or somesuch.
I suggest installing mitogen and using ANSIBLE_STRATEGY=mitogen_linear to speed up the ansibling of hosts (it keeps persistent ssh connections and a single remote python interpreter to do its work, instead of spawning a script for each task). It seems to work just fine in my attempts.
(this is just a generic comment, I don't think anything in there is actionable at all)
I don't think I agree with that. Ideally we would want to pass most changes through regression integration tests (which would also contain "historical" data to ensure continued backwards compatibility), while benchmarking for performance regressions could "only" happen before releasing.
And such benchmarking would be *very* difficult to fit in the CI. Last but not least, the overhead of manually running the Ceph integration/performance tests is likely to be an order of magnitude less over the next five years than the overhead of making sure they can run from the CI without making so much noise (because of environmental failures) that it often requires manual intervention anyways.
We should assume that anything that must happen but is not automated will, eventually, not happen, in a way that's detrimental.
I expect that we will have a persistent "staging" ceph cluster against which we would be able to run these kinds of workloads (or be able to spawn a scratch cluster on kubernetes, or something).
mypy.ini | ||
---|---|---|
25–27 | Is this really needed? | |
swh/objstorage/backends/winery/bench.py | ||
1 ↗ | (On Diff #24859) | Could you consider s/bench/benchmark/ (at least in the module name)? I do wonder if it should move to the tests submodule too. |
84 ↗ | (On Diff #24859) | This generates only contents that are neatly aligned on 1024 byte multiple boundaries. Maybe it would make sense to add some fuzz/randomness to these object sizes, to exercise some obviously-unaligned reads in the ceph cluster. |
swh/objstorage/backends/winery/database.py | ||
19 | Should be a better pattern to properly override only the dbname out of the provided dsn. It may be worth having a common method/context manager for opening an "admin connection" wrapping all of the common operations you're doing on all these methods (connecting, setting autocommit, getting a cursor, doing stuff, then closing the connection on __exit__). | |
74 | ||
swh/objstorage/backends/winery/objstorage.py | ||
60–61 | I've ended up writing the same comment on the current version of this code, so I think there's a gap in the logic somewhere. Here's the scenario which worries me: A loader sends object A to be written; it's picked up by the objstorage worker for shard1 which inserts (A, shard1, inflight=true) in signature2shard, and swiftly crashes because of a bug. The loader in turn crashes. The write for object A is attempted again by another loader; the objstorage worker for shard2 picks it up, but, AFAICT, fails phase 1 (because of the unique index on signature2shard(signature)). What happens then to object A? | |
swh/objstorage/backends/winery/roshard.py | ||
21–25 | Doesn't look like init has any side effect, so all calls can probably go to the __init__ function | |
62–100 | I would be (much) more comfortable if these functions were only in a Pool subclass defined and used by tests. | |
swh/objstorage/backends/winery/sharedbase.py | ||
29 | What is the meaning of this magic number? | |
31–45 | Probably worth using triple-quoted strings for SQL queries generally, but that's purely stylistic. | |
32–37 | Do we really need an id and a name? Either way the name column should probably be made unique. I would suggest merging both into a single uuid-based column (and reconstructing the database names from the shard uuid on the fly). In any case the char type has some weird padding behavior with spaces, which is probably not what we want (even though it probably doesn't matter in practice). | |
95 | Seeing the way these booleans are handled, I think what @vlorentz had in mind was making the shard "state" a single non-null column containing a ternary enum (with values such as "writable", "packing", "readonly"). | |
128–143 | These return values are inconsistent, I assume we would want to make them all consistent (maybe using a dataclass for ShardInfo?) | |
winery-test-environment/build-vms.sh | ||
1 ↗ | (On Diff #24859) | Missing shebang, not everyone uses bash :P |
100 ↗ | (On Diff #24859) |
On the code, I've left a few inline comments.
I've addressed them (hopefully) thank you.
I would appreciate having a high-level documentation of the architecture of the code, understanding what class does what. For instance, having a high level idea of the lifetime of shards and what bits of the code handle them would be useful, even if for now I think reading the code is fine.
I'll do that in the next few days. I intended to (documentation + types) but postponed that in case the review required a complete change in the architecture.
One thing that's not explicit at all, is that the WineryObjStorage class handles write operations on a single shard (and that "multi-shard parallelism" is extrinsic to that class), but it handles read operations on any shard. This could be an operational concern
The multi-shard parllelism for writes is part of the swh-winery (in swh-environment/docker) and should be considered as part of the winery. It currently is almost identical to swh-obstorage but may evolve independently.
The asymmetry between write and read shards reflects the fact that there will always be a a limited numbers of write shards, hence it won't be an issue to have one worker per write shard. And it is likely to make things simpler in the long run. Workers handling reads must however deal with an ever growing number of shards and that require them to be able to reach any number of them.
Generally, we don't capitalize the module-level logger object (compare mr grep LOGGER vs. mr grep logger in swh-environment).
Oh... I took the seaweedfs implementation as an inspiration but indeed... I'll revert that.
(I haven't gone through reading the tests code yet)
I've tried to run the integration tests (using a local libvirt), and I've had a couple of issues, which I've eventually worked around.
I think the shell scripts need some hardening (set -e
That's done, good idea.
and only dropping machines if they're defined),
I left the brute force because it proved reliable but ignored errors to avoid the disturbing noise.
but the most important issue is being able to set the QEMU connection string. For some reason my "qemu:///session" is unable to create bridges, but I have privileges to create bridges in "qemu:///system"; I've had to set -c in virsh calls, and --connection in virt-create.
I set the default to "qemu:///system" with a variable so that it can be easier to override. It is indeed a common problem.
The scripts depend on being run in the winery-test-environment dir, but I'm not sure that's explicit. maybe just add a cd $(dirname $0) or somesuch.
I did that too, good idea.
I suggest installing mitogen and using ANSIBLE_STRATEGY=mitogen_linear to speed up the ansibling of hosts (it keeps persistent ssh connections and a single remote python interpreter to do its work, instead of spawning a script for each task). It seems to work just fine in my attempts.
I'll leave that to the taste of the person running the script. I have mixed feeling about mitogen and fear that it could create more environmental problems that may end up costing more human time than it saves computing time.
(this is just a generic comment, I don't think anything in there is actionable at all)
I don't think I agree with that. Ideally we would want to pass most changes through regression integration tests (which would also contain "historical" data to ensure continued backwards compatibility), while benchmarking for performance regressions could "only" happen before releasing.
And such benchmarking would be *very* difficult to fit in the CI. Last but not least, the overhead of manually running the Ceph integration/performance tests is likely to be an order of magnitude less over the next five years than the overhead of making sure they can run from the CI without making so much noise (because of environmental failures) that it often requires manual intervention anyways.
We should assume that anything that must happen but is not automated will, eventually, not happen, in a way that's detrimental.
I expect that we will have a persistent "staging" ceph cluster against which we would be able to run these kinds of workloads (or be able to spawn a scratch cluster on kubernetes, or something).
I think we can agree on one thing: there is a need and whatever the solution it is not trivial to implement, unfortunately.
requirements.txt | ||
---|---|---|
9 | The upside of using sh vs subprocess is also that it is significantly less code to do the same. But if you think this advantage is not worth the extra dependency, I'll be happy to do the extra work. | |
swh/objstorage/backends/winery/bench.py | ||
1 ↗ | (On Diff #24859) | Agreed, it was moved and renamed into winery_benchmark.py |
winery-test-environment/build-vms.sh | ||
1 ↗ | (On Diff #24859) | Good catch, added. |
100 ↗ | (On Diff #24859) | Added. |
mypy.ini | ||
---|---|---|
25–27 | No. My bad, removed. | |
swh/objstorage/backends/winery/bench.py | ||
84 ↗ | (On Diff #24859) | Good point. I added +1 to all of them which makes them all non-aligned. Neatly aligned objects are the exception anyways. |
swh/objstorage/backends/winery/database.py | ||
19 | Done in a context manager, more DRY is good. Also changed the connect as suggested. | |
74 | Good catch, fixed. | |
swh/objstorage/backends/winery/objstorage.py | ||
60–61 | The idea is that SharedBase.add_phase_1 either
Which allows it to pick up where the crashed worker left. However reading the code again I saw there was a bug in how the unique violation was handled and fixed it. That being said, the tests do not still cover border cases such as this one and they are what I intend to work on next. They are tricky to get right and I wanted to wait for the first wave over review before getting to them. I think I'm ready now :-) | |
swh/objstorage/backends/winery/roshard.py | ||
21–25 | Done. | |
62–100 | Moved to the PoolHelper class and to the test directory. Dangerous indeed... | |
swh/objstorage/backends/winery/sharedbase.py | ||
29 | There is no meaning, it just has to be unique for the lock to work. I added a comment to clarify that. | |
31–45 | Done! | |
32–37 |
The id matches the shard field in the signature2shard table below to save space. I'll make the name field unique. There will be a small number of rows in this table, is your recommendation for merging the fields about simplicity or saving space? | |
95 | I understood what @vlorentz had in mind but thought having two fields made the whole thing a little clearer. | |
128–143 | This function was exclusively used for testing and indeed confusing because inconsistent. I moved it to the test directory for clarity. |
Build has FAILED
Patch application report for D6796 (id=24888)
Rebasing onto 4238095a6c...
First, rewinding head to replay your work on top of it... Applying: winery: first implementation
Changes applied before test
commit 84e0f76596bacae7404b83dacb3fa254cc981bf4 Author: Loïc Dachary <loic@dachary.org> Date: Fri Dec 10 17:58:49 2021 +0100 winery: first implementation Relates to: T3533
Link to build: https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/98/
See console output for more information: https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/98/console
+[mypy-swh.perfecthash.*] +ignore_missing_imports = True
is actually necessary for jenkins to be happy. Not sure why it passes on my machine, that needs to be investigated.
00:50:33 mypy run-test-pre: PYTHONHASHSEED='1622736210' 00:50:33 mypy run-test: commands[0] | mypy swh 00:50:39 swh/objstorage/backends/winery/roshard.py:10: error: Skipping analyzing "swh.perfecthash": module is installed, but missing library stubs or py.typed marker 00:50:39 swh/objstorage/backends/winery/roshard.py:10: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports 00:50:47 Found 1 error in 1 file (checked 54 source files)
Build is green
Patch application report for D6796 (id=24889)
Rebasing onto 4238095a6c...
First, rewinding head to replay your work on top of it... Applying: winery: first implementation
Changes applied before test
commit d2cc4f3b384f001610e91fcbac9a84a75defc900 Author: Loïc Dachary <loic@dachary.org> Date: Fri Dec 10 17:58:49 2021 +0100 winery: first implementation Relates to: T3533
See https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/99/ for more details.
Build is green
Patch application report for D6796 (id=24890)
Rebasing onto 4238095a6c...
First, rewinding head to replay your work on top of it... Applying: winery: first implementation
Changes applied before test
commit 63c2945249a8454f587e4b27ca2503399dfaad4c Author: Loïc Dachary <loic@dachary.org> Date: Fri Dec 10 17:58:49 2021 +0100 winery: first implementation Relates to: T3533
See https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/100/ for more details.
Build is green
Patch application report for D6796 (id=24893)
Rebasing onto 4238095a6c...
First, rewinding head to replay your work on top of it... Applying: winery: first implementation
Changes applied before test
commit 9331629aa8ed2039d8ec973621bbab0d4aba0d46 Author: Loïc Dachary <loic@dachary.org> Date: Fri Dec 10 17:58:49 2021 +0100 winery: first implementation Relates to: T3533
See https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/101/ for more details.
Build is green
Patch application report for D6796 (id=24905)
Rebasing onto 4238095a6c...
First, rewinding head to replay your work on top of it... Applying: winery: first implementation
Changes applied before test
commit d4cf25677a8d8ed02b3ba1a3014150b0de40b033 Author: Loïc Dachary <loic@dachary.org> Date: Fri Dec 10 17:58:49 2021 +0100 winery: first implementation Relates to: T3533
See https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/102/ for more details.
Build is green
Patch application report for D6796 (id=24906)
Rebasing onto 4238095a6c...
First, rewinding head to replay your work on top of it... Applying: winery: first implementation
Changes applied before test
commit da80b03b8155340a4580d15eac428bbff412ac2a Author: Loïc Dachary <loic@dachary.org> Date: Fri Dec 10 17:58:49 2021 +0100 winery: first implementation Relates to: T3533
See https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/103/ for more details.
(accepting the diff as myself so I don't block it while on vacation, as you addressed my own comments)
swh/objstorage/backends/winery/sharedbase.py | ||
---|---|---|
95 | I'm not a huge fan of having two booleans to encode a 3-states state machine btw. But it's acceptable as long as you add a CHECK as I mentioned in another comment. |
docs/winery.rst | ||
---|---|---|
11 | because this isn't markdown (ditto below) |
Hum, it looks like I forgot to submit two replies a week ago or so, doing that now.
@vlorentz thanks for the additional review and the kind thought :-) Although I am blocked, I have other things to keep me busy and I fully expected there would be delays this time of year.
docs/winery.rst | ||
---|---|---|
8 | @olasd the high level description that you suggested to add is here. | |
swh/objstorage/backends/winery/objstorage.py | ||
19 | @olasd the WineryObjStorage is split into four classes for clarity. |
Build is green
Patch application report for D6796 (id=25008)
Rebasing onto 4238095a6c...
Current branch diff-target is up to date.
Changes applied before test
commit 9a88cafbc72cea6accbf32194b7b2ba2cd58fb31 Author: Loïc Dachary <loic@dachary.org> Date: Fri Dec 10 17:58:49 2021 +0100 winery: first implementation Relates to: T3533
See https://jenkins.softwareheritage.org/job/DOBJS/job/tests-on-diff/104/ for more details.