Change the objstorage single file into a module in order to add a swh.storage.objstorage.api submodule.
The created submodule allows the creation of a remote api for the object storage.
Details
- Reviewers
olasd - Group Reviewers
Reviewers Developers - Maniphest Tasks
- T381: HTTP client/server version of swh.storage.objstorage
- Commits
- rDSTOCa3a7bd4d9f11: Add an http API for objstorage and refactor the shared code with storage.api
rDSTOC82b0e8a155c7: Change the objstorage file to a module
rDSTOCf0bda087bbb4: Add tests for the remote objstorage API and share some code with storage.api…
R65:a3a7bd4d9f11: Add an http API for objstorage and refactor the shared code with storage.api
R65:f0bda087bbb4: Add tests for the remote objstorage API and share some code with storage.api…
R65:82b0e8a155c7: Change the objstorage file to a module
rDSTOa3a7bd4d9f11: Add an http API for objstorage and refactor the shared code with storage.api
rDSTOf0bda087bbb4: Add tests for the remote objstorage API and share some code with storage.api…
rDSTO82b0e8a155c7: Change the objstorage file to a module
Test included
Diff Detail
- Repository
- rDSTO Storage manager
- Branch
- master
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 72 Build 95: Software Heritage Python tests Build 94: arc lint + arc unit
Event Timeline
Please remove swh/storage/objstorage/api/load_balancing_client.py as well as the DB modifications to this diff.
We should land the objstorage api/client first, and then look at the structure for the actual archiver.
Thanks!
- Revert "Changing the database schema in order to add archivage tables"
- Revert the commit 2fdf9a2d6250262458e6bb6a62d1810631970c44
Should be good! There is now only the file -> module transformation for swh.storage.objstorage and the creation of the swh.storage.objstorage.api module.
So, the diff looks good now, but the git log doesn't, because it contains the "right" commits + their "reverts".
We really only want to see the commits pertaining to the objstorage API there.
So you'll need to so some git surgery here… :)
- Changing the objstorage file to a module
- Adding a http client/server for swh.storage.objstorage
- Adding tests for the remote storage API
Thanks, the implemented stuff looks good.
On the other hand in various places there are copy/paste from storage (as opposed to objstorage) that we should really avoid.
Instead, we should refactor the underlying code and have it used by both storage and objstorage.
swh/storage/objstorage/api/client.py | ||
---|---|---|
16–35 | This seems copy/pasted from storage/api/client.py. | |
swh/storage/objstorage/api/server.py | ||
16 | let's make this /tmp/swh-storage/objects/ | |
18 | I noticed that this setting (the port) is not the storage server default config. Any reason to have it here? If not, please remove it for uniformity. | |
31–59 | All this seems copy/pasted from storage/api/server.py, we should avoid that. | |
swh/storage/tests/test_objstorage_api.py | ||
25 | this should be TestRemoteObjStorage | |
39–78 | this is copy/paste from test_api_client. We shouldn't do that. |
- Refactor storage and objstorage api's module in order to factorize some methods
- Change the defaut config of the object storage server
- Refactor the testing of the storage & objstorage API in order to factorize the background server launching
swh/storage/tests/server_testing.py | ||
---|---|---|
18 ↗ | (On Diff #55) | Aren't setUp and tearDown methods defined in unittest.TestCase? And since you propose it, this must work. It could be better to make this a subclass of unittest.TestCase and remove it from TestRemoteStorage, what do you think? |
swh/storage/tests/server_testing.py | ||
---|---|---|
18 ↗ | (On Diff #55) |
That is part of how mixins actually work. You don't need (nor usually want, to avoid multiple inheritance issues) to inherit directly from the class you extend. You just document the "contract" that is required to make what the mixin does work. If you look at db_testing.py in swh-core, it works exactly the same way. |
swh/storage/tests/server_testing.py | ||
---|---|---|
18 ↗ | (On Diff #55) | Well, Interesting, as usual thanks for the hint.
Ok, but somehow at the expanse of the actual initial class's clarity (here ServerTestFixture).
Thanks for the hint. |
swh/storage/tests/server_testing.py | ||
---|---|---|
18 ↗ | (On Diff #55) |
Yes, absolutely. I *think* db_testing.py does that, but it needs to be verified as well. |
swh/storage/tests/server_testing.py | ||
---|---|---|
18 ↗ | (On Diff #55) |
|
We're deeply in nitpick territory now and this is probably mergeable as is.
I think the separate utils module name is too generic, as those methods only concerns the internal APIs provided by storage/objstorage. api.common (swh/storage/api/common.py) or common_api (swh/storage/common_api.py) would make more sense to me. The API term is getting very overloaded. :/
Generally, using relative imports for module-internal imports is good practice as it makes refactoring/renaming easier. You don't need to consider that a blocker though, and consistency is what's most important.
swh/storage/api/client.py | ||
---|---|---|
12–13 ↗ | (On Diff #55) | You can avoid the backslash by using parentheses here: from swh.storage.utils import ( decode_response, encode_data_client as encode_data, ) You can also use a relative import which will avoid issues if we rename the storage module one day (we won't, but it's still good practice): from ..utils import decode_response, encode_data_client as encode_data |
swh/storage/objstorage/api/client.py | ||
12–14 | Same comment about relative imports and the backslash. | |
swh/storage/objstorage/objstorage.py | ||
13 | Please keep the relative imports if possible. | |
swh/storage/tests/server_testing.py | ||
18 ↗ | (On Diff #55) | @qcampos: you should then add a comment saying the class is a mixin in the docstring |
swh/storage/tests/test_api_client.py | ||
25–32 ↗ | (On Diff #55) | When you call super().method(), the method resolution order is deterministic and uses C3 linearization. You can inspect the MRO by using the __mro__ magic attribute on your classes. Base classes are supposedly used in order, so you should get (TestRemoteStorage, AbstractTestStorage, ServerTestFixture, unittest.TestCase, object). That means that once you call super().setUp() the attribute should properly be set. |
At what point should I use relative imports ? Is there a good policy about the number of dots that are reasonable ?
Should I use them each time its into swh.* ?
swh/storage/tests/test_api_client.py | ||
---|---|---|
25–32 ↗ | (On Diff #55) | In lines 34-35, I create a dict that contains self.objroot (storage_base here), which is initialized in AbstractTestStorage's super().setUp() So, the attribute is set, but is there a way to fill the dictionary between the two supercalls, or should I think to another behavior for ServerTestFixture's arguments ? |
Basically you should use relative imports for the stuff that lives in the same git repository / "python project".
For instance, if you are in swh.storage.bar, you should import swh.storage.objstorage.foo using a relative import, but import swh.core.hashutil with an absolute, fully qualified import.
Is it as well alright to use a relative import in swh.storage.objstorage.foo in order to import swh.storage.bar ? I guess it's import ...bar.
Edit : It seems that relative imports does not work in tests. Am I doing it wrong or is it the case ?
- Change the objstorage file to a module
- Add an http API for objstorage and refactor the shared code with storage.api
- Add tests for the remote objstorage API and share some code with storage.api tests
- Change the objstorage file to a module
- Add an http API for objstorage and refactor the shared code with storage.api
- Add tests for the remote objstorage API and share some code with storage.api tests
I missed an import in swh/storage/objstorage/objstorage.py that needed to be changed to a relative one.