Page MenuHomeSoftware Heritage

Adding the http client/server for swh.storage.objstorage
ClosedPublic

Authored by qcampos on Apr 29 2016, 4:17 PM.

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

qcampos retitled this revision from to Adding the http client/server for swh.storage.objstorage.
qcampos updated this object.
qcampos edited the test plan for this revision. (Show Details)
qcampos added a reviewer: zack.
qcampos edited edge metadata.
  • Correcting the imports on init
zack requested changes to this revision.Apr 29 2016, 4:45 PM
zack edited edge metadata.

Please remove swh/storage/objstorage/api/load_balancing_client.py as well as the DB modifications to this diff.
We should land the objstorage api/client first, and then look at the structure for the actual archiver.
Thanks!

This revision now requires changes to proceed.Apr 29 2016, 4:45 PM
qcampos edited edge metadata.
  • Revert "Changing the database schema in order to add archivage tables"
  • Revert the commit 2fdf9a2d6250262458e6bb6a62d1810631970c44

Should be good! There is now only the file -> module transformation for swh.storage.objstorage and the creation of the swh.storage.objstorage.api module.

zack requested changes to this revision.Apr 29 2016, 5:14 PM
zack edited edge metadata.

So, the diff looks good now, but the git log doesn't, because it contains the "right" commits + their "reverts".
We really only want to see the commits pertaining to the objstorage API there.
So you'll need to so some git surgery here… :)

This revision now requires changes to proceed.Apr 29 2016, 5:14 PM
qcampos edited edge metadata.
  • Changing the objstorage file to a module
  • Adding a http client/server for swh.storage.objstorage
  • Adding tests for the remote storage API
qcampos edited edge metadata.
zack requested changes to this revision.May 2 2016, 12:32 PM
zack edited edge metadata.

Thanks, the implemented stuff looks good.
On the other hand in various places there are copy/paste from storage (as opposed to objstorage) that we should really avoid.
Instead, we should refactor the underlying code and have it used by both storage and objstorage.

swh/storage/objstorage/api/client.py
16–35 ↗(On Diff #54)

This seems copy/pasted from storage/api/client.py.
We should avoid that and rather refactor a common sub-module of swh-storage and have it used by both storage and objstorage clients.

swh/storage/objstorage/api/server.py
16 ↗(On Diff #54)

let's make this /tmp/swh-storage/objects/

18 ↗(On Diff #54)

I noticed that this setting (the port) is not the storage server default config. Any reason to have it here?

If not, please remove it for uniformity.

31–59 ↗(On Diff #54)

All this seems copy/pasted from storage/api/server.py, we should avoid that.
This should be moved to a common module within swh-storage that is used by both storage and objstorage.

swh/storage/tests/test_objstorage_api.py
25 ↗(On Diff #54)

this should be TestRemoteObjStorage

39–78 ↗(On Diff #54)

this is copy/paste from test_api_client. We shouldn't do that.
Instead, we should write a testing helper mixin that implements this logic, and use it both in test_api_client and this new test module

This revision now requires changes to proceed.May 2 2016, 12:32 PM
qcampos edited edge metadata.
qcampos updated this object.
  • Refactor storage and objstorage api's module in order to factorize some methods
  • Change the defaut config of the object storage server
  • Refactor the testing of the storage & objstorage API in order to factorize the background server launching
zack removed a reviewer: zack.
ardumont added inline comments.
swh/storage/tests/server_testing.py
18 ↗(On Diff #55)

Aren't setUp and tearDown methods defined in unittest.TestCase?
Your class does not inherit from it though.

And since you propose it, this must work.
But, i think, as a side-effect since TestRemoteStorage is a mixin which is a subclass of this class and of unittest.TestCase.

It could be better to make this a subclass of unittest.TestCase and remove it from TestRemoteStorage, what do you think?

zack added inline comments.
swh/storage/tests/server_testing.py
18 ↗(On Diff #55)

It could be better to make this a subclass of unittest.TestCase and remove it from TestRemoteStorage, what do you think?

That is part of how mixins actually work. You don't need (nor usually want, to avoid multiple inheritance issues) to inherit directly from the class you extend. You just document the "contract" that is required to make what the mixin does work. If you look at db_testing.py in swh-core, it works exactly the same way.

swh/storage/tests/server_testing.py
18 ↗(On Diff #55)

Well, Interesting, as usual thanks for the hint.

You just document the "contract" that is required to make what the mixin does work.

Ok, but somehow at the expanse of the actual initial class's clarity (here ServerTestFixture).
Some docstring explaining that this class is to be used with mixin seems like a good idea to compensate this?

If you look at db_testing.py in swh-core, it works exactly the same way.

Thanks for the hint.
I'll take a look at db_testing.py again :D

swh/storage/tests/server_testing.py
18 ↗(On Diff #55)

Ok, but somehow at the expanse of the actual initial class's clarity (here ServerTestFixture).
Some docstring explaining that this class is to be used with mixin seems like a good idea to compensate this?

Yes, absolutely.

I *think* db_testing.py does that, but it needs to be verified as well.

swh/storage/tests/server_testing.py
18 ↗(On Diff #55)

Yes, absolutely.

I *think* db_testing.py does that, but it needs to be verified as well.

Indeed
https://forge.softwareheritage.org/diffusion/DCORE/browse/master/swh/core/tests/db_testing.py;5a2ef92d45246d1195f19ac8e76b19182b193359$85

olasd requested changes to this revision.May 3 2016, 3:25 PM
olasd added a reviewer: olasd.
olasd added a subscriber: olasd.

We're deeply in nitpick territory now and this is probably mergeable as is.

I think the separate utils module name is too generic, as those methods only concerns the internal APIs provided by storage/objstorage. api.common (swh/storage/api/common.py) or common_api (swh/storage/common_api.py) would make more sense to me. The API term is getting very overloaded. :/

Generally, using relative imports for module-internal imports is good practice as it makes refactoring/renaming easier. You don't need to consider that a blocker though, and consistency is what's most important.

swh/storage/api/client.py
12–13 ↗(On Diff #55)

You can avoid the backslash by using parentheses here:

from swh.storage.utils import (
    decode_response, encode_data_client as encode_data,
)

You can also use a relative import which will avoid issues if we rename the storage module one day (we won't, but it's still good practice):

from ..utils import decode_response, encode_data_client as encode_data
swh/storage/objstorage/api/client.py
12–14 ↗(On Diff #55)

Same comment about relative imports and the backslash.

swh/storage/objstorage/objstorage.py
13

Please keep the relative imports if possible.

swh/storage/tests/server_testing.py
18 ↗(On Diff #55)

@qcampos: you should then add a comment saying the class is a mixin in the docstring

swh/storage/tests/test_api_client.py
25–32 ↗(On Diff #55)

When you call super().method(), the method resolution order is deterministic and uses C3 linearization. You can inspect the MRO by using the __mro__ magic attribute on your classes.

Base classes are supposedly used in order, so you should get (TestRemoteStorage, AbstractTestStorage, ServerTestFixture, unittest.TestCase, object). That means that once you call super().setUp() the attribute should properly be set.

This revision now requires changes to proceed.May 3 2016, 3:25 PM
In D10#333, @olasd wrote:

Generally, using relative imports for module-internal imports is good practice as it makes refactoring/renaming easier. You don't need to consider that a blocker though, and consistency is what's most important.

At what point should I use relative imports ? Is there a good policy about the number of dots that are reasonable ?
Should I use them each time its into swh.* ?

swh/storage/tests/test_api_client.py
25–32 ↗(On Diff #55)

In lines 34-35, I create a dict that contains self.objroot (storage_base here), which is initialized in AbstractTestStorage's super().setUp()
And I need this dict in ServerTestFixture's super().setUp()

So, the attribute is set, but is there a way to fill the dictionary between the two supercalls, or should I think to another behavior for ServerTestFixture's arguments ?

In D10#354, @qcampos wrote:
In D10#333, @olasd wrote:

Generally, using relative imports for module-internal imports is good practice as it makes refactoring/renaming easier. You don't need to consider that a blocker though, and consistency is what's most important.

At what point should I use relative imports ? Is there a good policy about the number of dots that are reasonable ?
Should I use them each time its into swh.* ?

Basically you should use relative imports for the stuff that lives in the same git repository / "python project".

For instance, if you are in swh.storage.bar, you should import swh.storage.objstorage.foo using a relative import, but import swh.core.hashutil with an absolute, fully qualified import.

In D10#356, @olasd wrote:

Basically you should use relative imports for the stuff that lives in the same git repository / "python project".

For instance, if you are in swh.storage.bar, you should import swh.storage.objstorage.foo using a relative import, but import swh.core.hashutil with an absolute, fully qualified import.

Is it as well alright to use a relative import in swh.storage.objstorage.foo in order to import swh.storage.bar ? I guess it's import ...bar.

Edit : It seems that relative imports does not work in tests. Am I doing it wrong or is it the case ?

qcampos edited edge metadata.
  • Change the objstorage file to a module
  • Add an http API for objstorage and refactor the shared code with storage.api
  • Add tests for the remote objstorage API and share some code with storage.api tests
qcampos marked 3 inline comments as done.
qcampos edited edge metadata.
  • Change the objstorage file to a module
  • Add an http API for objstorage and refactor the shared code with storage.api
  • Add tests for the remote objstorage API and share some code with storage.api tests

I missed an import in swh/storage/objstorage/objstorage.py that needed to be changed to a relative one.

olasd edited edge metadata.

Thanks @qcampos! We can always iterate further, but this is good to merge :)

This revision is now accepted and ready to land.May 4 2016, 1:37 PM
This revision was automatically updated to reflect the committed changes.
zack changed the visibility from "All Users" to "Public (No Login Required)".