Page MenuHomeSoftware Heritage

api: implement a new /content endpoint that stream the whole content
ClosedPublic

Authored by douardda on Mar 19 2019, 12:30 PM.

Details

Summary

this stacks a pile of revisions (one of them should be extracted from this
sandwich, but it quite painful a task):

  • refactor the SWHRemoteClient so it does not inherit from ObjStorage nor SWHRemoteAPI and implement the remote api accessor as a _proxy attribute instead of inheriting the class to make class instanciation easier and prevent name collision.
  • Also get rid of the ObjStorage inheritance which did not help and add instanciation (argument passing) complexity. This also solve the 'allow_delete' attribute presence which makes no sense for the remote api component.
  • upgrade the server implementation to aiohttp 3 <== this one
  • objstorage: implement the list_content method for most of the storage backends and add a dedicated test.
  • api: implement the list_content method on RemoteObjStorage and update the server's part accordingly.
  • tests: add a test for the ObjStorage.list_content() including testing for pagination (limit and last_obj_id arguments).

    Note: skip the test_list_content for the StripingObjStorage since we have no way to implement pagination support for the list_content() method in this backend.
  • Make iteration over a ObjStorage sorted to be able to have a generic implementation of pagination for the list_content() method.
  • Add a DEFAULT_LIMIT global value for the limit argument of list_content() and use it.
  • pathslicer: implement a better list_content method that prune unnecessaty directories when walking the root path. Add a dedicated unit test.

Diff Detail

Repository
rDOBJS Object storage
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

LGTM, but I'm not confident enough to accept this Diff myself

swh/objstorage/api/client.py
45–46

What is this new function?

63–64

server

swh/objstorage/multiplexer/multiplexer_objstorage.py
213–216 ↗(On Diff #4007)

Why do you define a function for this?

swh/objstorage/tests/test_objstorage_striping.py
83 ↗(On Diff #4007)

:D

challenge

ardumont added inline comments.
swh/objstorage/api/client.py
45–46

It's not a new function, it's for restoring a content probably detected upstream as inconsistent with its id.

swh/objstorage/api/client.py
45–46

it's not a new function

as in it existed before and i don't know why it disappeared.


Not that i also do not know if it has been used before... as well.
I know we have content checkers somewhere but i don't think they ever ran...
so might be it was yagni and got deleted...

swh/objstorage/api/client.py
26

Prefer collaboration over inheritance! Nice!

swh/objstorage/cloud/objstorage_azure.py
225 ↗(On Diff #4007)

_ instead of k?

that explicits we do not care about the k (except for sorting afaict).

We may want to explicit that there is an order enforced here.

swh/objstorage/multiplexer/multiplexer_objstorage.py
213–216 ↗(On Diff #4007)

I second that, why not inline the for?

douardda added inline comments.
swh/objstorage/api/client.py
45–46

It did not disappeared, it has to be reimplemented since we do not inherit from ObjStorage any more.

swh/objstorage/multiplexer/multiplexer_objstorage.py
213–216 ↗(On Diff #4007)

That are pretty good questions. No idea!

Backporting aiohttp 3 needs:

  • backporting yarl
  • backporting multidict
    • which itself means backporting cython
      • which itself means backporting numpy
        • which needs a newer cython than what's in stable.

I didn't go further than this (e.g. patching multidict to not need the newer cython), but I'd rather we avoid doing that entirely...

In D1274#27491, @olasd wrote:

Backporting aiohttp 3 needs:

  • backporting yarl
  • backporting multidict
    • which itself means backporting cython
      • which itself means backporting numpy
        • which needs a newer cython than what's in stable.

I didn't go further than this (e.g. patching multidict to not need the newer cython), but I'd rather we avoid doing that entirely...

Looks like I managed to break the loop by not running the tests during the cython build. I'm now down the rabbit hole of backporting a recent enough pytest for multidict.

Many fixes, folded several revisions

Provided the commit message is clearer :D

swh/objstorage/tests/test_objstorage_striping.py
83 ↗(On Diff #4102)

It's actually easy now, just list all backends with the same parameters/limits, sort, uniq, limit again, and return that :P

This revision is now accepted and ready to land.Mar 27 2019, 11:34 AM