This implementation allows the use of an object storage from a cloud
provider (atm Amazon's S3 and OpenStack object are implemented as first
example).
Details
- Reviewers
olasd zack - Group Reviewers
Reviewers - Maniphest Tasks
- T454: object storage backend that can read from/write to S3
- Commits
- rDOBJS553c299916e4: Add first example of cloud object storages
rDOBJSbd574b6982f2: Cloud-based objstorage using Apache Libcloud
This diff is submitted for first code review. Tests are still
needed and will be done once a cloud emulator will be deployed.
Diff Detail
- Repository
- rDOBJS Object storage
- Branch
- T454
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 356 Build 528: arc lint + arc unit
Event Timeline
Looks like a good start! I've noted down some comments for further discussion, has we discussed F2F last Friday.
swh/objstorage/cloud/objstorage_cloud.py | ||
---|---|---|
1 | nitpick: as this is new code written in 2016, you can drop "2015-" from here | |
15 | as this is an abstract class not meant to be implemented, shouldn't it use the abc machinery? | |
18 | so, is this really a "should" or a "shall"? does this class do anything useful if not subclassed or not? | |
19–20 | "correct object" is not very useful doc for people looking into how to implement a specific concrete subclass of this abstract class. | |
49 | a (even just minimal) docstring here would be nice | |
58–59 | The warning is useful, but should probably explain in what it would be inefficient. Is it just because it's a ping-pong with remote storage over the network, or is it something worse, like in-memory caching of the whole list? A (minimal) docstring here would be nice too. | |
64–65 | d'oh :( if there is really no direct method to retrieve the size of the storage, I believe we should either not implement this method, or equip it with the same warning that goes with iter, otherwise it might be a timebomb | |
68–79 | This and a lot of other docstrings in this module seem to be copy-paste from objstorage.py, which would make them a pain to maintain. | |
138–139 | It is actually true that integrity is guaranteed by the cloud in the sense we want? We want that the key of an object, in the way we compute it matches the associate data. Is that guarantee part of libcloud contract with all its backend implementation?
|
swh/objstorage/cloud/objstorage_cloud.py | ||
---|---|---|
64–65 | It doesn't seems a method to compute the size of a storage or a container exists. I don't know if we will need to list the full content of an objstorage, or get its size ; if so, I can search for a better implementation we may use in production. | |
138–139 | Oh, right. I was thinking about some bite-rot corruption, not id mismatching the content. We do compute the hash ourselves (if not provided) and use it as a key for the storage. Libcloud doesn't do anything special (Although I guess the cloud behind will do whatever it needs with a hash to store the data) it will be retrievable with the key we provided). As a note, the _get_object returns a Libcloud Object that is just a pointer to the object and does not retrieve the content unless a download or as_stream method is used (For what I've seen, Object.as_stream() delegates to driver.download_as_stream(object, ...). |
swh/objstorage/cloud/objstorage_cloud.py | ||
---|---|---|
15 | That feature is awesome! Is there any don't use-case for this? Because it could be used in a lot of classes in the swh.objstorgae module. |
Looks good!
I've just noted down a nit about docstrings in subclasses/overridden methods that add nothing w.r.t. the parent class, as they're effectively useless.
swh/objstorage/cloud/objstorage_cloud.py | ||
---|---|---|
177–178 | The diff looks generally good. I've just a remaining nitpick about the docstrings that add nothing w.r.t. the parent class. Can you remove all of them, unless they add something that is specific to the subclass / method override? |
No changes, but current diff needs to be updated as the parent commit as changed due to a rebase in top of previously pushed diff.
swh/objstorage/cloud/objstorage_cloud.py | ||
---|---|---|
105 | Typo in parameter's name: chech_presence -> check_presence. |