Details

Reviewers

ardumont
douardda

Group Reviewers

Reviewers

Commits

rDSTOCd190fda66316: Implement content_get and content_get_range for the in-mem storage.
R65:d190fda66316: Implement content_get and content_get_range for the in-mem storage.
rDSTOd190fda66316: Implement content_get and content_get_range for the in-mem storage.

Summary

Needed to work on T1307 for the indexers.

Diff Detail

Repository

rDSTO Storage manager

Branch

inmem-content-get-range

Lint

No Linters Available

Unit

No Unit Test Coverage

Build Status

Buildable 2525
Build 3132: tox-on-jenkins	Jenkins
Build 3131: arc lint + arc unit

Event Timeline

vlorentz created this revision.Nov 21 2018, 6:32 PM

Herald added a reviewer: Reviewers. · View Herald TranscriptNov 21 2018, 6:32 PM

vlorentz edited the summary of this revision. (Show Details)Nov 21 2018, 6:33 PM

remove irrelevant fixme.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/106/ for more details.

Harbormaster completed remote builds in B2516: Diff 2176.Nov 21 2018, 6:42 PM

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/107/ for more details.

Harbormaster completed remote builds in B2517: Diff 2177.Nov 21 2018, 6:47 PM

Some question, docstring improvments, formatting alignment are the requirement changes.

Other that that, this seems ready to roll.

Thanks.

swh/storage/in_memory.py
87	Why? A comment would help.
101	Dict[bytes, bytes]?
103	sha1 (bytes): content id
104	data (bytes): content's raw data
111	remove irrelevant fixme. You mean irrelevant for the in-memory usage (test) i guess. Explaining why i ask. For the initial storage, as we starting developing paginated endpoints, i'm wondering if it would not be better to align the endpoint as well. If it's possible that is (i'm not sure we enforce an order for example). That would be out of scope for that diff ;) The question might be worth discussing (the conclusion could also be to drop the conditional altogether ;).
125	Can you please clarify what you mean by 'enforced with multiplicity'?
127	Can't we skip those? Pushing filtering on the endpoint consumer feels bad to me. (and if in the main storage there is such a case, that's an out of scope issue, an issue nonetheless).
155	I never said it before. We are growing inconsistent with the formatting. That seems like nothing but that can end up doing some irrelevant noise in commits. So this should be avoided. Naturally, that's up for discussion but in the mean time, that'd be great to align to current conventions. So. in this case, ending character aligned at the beginning of the initial keyword. return { 'contents': matched, 'next': sha1, } As per tool, that'd be great not to bother with this and have some tool doing that for us. @zack mentioned black 'recently' but we never (re)acted on that matter...

This revision now requires changes to proceed.Nov 22 2018, 10:06 AM

douardda requested changes to this revision.Nov 22 2018, 10:44 AM

douardda added a subscriber: douardda.

douardda added inline comments.

swh/storage/in_memory.py

146–163

Same as a previous comment, prefer a single return statement when possible/not too obfuscating. Here, something like:

from itertools import chain, islice
next = None,
contents = []
sha1s = islice(self._sorted_sha1s, 0, from_index)
it_contents = (((sha1, v) for v in self._content_indexes['sha1'][sha1])
            for sha1 in sha1s)

for sha1, key in islice(chain.from_iterable(it_contents), limit):
    if sha1 > end:
        break
    contents.append({
            'data': self._contents_data[key],
            **self._contents[key],
            })
else:
    next = sha1
return dict(next=next, contents=contents)

vlorentz marked 9 inline comments as done.Nov 22 2018, 12:01 PM

vlorentz added inline comments.

swh/storage/in_memory.py
87	Why what? It just inserts the new sha1 in the list
111	nvm, I misunderstood that comment
127	That sentence was wrong, `content_get_range` returns no `None` item.
146–163	You can't do that, because it assigns the last returned value to `next`, instead of the first non-returned one. As a consequence, `UnboundLocalError` when there is no result because the `for` loop does not run. So the `for` loop needs to be rewritten like this: next = None for sha1, key in chain.from_iterable(it_contents): if sha1 > end: break if len(contents) >= limit: next = sha1 break contents.append({ 'data': self._contents_data[key], self._contents[key], }) return dict(next=next, contents=contents) Which I find slightly less readable than this: for sha1, key in chain.from_iterable(it_contents): if sha1 > end: break if len(contents) >= limit: return dict(next=sha1, contents=contents) contents.append({ 'data': self._contents_data[key], self._contents[key], }) return dict(next=None, contents=contents) because the former uses an extra `next` variable to do the same thing (with a "break then immediately return" instead of just a return).

Apply @ardumont's comments.

Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/111/ for more details.

Harbormaster completed remote builds in B2522: Diff 2182.Nov 22 2018, 12:07 PM

ardumont added inline comments.Nov 22 2018, 12:46 PM

swh/storage/in_memory.py
146–163	Yes, i was wondering whether what david proposed would work. Pick either one of the last one you proposed. They sound both fine to me.
swh/storage/storage.py
231–232	[bytes, bytes]?

douardda added inline comments.Nov 22 2018, 1:27 PM

swh/storage/in_memory.py
146–163	Indeed I changed the behavior in my snippet, but a small fix could be: for sha1, key in islice(chain.from_iterable(it_contents), limit+1): #[...] return dict(next=next, contents=contents[:limit]) would do the trick, but again, it's readability is debatable. Whatever, I still don't like your return "hidden" in the middle of the loop, but I won't fight.

ardumont added inline comments.Nov 22 2018, 1:39 PM

swh/storage/in_memory.py
87	You change something in the function content_add. Your diff is about content_get and content_get_range. It's not immediately apparent what this does. Now it's clearer ;)
101	Why is the key str? What did i miss?
111	Since you touch the comment, you can now fix it to make it clearer then ;)

vlorentz marked 3 inline comments as done.Nov 22 2018, 1:45 PM

vlorentz added inline comments.

swh/storage/in_memory.py
87	Yeah, I added `self._sorted_sha1` to make searches run in logtime.
101	Keys are `'sha1'` and `'data'`.
swh/storage/storage.py
231–232	No, keys are `'sha1'` and `'data'`.