Needed to work on T1307 for the indexers.
Details
- Reviewers
ardumont douardda - Group Reviewers
Reviewers - Commits
- rDSTOCd190fda66316: Implement content_get and content_get_range for the in-mem storage.
R65:d190fda66316: Implement content_get and content_get_range for the in-mem storage.
rDSTOd190fda66316: Implement content_get and content_get_range for the in-mem storage.
Diff Detail
- Repository
- rDSTO Storage manager
- Branch
- inmem-content-get-range
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 2522 Build 3127: tox-on-jenkins Jenkins Build 3126: arc lint + arc unit
Event Timeline
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/106/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/107/ for more details.
Some question, docstring improvments, formatting alignment are the requirement changes.
Other that that, this seems ready to roll.
Thanks.
swh/storage/in_memory.py | ||
---|---|---|
87 | Why? | |
101 | Dict[bytes, bytes]? | |
103 | sha1 (bytes): content id | |
104 | data (bytes): content's raw data | |
111 |
You mean irrelevant for the in-memory usage (test) i guess. Explaining why i ask. That would be out of scope for that diff ;) | |
125 | Can you please clarify what you mean by 'enforced with multiplicity'? | |
127 | Can't we skip those? Pushing filtering on the endpoint consumer feels bad to me. (and if in the main storage there is such a case, that's an out of scope issue, an issue nonetheless). | |
155 | I never said it before. So this should be avoided. Naturally, that's up for discussion but in the mean time, that'd be great to align to current conventions. return { 'contents': matched, 'next': sha1, } As per tool, that'd be great not to bother with this and have some tool doing that for us. |
swh/storage/in_memory.py | ||
---|---|---|
146–163 | Same as a previous comment, prefer a single return statement when possible/not too obfuscating. Here, something like: from itertools import chain, islice next = None, contents = [] sha1s = islice(self._sorted_sha1s, 0, from_index) it_contents = (((sha1, v) for v in self._content_indexes['sha1'][sha1]) for sha1 in sha1s) for sha1, key in islice(chain.from_iterable(it_contents), limit): if sha1 > end: break contents.append({ 'data': self._contents_data[key], **self._contents[key], }) else: next = sha1 return dict(next=next, contents=contents) |
swh/storage/in_memory.py | ||
---|---|---|
87 | Why what? It just inserts the new sha1 in the list | |
111 | nvm, I misunderstood that comment | |
127 | That sentence was wrong, content_get_range returns no None item. | |
146–163 | You can't do that, because it assigns the last returned value to next, instead of the first non-returned one. As a consequence, UnboundLocalError when there is no result because the for loop does not run. So the for loop needs to be rewritten like this: next = None for sha1, key in chain.from_iterable(it_contents): if sha1 > end: break if len(contents) >= limit: next = sha1 break contents.append({ 'data': self._contents_data[key], **self._contents[key], }) return dict(next=next, contents=contents) Which I find slightly less readable than this: for sha1, key in chain.from_iterable(it_contents): if sha1 > end: break if len(contents) >= limit: return dict(next=sha1, contents=contents) contents.append({ 'data': self._contents_data[key], **self._contents[key], }) return dict(next=None, contents=contents) because the former uses an extra next variable to do the same thing (with a "break then immediately return" instead of just a return). |
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/111/ for more details.
swh/storage/in_memory.py | ||
---|---|---|
146–163 | Indeed I changed the behavior in my snippet, but a small fix could be: for sha1, key in islice(chain.from_iterable(it_contents), limit+1): #[...] return dict(next=next, contents=contents[:limit]) would do the trick, but again, it's readability is debatable. Whatever, I still don't like your return "hidden" in the middle of the loop, but I won't fight. |
swh/storage/in_memory.py | ||
---|---|---|
87 | You change something in the function content_add. Now it's clearer ;) | |
101 | Why is the key str? What did i miss? | |
111 | Since you touch the comment, you can now fix it to make it clearer then ;) |
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/113/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/120/ for more details.