Land the two parent diffs, then rebase this one on top of them.
Details
- Reviewers
douardda - Group Reviewers
Reviewers - Commits
- rDSTOCb61cf3d4cb4b: Add in-memory storage.
R65:b61cf3d4cb4b: Add in-memory storage.
rDSTOb61cf3d4cb4b: Add in-memory storage.
Diff Detail
- Repository
- rDSTO Storage manager
- Branch
- in-mem
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 2429 Build 2981: tox-on-jenkins Jenkins Build 2980: arc lint + arc unit
Event Timeline
swh/storage/in_memory.py | ||
---|---|---|
58–62 | forget it I misread the condition. Nonetheless I don't understand the reason for this type check, nor I get the true utility of this method. eg. at line 598 below we can read: visit_id = self._origin_visit_key({'origin': origin, 'date': visit_date}) I prefer a simple: visit_id = OriginVisitKey(origin, visit_date) Simple, readable, no useless indirection | |
492–500 | wouldn't be better to create the snapshot dict only in the if block? | |
554–558 | isn't return self.snapshot_get(self._origin_visits[visit]['snapshot']) enough here? | |
589–590 | I find this a bit easier to read, but might not be the case for everyone: list(itertools.chain(*self._origins[origin]['visits_dates'].values())) | |
593–595 | why not use itertools also in here? |
- Move protected methods at the end of the class.
- Don't make the tool config part of the key.
- key tuples.
- no temp dict
- Create snap dict only in the if block.
- itertools.chain
- tool_add is a generator.
swh/storage/in_memory.py | ||
---|---|---|
58–62 | The goal is to check that visit['origin'] is an origin's key, not the origin's data. | |
68–71 | I don't know why I did that. | |
84–87 | Even though it doesn't affect this particular backend, I believe it should remain here for users of the API to be backend-agnostic. | |
134 | @olasd ? | |
311–312 | Do we care about the efficiency of this backend? | |
349–352 | Oops. | |
413–414 | Either way is fine with me | |
440 | TIL | |
554–558 | "Explicit is better than implicit." But I can change it if you prefer. | |
973 | I use *_key methods so all key computation is at the same place, and keys can be used as a black box by other methods (eg. can be turned into hashes later). | |
1008–1036 | The old storage's docstrings said it returned an iterable, I didn't notice its implementation didn't match. |
I'll work on your other comments after D642 lands; it's a bit hard to test without it.
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tox/58/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tox/58/console
swh/storage/in_memory.py | ||
---|---|---|
32 |
I don't understand. The public API is the set of methods that don't have an underscore prefix, and they all have a docstring. | |
34–47 | I have another diff in the waiting that does this; but it's quite a big one. | |
134 | Note: these comments used to be on this line: key = random.sample(objs, 1)[0] | |
149–163 |
No, this implementation only returns one; it does a set intersection of all the matching keys, and returns one item of the set.
Actually, that was @olasd 's FIXME. (Note: these comments used to be on content_find.) |
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tox/63/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tox/63/console
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tox/71/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tox/71/console
swh/storage/in_memory.py | ||
---|---|---|
68–71 | I think we've done it in the past (e.g. when we were tuning the config for a given tool), so yeah I think it's useful. | |
134 | (the "view on previous revision" button - that looks like a big rewind button - makes these comments useless) | |
134 | In the future we could have several objects with the same sha1. The current API "contract" currently expects us to return only one of these objects. | |
149–163 | Same as the previous remark for content_get_metadata, we could have several objects that match all the hashes we were given as input. So, formally, this method should return a list of all the matches. | |
311–312 | This method should probably be a client-side algorithm (that can itself do something less stack-heavy than recursion) rather than a server-side recursion anyway. |
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tox/79/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tox/79/console
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/81/ for more details.
swh/storage/in_memory.py | ||
---|---|---|
32 | My point was to ensure all those 'public API' (as defined as methods nto starting by a single _) are really "public API" and, is so, ensure that is clear soehow (hence the "documented" part of teh comment). Now, I don't fully get this result https://forge.softwareheritage.org/P333 | |
34–47 | I don't understand why this has to be done in a different diff or set of revisions, but I won't veto the diff for this. | |
58–62 | But who are you trying to protect here? from who? It's a private method, and it's the only one in which you check the argument type. Once again, It's not big deal, it's just it looks to me some kind of (tiny) over engineering. | |
68–71 | I would love some more explanation then, cause I don't get the point (not in the code, an IRL discussion is fine for me there). | |
84–87 | I disagree. The "Note: in case of DB errors [...]" is in fact not an API spec, it an implementation detail of the db backend. We do not have a abstract base class where the API is documented, but that does not imply that the current only implementation (DB) IS the API, it's the API plus an implementation. So there is no need to tell irrelevant things on other implementations of the API. | |
134 | There should be a comment stating this, then. Otherwise, reading this line is pretty frightening. And explain why random is better than [0] there... | |
149–163 |
Sorry, I missed this intersection step. | |
149–163 |
yes, it would be the way to go | |
311–312 | We do care about universe's entropy 😄 | |
554–558 |
yes but smaller is better; easier to read, understand and maintain (if it does not obfuscate the statement) |
swh/storage/in_memory.py | ||
---|---|---|
830 | I'd prefer to use the exact same code as storage.py when possible (would make a potential refactoring/factorization easier) when possible, eg. here. unless I'm wrong you could write: return { 'origin': origin, #... | |
876 | [0:limit] ? really? get rid of this useless and unpythonic 0 please 😈 | |
877 | I'm repeating myself, but I really find this _origin_visit_key useless and adding unnecessary indirection. It's in fact used 3 times in this class, 2 of them in which you create a dict especially for calling the method. Once again, it's a detail in view of the amount of code involved, but it appears to me like a (small) bad smell... | |
892 | Why do you specify the `None` value here? (not the only place where you do that). I know 'explicit is better', but... | |
973 |
YAGNI! Do NEVER add uneeded complexity today for an hypothetical need tomorrow! | |
976 | All these `copy.deepcopy(dict)` makes me a bit sick. Not sure a better solution is easy to consider here, just a gratuitous complaint! | |
1009–1011 | Why on earth building this inserted list then yield from it? | |
swh/storage/tests/test_storage.py | ||
785–790 | These hunks (in test_storage.py) should not be in this Diff. Having to do so in order to make your test_in_memory pass (I guess it's the reason for these modifications) prove your implementation does not respect the "API" of the original storage class. It's probably no big deal here, but it's a bad practice... |
swh/storage/in_memory.py | ||
---|---|---|
32 |
I see. That seems outside the scope of this Diff, could you open a Task?
I did not implement methods that are not covered by tests. As you pointed out, it's not obvious which methods are public, so I used the assumption "tested == public". | |
34–47 | That other diff programmatically enforces the keys. It's better than informative comments that may not always be accurate. | |
58–62 | From myself, that's why I used an assertion. It came useful while writing the code and I figured there was no harm in keeping it. I agree it's unclear; removed. | |
84–87 | Agreed. |
swh/storage/in_memory.py | ||
---|---|---|
554–558 | I just re-read the code, and I really don't think return self.snapshot_get(self._origin_visits[visit]['snapshot']) is better. It just happens to work because snapshot_get behaves like this with non-bytes as argument, but this is not documented, there is no guarantee this will last forever. |
swh/storage/in_memory.py | ||
---|---|---|
892 | I only learned last week the second argument of dict.get is optional ^^ | |
976 | Wait for the next diff, I'll store everything in namedtuples with immutable values :) | |
1009–1011 | If the generator is not consumed, yield statements block, so nothing would be inserted. | |
swh/storage/tests/test_storage.py | ||
785–790 | comments like # hack: ids generated show that these IDs are quirks from the SQL backend, not features. | |
785–790 | Actually, according to Storage's docstrings, actual_result['author'] and actual_result['committer'] should not even exist. |
- Remove useless assertion.
- Remove irrelevant doc.
- Small fixes.
- Fix test.
- Remove _origin_visit_key.
- dict.get(foo, None) -> dict.get(foo)
- Remove _origin_metadata_key.
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/82/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DSTO/job/tox/83/ for more details.