- better names related to the object buffers
- extracted parameter dicts from the constructor
- used more generic typing in function parameters and more specific in
other contexts in order to apply the principle of robustness
Close T2287
Differential D4017
Improve code quality and doc in BufferedProxyStorage tenma on Sep 23 2020, 1:08 PM. Authored by
Details
other contexts in order to apply the principle of robustness Close T2287
Diff Detail
Event TimelineComment Actions Build is green Patch application report for D4017 (id=14169)Rebasing onto 30cdb78f17... Current branch diff-target is up to date. Changes applied before testcommit aa7ca077a4c516f3fa30e45b0cf880a519232743 Author: tenma <tenma+swh@mailbox.org> Date: Tue Sep 22 17:57:24 2020 +0200 Improve code quality and doc in BufferedProxyStorage - better names related to the object buffers - extracted parameter dicts from the constructor - used more generic typing in function parameters and more specific in other contexts in order to apply the principle of robustness See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/963/ for more details.
Comment Actions Build is green Patch application report for D4017 (id=14187)Rebasing onto 924621fff8... First, rewinding head to replay your work on top of it... Applying: Improve code quality and doc in BufferedProxyStorage Changes applied before testcommit 9d99ba09470f92205bdd3fc8cb13ea7103fdf6dc Author: tenma <tenma+swh@mailbox.org> Date: Tue Sep 22 17:57:24 2020 +0200 Improve code quality and doc in BufferedProxyStorage - better names related to the object buffers - extracted parameter dicts from the constructor - used more generic typing in function parameters and more specific in other contexts in order to apply the principle of robustness See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/965/ for more details. Comment Actions Build is green Patch application report for D4017 (id=14202)Rebasing onto e37f63930b... First, rewinding head to replay your work on top of it... Fast-forwarded diff-target to base-revision-968-D4017. Changes applied before testSee https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/968/ for more details. Comment Actions Thanks for these improvements, which look good in general! I only have a nit to pick on the min_batch_size default argument. We usually avoid using mutable structures (e.g. a dict) as default function arguments (because that's a footgun: if you mutate the argument for some reason, you can end up mutating the default value). The common pattern is to make the default value None, and to use (or copy) the default value there. Identity checking a mutable default argument is a very surprising construct (at least I don't think we're using it anywhere else), so I'd rather we avoid it altogether. By making the default value None, you end up killing two birds with one stone. Comment Actions I view it that way:
After this rationale, which could be a valid policy to introduce in the project, now the proposition is: Do I make it immutable with ImmutableDict?
Comment Actions Sorry, I missed your two comments before writing mine. It's just a footgun, and is not an idiomatic pattern in python
possibly yes, but we don't generally do it, again because it's not idiomatic in Python. We only use ImmutableDict in the model because that one is really important.
Somewhat. But if the constant has a name in allcaps, I think it's good enough to make sure we don't accidentally mutate it.
Again, is None is idiomatic in Python, is <other dict> isn't. And NoneType is not an "invalid type".
When we write the type is Optional[X], we explicitly mention the None case, and mypy will check it.
I'm not against it Comment Actions Doing an identity check on a mutable value is very often a bug. Plenty of static code checkers (pylint, pyflakes-bugbear) will throw a warning, with good reason.
Again, having a mutable default value is a common antipattern that /will/ throw a warning if we add more static code checkers to our tool belt.
This static type checking argument seems a bit specious to me. None is the idiomatic "default value" to pass to such a function taking a dict as an argument. What's really missing, however, is documenting the actual behavior of the argument: what happens when I pass a mapping with a missing key? Once that's documented, the intent behind the default None will be crystal clear.
*shrug* I really don't think that would be useful. Comment Actions I see that e37f63930b323a1e811861d298e62516ed94649a has been pushed. Meh. To properly push after a rebase, you need to do arc diff --update to make sure that the changes after rebase will be picked up by phabricator and the diff will be automatically closed. Comment Actions Yeah because it was validated by ardumont earlier, and I just made a little change then updated the commit. Comment Actions The fact that is not so common a pattern does not make it bad. The is None pattern is so common because it is the easiest way to work around the mutable argument problem which exists by design (I am not saying it is wrong, but that we can have better). Comment Actions Please let this go @tenma. Two people in the team have already argued against it. Thanks. Comment Actions Well, first, this code: def __init__( self, storage: Mapping, min_batch_size: Mapping = DEFAULT_BUFFER_THRESHOLDS, ): self.storage: StorageInterface = get_storage(**storage) if min_batch_size is not DEFAULT_BUFFER_THRESHOLDS: self._buffer_thresholds = {**DEFAULT_BUFFER_THRESHOLDS, **min_batch_size} is incorrect, because if the min_batch_size is not given, then self._buffer_thresholds is not set, which will cause a crash later. A correct version of this code is: def __init__( self, storage: Mapping, min_batch_size: Mapping = DEFAULT_BUFFER_THRESHOLDS, ): self.storage: StorageInterface = get_storage(**storage) if min_batch_size is not DEFAULT_BUFFER_THRESHOLDS: self._buffer_thresholds = {**DEFAULT_BUFFER_THRESHOLDS, **min_batch_size} else: self._buffer_thresholds = {**DEFAULT_BUFFER_THRESHOLDS} which is equivalent to: def __init__( self, storage: Mapping, min_batch_size: Mapping = {}, ): self.storage: StorageInterface = get_storage(**storage) self._buffer_thresholds = {**DEFAULT_BUFFER_THRESHOLDS, **min_batch_size} which solves everyone's concerns. Comment Actions And look! I can get rid of the identity check, just it would merge with itself. This one is to avoid perf penalty. Comment Actions Wanted also to rename constructor parameters but they are tied to the config parameters. Comment Actions ImmutableDict and None are both fine with me, as long as there's no identity check on the dict Comment Actions Last attempt: https://docs.python.org/3/reference/compound_stmts.html#function-definitions, paragraph 6 This is using features of Python, it is not a hack or something arcane/impl detail. Now if you definitely want to revert to None, I will. You may also want to say "just this time then". Comment Actions This diff should be closed (as the commit has landed on master already) and any further changes should happen on a further diff. I've given all my arguments already, I won't rehash them again. I'm fine with @vlorentz's second example code (with s/{}/None/, even though this mutable default is always used correctly; but I won't argue about it too hard.) I don't think ImmutableDict should leak anywhere else than in swh.model and related objects. |