For annotating classes and methods in swh/indexer/indexer.py, swh/indexer/metadata.py, swh/indexer/mimetype.py, swh/indexer/origin_head.py and swh/indexer/rehash.py.
Details
- Reviewers
vlorentz ardumont - Group Reviewers
Reviewers - Maniphest Tasks
- T2258: Add type annotations to indexer classes
- Commits
- rDCIDX5f49b59e6aa3: Add type annotations to indexer classes
- Required Signatures
L3 Software Heritage Contributor License Agreement, version 1.0
Diff Detail
- Repository
- rDCIDX Metadata indexer
- Branch
- typing-indexer
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 10432 Build 15531: tox-on-jenkins Jenkins Build 15530: arc lint + arc unit
Event Timeline
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/677/ for more details.
In swh-indexer's indexer.py (code), the run() method of BaseIndexer class has been overriden by a few derived classes. However, I think some of these overridings may be flawed, since when I try to type-annotate them, they're giving the mypy errors: argument of method incompatible with supertype / Signature of "run" incompatible with supertype "BaseIndexer". This is probably because of a violation of the Liskov substitution principle.
Do you'll suggest that I just tell mypy to ignore these errors for now? I can't get mypy to ignore these errors in the method definition expression either. The expression is spread across multiple lines, and the ignore annotation works for these kinds of expressions only in Python 3.8+ (source). So, I could just remove type checking from these methods altogether, for now.
These are the errors I'm getting:
(I'm marking this diff as "changes requested" to reflect that you're planning on making changes)
Right... So I'll remove the type annotations from these methods for now. Once I'm done, should I create a separate Maniphest task describing the underlying issue?
Add type annotations for rehash.py, indexer.py, origin_head.py and metadata.py.
Overriden methods and their parent class methods have mostly been excluded from type annotating, for now.
A lot of the methods in origin_head.py require documentation and are incomplete/unused, so I've had to use a lot of Any's in typing those.
I've set the type annotation as Any if
- I'm unsure of the type
- The field has values of different types, like in the value field of some Dict's or in some elements of some Tuple's
- The type of the variable changes in the program.
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/678/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/679/ for more details.
Thanks, looks good.
Remains some remarks.
And another one more general, for the docstring, either wrap all parameters with **parameter-name**: description or remove them altogether.
I see you kinda mix those.
swh/indexer/indexer.py | ||
---|---|---|
39 ↗ | (On Diff #9421) | Add the working directory docstring, it's where the file is written. |
472 ↗ | (On Diff #9421) | yeah, it's fine for now [1] The one that matters here is the main class BaseIndex's index definition which you typed. [1] indexer has grown a bit and after a while and getting back to it, it's a tad unclear. |
486 ↗ | (On Diff #9421) | remove the ** |
swh/indexer/mimetype.py | ||
137 | str without the quote :) |
Done. I've left the single ** for the kwargs in the docstring though.
Should I do this?
swh/indexer/indexer.py | ||
---|---|---|
39 ↗ | (On Diff #9421) | Added to the docstring. Regarding the type of working_directory, I wasn't able to figure it out based on checking its use in this method, as well as from checking the methods that call this method through using grep. According to this, the type can either be either a string or bytes object representing a path. . So if we're sure it's a string, I'll change it to str 👍 |
472 ↗ | (On Diff #9421) | Ah I see |
You should use a TypeVar for the type of results, so you don't have to use Any every time.
swh/indexer/indexer.py | ||
---|---|---|
33 ↗ | (On Diff #9421) | it yields strings |
186 ↗ | (On Diff #9421) | annotation should preferably be on the class itself. |
189 ↗ | (On Diff #9421) | It's a Dict |
257 ↗ | (On Diff #9421) | it yields bytes |
453 ↗ | (On Diff #9421) | it yields dicts |
482 ↗ | (On Diff #9421) | yields bytes |
swh/indexer/origin_head.py | ||
---|---|---|
67 ↗ | (On Diff #9421) | can be improved: https://docs.python.org/3/library/typing.html#typing.Tuple |
111 ↗ | (On Diff #9421) | not a bool |
123 ↗ | (On Diff #9421) | You can use specific types |
swh/indexer/rehash.py | ||
86 ↗ | (On Diff #9421) | Generator |
Sorry didn't get you. What results are you talking about?
swh/indexer/indexer.py | ||
---|---|---|
482 ↗ | (On Diff #9421) | I'm not sure that this is the case, since this method yields from _index_contents(), and _index_contents() yields Dicts. |
All variables named results
swh/indexer/indexer.py | ||
---|---|---|
482 ↗ | (On Diff #9421) | Then you should also fix the docstring |
swh/indexer/indexer.py | ||
---|---|---|
453 ↗ | (On Diff #9421) | SendType should be None, not Any. https://docs.python.org/3/library/typing.html#typing.Generator |
482 ↗ | (On Diff #9421) | same comment on SentType |
swh/indexer/metadata.py | ||
29 ↗ | (On Diff #9442) | sorry I missed this earlier, but you also forgot yield type for this generator |
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/680/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/681/ for more details.
swh/indexer/indexer.py | ||
---|---|---|
39 ↗ | (On Diff #9421) | Yes, it could be bytes also. |
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/682/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/683/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/684/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/685/ for more details.
Great! Thanks for the patience and review 😃 . Will work on another task after my exams are done 😴
This issue still remains, though