Page MenuHomeSoftware Heritage

Add type annotations to indexer classes
ClosedPublic

Authored by krithikvaidya on Feb 4 2020, 3:31 PM.

Details

Summary

For annotating classes and methods in swh/indexer/indexer.py, swh/indexer/metadata.py, swh/indexer/mimetype.py, swh/indexer/origin_head.py and swh/indexer/rehash.py.

Diff Detail

Repository
rDCIDX Metadata indexer
Branch
typing-indexer
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 10432
Build 15531: tox-on-jenkinsJenkins
Build 15530: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Okay great! I'll add the type annotations to the other files too. Thanks

krithikvaidya retitled this revision from Add type annotations for mimetype.py to Add type annotations to indexer classes.EditedFeb 5 2020, 7:14 PM
krithikvaidya edited the summary of this revision. (Show Details)

In swh-indexer's indexer.py (code), the run() method of BaseIndexer class has been overriden by a few derived classes. However, I think some of these overridings may be flawed, since when I try to type-annotate them, they're giving the mypy errors: argument of method incompatible with supertype / Signature of "run" incompatible with supertype "BaseIndexer". This is probably because of a violation of the Liskov substitution principle.

Do you'll suggest that I just tell mypy to ignore these errors for now? I can't get mypy to ignore these errors in the method definition expression either. The expression is spread across multiple lines, and the ignore annotation works for these kinds of expressions only in Python 3.8+ (source). So, I could just remove type checking from these methods altogether, for now.

These are the errors I'm getting:

Not sure if adding a comment notifies the reviewers, so tagging @vlorentz @ardumont

They are errors, I don't think we should ignore them, but fix the underlying issue;

vlorentz requested changes to this revision.Feb 6 2020, 1:13 PM

(I'm marking this diff as "changes requested" to reflect that you're planning on making changes)

This revision now requires changes to proceed.Feb 6 2020, 1:13 PM

They are errors, I don't think we should ignore them, but fix the underlying issue;

Right... So I'll remove the type annotations from these methods for now. Once I'm done, should I create a separate Maniphest task describing the underlying issue?

Add type annotations for rehash.py, indexer.py, origin_head.py and metadata.py.

Overriden methods and their parent class methods have mostly been excluded from type annotating, for now.
A lot of the methods in origin_head.py require documentation and are incomplete/unused, so I've had to use a lot of Any's in typing those.

I've set the type annotation as Any if

  • I'm unsure of the type
  • The field has values of different types, like in the value field of some Dict's or in some elements of some Tuple's
  • The type of the variable changes in the program.

Apply minor, cosmetic changes

@vlorentz @ardumont
I think I'm done here, please check the code and my comments above.

swh/indexer/indexer.py
472 ↗(On Diff #9421)

Not sure which overriden index() is being called here

Thanks, looks good.

Remains some remarks.
And another one more general, for the docstring, either wrap all parameters with **parameter-name**: description or remove them altogether.
I see you kinda mix those.

swh/indexer/indexer.py
39 ↗(On Diff #9421)

Add the working directory docstring, it's where the file is written.
i also think it's a string (to update its type in the signature).

472 ↗(On Diff #9421)

yeah, it's fine for now [1]

The one that matters here is the main class BaseIndex's index definition which you typed.
It's highly possible the subclasses are not consistent on the type, only on the parameters so, fine.
everything is fine ;)

[1] indexer has grown a bit and after a while and getting back to it, it's a tad unclear.

486 ↗(On Diff #9421)

remove the **

swh/indexer/mimetype.py
137

str without the quote :)

And another one more general, for the docstring, either wrap all parameters with **parameter-name**: description or remove them altogether.

Done. I've left the single ** for the kwargs in the docstring though.

Once I'm done, should I create a separate Maniphest task describing the underlying issue?

Should I do this?

swh/indexer/indexer.py
39 ↗(On Diff #9421)

Added to the docstring.

Regarding the type of working_directory, I wasn't able to figure it out based on checking its use in this method, as well as from checking the methods that call this method through using grep. According to this, the type can either be either a string or bytes object representing a path. . So if we're sure it's a string, I'll change it to str 👍

472 ↗(On Diff #9421)

Ah I see

You should use a TypeVar for the type of results, so you don't have to use Any every time.

swh/indexer/indexer.py
33 ↗(On Diff #9421)

it yields strings

186 ↗(On Diff #9421)

annotation should preferably be on the class itself.

189 ↗(On Diff #9421)

It's a Dict

257 ↗(On Diff #9421)

it yields bytes

453 ↗(On Diff #9421)

it yields dicts

482 ↗(On Diff #9421)

yields bytes

vlorentz added inline comments.
swh/indexer/origin_head.py
67 ↗(On Diff #9421)
111 ↗(On Diff #9421)

not a bool

123 ↗(On Diff #9421)

You can use specific types

swh/indexer/rehash.py
86 ↗(On Diff #9421)

Generator

This revision now requires changes to proceed.Feb 7 2020, 11:08 AM

You should use a TypeVar for the type of results, so you don't have to use Any every time.

Sorry didn't get you. What results are you talking about?

swh/indexer/indexer.py
482 ↗(On Diff #9421)

I'm not sure that this is the case, since this method yields from _index_contents(), and _index_contents() yields Dicts.

Sorry didn't get you. What results are you talking about?

All variables named results

swh/indexer/indexer.py
482 ↗(On Diff #9421)

Then you should also fix the docstring

vlorentz requested changes to this revision.Feb 7 2020, 5:01 PM
vlorentz added inline comments.
swh/indexer/indexer.py
453 ↗(On Diff #9421)
482 ↗(On Diff #9421)

same comment on SentType

swh/indexer/metadata.py
29 ↗(On Diff #9442)

sorry I missed this earlier, but you also forgot yield type for this generator

This revision now requires changes to proceed.Feb 7 2020, 5:01 PM
swh/indexer/indexer.py
39 ↗(On Diff #9421)

Yes, it could be bytes also.
but i believe we only use it with string.

Looks good! Now you just need to squash your commits before I accept the diff

This revision now requires changes to proceed.Feb 7 2020, 11:17 PM
This revision is now accepted and ready to land.Feb 10 2020, 12:11 PM

Great! Thanks for the patience and review 😃 . Will work on another task after my exams are done 😴

They are errors, I don't think we should ignore them, but fix the underlying issue;

Right... So I'll remove the type annotations from these methods for now. Once I'm done, should I create a separate Maniphest task describing the underlying issue?

This issue still remains, though