Page MenuHomeSoftware Heritage

Add type annotations to indexer classes
ClosedPublic

Authored by krithikvaidya on Tue, Feb 4, 3:31 PM.

Details

Summary

For annotating classes and methods in swh/indexer/indexer.py, swh/indexer/metadata.py, swh/indexer/mimetype.py, swh/indexer/origin_head.py and swh/indexer/rehash.py.

Diff Detail

Repository
rDCIDX Metadata indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
krithikvaidya added a comment.EditedTue, Feb 4, 4:20 PM

Okay great! I'll add the type annotations to the other files too. Thanks

krithikvaidya retitled this revision from Add type annotations for mimetype.py to Add type annotations to indexer classes.EditedWed, Feb 5, 7:14 PM
krithikvaidya edited the summary of this revision. (Show Details)

In swh-indexer's indexer.py (code), the run() method of BaseIndexer class has been overriden by a few derived classes. However, I think some of these overridings may be flawed, since when I try to type-annotate them, they're giving the mypy errors: argument of method incompatible with supertype / Signature of "run" incompatible with supertype "BaseIndexer". This is probably because of a violation of the Liskov substitution principle.

Do you'll suggest that I just tell mypy to ignore these errors for now? I can't get mypy to ignore these errors in the method definition expression either. The expression is spread across multiple lines, and the ignore annotation works for these kinds of expressions only in Python 3.8+ (source). So, I could just remove type checking from these methods altogether, for now.

These are the errors I'm getting:

Not sure if adding a comment notifies the reviewers, so tagging @vlorentz @ardumont

They are errors, I don't think we should ignore them, but fix the underlying issue;

vlorentz requested changes to this revision.Thu, Feb 6, 1:13 PM

(I'm marking this diff as "changes requested" to reflect that you're planning on making changes)

This revision now requires changes to proceed.Thu, Feb 6, 1:13 PM
krithikvaidya planned changes to this revision.Thu, Feb 6, 1:19 PM

They are errors, I don't think we should ignore them, but fix the underlying issue;

Right... So I'll remove the type annotations from these methods for now. Once I'm done, should I create a separate Maniphest task describing the underlying issue?

krithikvaidya updated this revision to Diff 9420.EditedThu, Feb 6, 5:25 PM

Add type annotations for rehash.py, indexer.py, origin_head.py and metadata.py.

Overriden methods and their parent class methods have mostly been excluded from type annotating, for now.
A lot of the methods in origin_head.py require documentation and are incomplete/unused, so I've had to use a lot of Any's in typing those.

I've set the type annotation as Any if

  • I'm unsure of the type
  • The field has values of different types, like in the value field of some Dict's or in some elements of some Tuple's
  • The type of the variable changes in the program.

Apply minor, cosmetic changes

@vlorentz @ardumont
I think I'm done here, please check the code and my comments above.

swh/indexer/indexer.py
476

Not sure which overriden index() is being called here

ardumont accepted this revision.Fri, Feb 7, 3:13 AM

Thanks, looks good.

Remains some remarks.
And another one more general, for the docstring, either wrap all parameters with **parameter-name**: description or remove them altogether.
I see you kinda mix those.

swh/indexer/indexer.py
39–41

Add the working directory docstring, it's where the file is written.
i also think it's a string (to update its type in the signature).

476

yeah, it's fine for now [1]

The one that matters here is the main class BaseIndex's index definition which you typed.
It's highly possible the subclasses are not consistent on the type, only on the parameters so, fine.
everything is fine ;)

[1] indexer has grown a bit and after a while and getting back to it, it's a tad unclear.

490–491

remove the **

swh/indexer/mimetype.py
136

str without the quote :)

And another one more general, for the docstring, either wrap all parameters with **parameter-name**: description or remove them altogether.

Done. I've left the single ** for the kwargs in the docstring though.

Once I'm done, should I create a separate Maniphest task describing the underlying issue?

Should I do this?

swh/indexer/indexer.py
39–41

Added to the docstring.

Regarding the type of working_directory, I wasn't able to figure it out based on checking its use in this method, as well as from checking the methods that call this method through using grep. According to this, the type can either be either a string or bytes object representing a path. . So if we're sure it's a string, I'll change it to str 👍

476

Ah I see

krithikvaidya marked 6 inline comments as done.Fri, Feb 7, 4:14 AM

You should use a TypeVar for the type of results, so you don't have to use Any every time.

swh/indexer/indexer.py
33

it yields strings

190

annotation should preferably be on the class itself.

193

It's a Dict

261

it yields bytes

457

it yields dicts

486

yields bytes

vlorentz requested changes to this revision.Fri, Feb 7, 11:08 AM
vlorentz added inline comments.
swh/indexer/origin_head.py
67
111

not a bool

123

You can use specific types

swh/indexer/rehash.py
86

Generator

This revision now requires changes to proceed.Fri, Feb 7, 11:08 AM
krithikvaidya marked 9 inline comments as done.Fri, Feb 7, 1:46 PM

You should use a TypeVar for the type of results, so you don't have to use Any every time.

Sorry didn't get you. What results are you talking about?

swh/indexer/indexer.py
486

I'm not sure that this is the case, since this method yields from _index_contents(), and _index_contents() yields Dicts.

Sorry didn't get you. What results are you talking about?

All variables named results

swh/indexer/indexer.py
486

Then you should also fix the docstring

Requested changes made

krithikvaidya marked 3 inline comments as done.Fri, Feb 7, 4:59 PM
vlorentz requested changes to this revision.Fri, Feb 7, 5:01 PM
vlorentz added inline comments.
swh/indexer/indexer.py
457
486

same comment on SentType

swh/indexer/metadata.py
29

sorry I missed this earlier, but you also forgot yield type for this generator

This revision now requires changes to proceed.Fri, Feb 7, 5:01 PM

Fix minor issues

krithikvaidya marked 3 inline comments as done.Fri, Feb 7, 6:31 PM
ardumont added inline comments.Fri, Feb 7, 6:50 PM
swh/indexer/indexer.py
39–41

Yes, it could be bytes also.
but i believe we only use it with string.

Change type from Any to str

krithikvaidya marked an inline comment as done.Fri, Feb 7, 6:56 PM
vlorentz requested changes to this revision.Fri, Feb 7, 11:17 PM

Looks good! Now you just need to squash your commits before I accept the diff

This revision now requires changes to proceed.Fri, Feb 7, 11:17 PM

Squashed commits

Update commit message

vlorentz accepted this revision.Mon, Feb 10, 12:11 PM
This revision is now accepted and ready to land.Mon, Feb 10, 12:11 PM

Rebase branch on master

This revision was automatically updated to reflect the committed changes.

Great! Thanks for the patience and review 😃 . Will work on another task after my exams are done 😴

They are errors, I don't think we should ignore them, but fix the underlying issue;

Right... So I'll remove the type annotations from these methods for now. Once I'm done, should I create a separate Maniphest task describing the underlying issue?

This issue still remains, though