Page MenuHomeSoftware Heritage

Add type annotations to metadata mappings
Needs RevisionPublic

Authored by rohan-sachan on Mar 21 2021, 1:41 PM.

Details

Summary

Added type annotations to all files in metadata_dictionary module of swh-indexer/swh/indexer.
Some type annotations can still be improved to a more concrete type, review expected for which.

Diff Detail

Repository
rDCIDX Metadata indexer
Branch
feature-type-annotations
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 20036
Build 31106: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 31105: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D5295 (id=18958)

Rebasing onto 8e046c1900...

First, rewinding head to replay your work on top of it...
Applying: Added type annotations to metadata mappings Closes T2259
Changes applied before test
commit d32e60225f5b41a0b8b9308b6eee6d0b58a97645
Author: Rohan Sachan <rohan.1.sachan@gmail.com>
Date:   Sun Mar 21 17:45:57 2021 +0530

    Added type annotations to metadata mappings Closes T2259

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/162/ for more details.

rohan-sachan retitled this revision from Added type annotations to metadata mappings Closes T2259 to Add type annotations to metadata mappings Closes T2259.Mar 21 2021, 2:32 PM

Hi, thanks for the diff.

I stopped reviewing because I noticed the same recurring issues. Could you address the comments below, and other occurrences of the issues I commented?

Thanks

swh/indexer/metadata_dictionary/__init__.py
23–24

why # type: ignore?

swh/indexer/metadata_dictionary/base.py
35

full annotation, please

63

Can you do better than Any?

65

and why this one?

91

ditto

97

Full annotation if possible

This revision now requires changes to proceed.Mar 21 2021, 2:53 PM

Thanks for a prompt review.
I'll go level deeper into annotating wherever I have used Any, and wherever it's possible to go a level deeper. Will submit an updated diff by tomorrow.

swh/indexer/metadata_dictionary/__init__.py
23–24

Getting the following error. It's because the BaseMapping base class in base.py doesn't have supperted_terms() fucntion. Although right now none of the metadata indexers are implementing BaseMapping class directly. One resolution is to just ignore the [attr-defined] and [arg-type] errors on line 23 and 24. If there's another way round, then please let me know.

__init__.py:23: error: "Type[BaseMapping]" has no attribute "supported_terms"  [attr-defined]
__init__.py:24: error: Argument 1 to "add" of "set" has incompatible type "Type[BaseMapping]"; expected Module  [arg-type]
swh/indexer/metadata_dictionary/base.py
63

I could go deeper into annotations and put up Union[bytes, str, int], but Union will make the implementation of isinstance() fucntions complicated.

65

Getting a ' ... no attribute "lower()" ' error with mypy without using #type: ignore. Will remove it in next revision by using isinstance() check instead.

91

Getting a ' ... no attribute "items()" ' error with mypy without using #type: ignore. Will remove it in next revision by using isinstance() check instead.

swh/indexer/metadata_dictionary/npm.py
141

If it's Ok, I'll use the following form of notaion?

Union[Dict[str, str], Any]

Just using Dict[str, str] gives error for having no return statement. Whereas putting up an extra else for returning None would be just unnecessary lines of code.

While you are at it, and as a minor point, please also double check your commit message, it doesn't match our conventions (e.g., it is in passive voice, while it shouldn't).

See our git style guide.

swh/indexer/metadata_dictionary/__init__.py
23–24

The second error is legitimate, values in MAPPINGS are not modules.

swh/indexer/metadata_dictionary/base.py
63

Look into TypedDict

swh/indexer/metadata_dictionary/npm.py
141

Union[Dict[str, str], Any] means the same thing as Any.

But you don't need to mak an union of Any here, there are only two possible types returned

Change type annotations to more concrete type
Change Type annotations for all files in swh.indexer.metadata_mapping to reflect more concrete type annotations.
Remove unncesseary usage of # type: ignore

Build is green

Patch application report for D5295 (id=19079)

Could not rebase; Attempt merge onto 8e046c1900...

Updating 8e046c1..70096a3
Fast-forward
 swh/indexer/metadata_dictionary/__init__.py | 10 +++--
 swh/indexer/metadata_dictionary/base.py     | 70 +++++++++++++++++++----------
 swh/indexer/metadata_dictionary/codemeta.py |  9 +++-
 swh/indexer/metadata_dictionary/maven.py    | 28 +++++++-----
 swh/indexer/metadata_dictionary/npm.py      | 21 ++++++---
 swh/indexer/metadata_dictionary/python.py   | 17 ++++---
 swh/indexer/metadata_dictionary/ruby.py     | 47 ++++++++++++-------
 7 files changed, 130 insertions(+), 72 deletions(-)
Changes applied before test
commit 70096a3281a8d278948e69f70afd9b22bd499f27
Author: Rohan Sachan <rohan.1.sachan@gmail.com>
Date:   Wed Mar 24 20:20:59 2021 +0530

    indexer.metadata_dictionary: Change type annotations to more concrete type

commit 7c0545b8a95896e18d766f7c470481957feda4ad
Author: Rohan Sachan <rohan.1.sachan@gmail.com>
Date:   Sun Mar 21 17:45:57 2021 +0530

    Added type annotations to metadata mappings Closes T2259

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/165/ for more details.

@rohan-sachan Please drop the suffix "Closes T2259" from the diff title (and probably
from the commit message).

You can add that suffix inside the diff description with the effect of closing the diff
when you will "land" it (git push once accepted)

rohan-sachan retitled this revision from Add type annotations to metadata mappings Closes T2259 to Add type annotations to metadata mappings.Mar 24 2021, 4:18 PM
rohan-sachan edited the summary of this revision. (Show Details)

@ardumont : Ok, removed the suffix "Closes T2259" from DIff title and summary.

swh/indexer/metadata_dictionary/__init__.py
23–24

Instead of using # type:ignore [attr-defined], you should add supported_terms() to the definition of BaseMapping. That way, mypy can actually check types are ok.

swh/indexer/metadata_dictionary/base.py
12

should be in CamelCase, see https://www.python.org/dev/peps/pep-0008/#class-names

And should be singular, because it represents a single entry.

65

No, don't use isinstance. They are supposed to be bytes.

91

This is not an improvement over type: ignore.

With type: ignore, you disable type checking with no runtime change.

When adding if isinstance(cls.mapping, Dict):, you also go around type checking, and add a runtime behavior that is not tested -- and also that should never happen. (and it's also not tested)

To phrase it differently: we use mypy to make sure we never get an expected type, but instead your changes replace type error exceptions with silent fallbacks.

This revision now requires changes to proceed.Mar 25 2021, 10:00 AM

Ok, I'll give another try at rectifying these issues throughout all the files.
Just a quick question, did you review the changes for all the files, or did you review the changes only up to base.py and didn't review further because of the repeated issues?