Page MenuHomeSoftware Heritage

Properly define the type of index() methods of indexers.
ClosedPublic

Authored by vlorentz on Oct 7 2020, 5:51 PM.

Diff Detail

Repository
rDCIDX Metadata indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D4189 (id=14745)

Could not rebase; Attempt merge onto 51b10e891d...

Updating 51b10e8..a0d47f1
Fast-forward
 swh/indexer/cli.py                           |    7 +-
 swh/indexer/ctags.py                         |   40 +-
 swh/indexer/fossology_license.py             |   31 +-
 swh/indexer/indexer.py                       |   85 +-
 swh/indexer/metadata.py                      |  164 ++--
 swh/indexer/metadata_dictionary/base.py      |   18 +-
 swh/indexer/mimetype.py                      |   35 +-
 swh/indexer/origin_head.py                   |   18 +-
 swh/indexer/storage/__init__.py              |  361 ++++---
 swh/indexer/storage/api/client.py            |    3 +
 swh/indexer/storage/api/serializers.py       |   26 +
 swh/indexer/storage/api/server.py            |    9 +-
 swh/indexer/storage/converters.py            |   15 +-
 swh/indexer/storage/db.py                    |   34 +-
 swh/indexer/storage/in_memory.py             |  222 ++---
 swh/indexer/storage/interface.py             |  237 ++---
 swh/indexer/storage/model.py                 |   12 +-
 swh/indexer/tests/storage/conftest.py        |   10 +-
 swh/indexer/tests/storage/test_converters.py |   17 +-
 swh/indexer/tests/storage/test_metrics.py    |    8 +-
 swh/indexer/tests/storage/test_server.py     |   14 +-
 swh/indexer/tests/storage/test_storage.py    | 1322 +++++++++++++-------------
 swh/indexer/tests/test_cli.py                |   36 +-
 swh/indexer/tests/test_ctags.py              |   23 +-
 swh/indexer/tests/test_fossology_license.py  |   22 +-
 swh/indexer/tests/test_indexer.py            |    7 +-
 swh/indexer/tests/test_metadata.py           |   69 +-
 swh/indexer/tests/test_mimetype.py           |   44 +-
 swh/indexer/tests/test_origin_metadata.py    |  185 ++--
 swh/indexer/tests/utils.py                   |   64 +-
 30 files changed, 1623 insertions(+), 1515 deletions(-)
 create mode 100644 swh/indexer/storage/api/serializers.py
Changes applied before test
commit a0d47f16eff9f393881e23e4458ba6001a40158f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 17:50:32 2020 +0200

    Properly define the type of index() methods of indexers.

commit 429d3a1a5e2d05a503482eec228d4e4527c84490
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 17:29:52 2020 +0200

    tests: Clean up compatibility code with endpoints that return dicts.
    
    They are all migrated to attr classes now.

commit 8c1b288b6298509080f6e536d33a87f8ae456276
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 17:21:04 2020 +0200

    use OriginIntrinsicMetadataRow in the storage interface instead of dicts.
    
    also add typing to some test functions

commit f954853a4cd8bfcc17fee17309db8aed4e75f7ad
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 16:01:15 2020 +0200

    use RevisionIntrinsicMetadataRow in the storage interface instead of dicts.
    
    also add typing to some test functions

commit a822ada3a012d1a384ac7950527a44344568cdf1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 15:10:04 2020 +0200

    use ContentMetadataRow in the storage interface instead of dicts.

commit 7566fec49a46abd03c2eb6abced7a7f786da6a4c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 13:39:49 2020 +0200

    use ContentCtagsRow in the storage interface instead of dicts.

commit af3c220b14e22ae270679511045f1f6563febe9b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 12:43:45 2020 +0200

    use ContentLanguageRow in the storage interface instead of dicts.

commit b4d084d675bb89f5d6803ddbf75bbe54c3e91c03
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 11:25:05 2020 +0200

    indexer.storage: Change return types from Iterable to List
    
    For consistency with the main storage.

commit 570816fd87a1451e560fce8881a0da5626cf0ab4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 11:06:15 2020 +0200

    license: use ContentLicenseRow in the storage interface instead of dicts.

commit d346feffffaac7e8cc5c3bc0eedbffc66db1d37e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 11:43:05 2020 +0200

    all indexers: make index() return a list of results instead of a single one.
    
    1. it was wrongfully annotated as '-> TResult' even though some indexers
       can return None
    2. in a future commit, the fossology indexer will need to return multiple
       results.

commit 7fe4a89dbcf221a0125c474640cbcdc8b01b1df2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 10:54:35 2020 +0200

    base indexers: add type annotation for self.{storage,idx_storage}.

commit 486ee085f5ee999da38793815ae462c00ece4efb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 10:48:44 2020 +0200

    indexer.storage: Update docstrings of mimetype-related endpoints.

commit 4ec112337909cc364bec119b2ef6ae047a81b96f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 10:46:07 2020 +0200

    indexer.storage: Change return type annotation from Iterator to Iterable.
    
    When going through the RPC, it's turned into a list.

commit e8e94cf237471636e84ceac5d1740cfa7a982f90
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 10:01:26 2020 +0200

    tests: Enable type-checking on storage test functions.
    
    By adding a simple type annotation to the test functions' signature.

commit 44cee8f213f26e01aa63519a8c21962b2fca460b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Oct 6 15:42:54 2020 +0200

    Make base indexers generic, with the result of index() as their type parameter.
    
    So the type of results can be statically checked, instead of needing to
    assert it to please mypy.

commit c3caf300830030f0cd1b3002d715e430582db8f1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Oct 6 15:35:39 2020 +0200

    mimetype: use ContentMimetypeRow in the storage interface instead of dicts.
    
    This temporarily adds mess in the generic tests to support both rows and dicts,
    but I'll remove it once I migrated all endpoints.

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/85/ for more details.

ardumont added a subscriber: ardumont.

I'm missing a test...

One question inline just to make sure I understood this right.

Otherwise, looks pretty neat (under the assumption i understood it ;)

swh/indexer/indexer.py
567

List[Sha1Git]

swh/indexer/metadata.py
53

For my understanding, so now the generic TId, TData and TResult are now formalized into respectively:

  • TId: Sha1
  • TData: bytes
  • TResult: ContentMetadataRow

TId and TData because of ContentIndexer
TResult because of this declaration.

Did i get this right?

.oO(That's neat)

swh/indexer/origin_head.py
45

Please, add a test?

This revision is now accepted and ready to land.Oct 8 2020, 9:42 AM
vlorentz added inline comments.
swh/indexer/metadata.py
53

yes

swh/indexer/origin_head.py
45

actually, I'll make it an assert, because origin_get_latest_visit_status(require_snapshot=True) can't return a visit_status with a none snapshot.

vlorentz marked 2 inline comments as done.

rebase + apply comments

Build is green

Patch application report for D4189 (id=14759)

Could not rebase; Attempt merge onto 51b10e891d...

Updating 51b10e8..c6afd9d
Fast-forward
 swh/indexer/cli.py                           |    7 +-
 swh/indexer/ctags.py                         |   40 +-
 swh/indexer/fossology_license.py             |   31 +-
 swh/indexer/indexer.py                       |   84 +-
 swh/indexer/metadata.py                      |  167 ++--
 swh/indexer/metadata_dictionary/base.py      |   18 +-
 swh/indexer/mimetype.py                      |   35 +-
 swh/indexer/origin_head.py                   |   17 +-
 swh/indexer/storage/__init__.py              |  361 ++++---
 swh/indexer/storage/api/client.py            |    3 +
 swh/indexer/storage/api/serializers.py       |   26 +
 swh/indexer/storage/api/server.py            |    9 +-
 swh/indexer/storage/converters.py            |   15 +-
 swh/indexer/storage/db.py                    |   34 +-
 swh/indexer/storage/in_memory.py             |  222 ++---
 swh/indexer/storage/interface.py             |  237 ++---
 swh/indexer/storage/model.py                 |   12 +-
 swh/indexer/tests/storage/conftest.py        |   10 +-
 swh/indexer/tests/storage/test_converters.py |   17 +-
 swh/indexer/tests/storage/test_metrics.py    |    8 +-
 swh/indexer/tests/storage/test_server.py     |   14 +-
 swh/indexer/tests/storage/test_storage.py    | 1322 +++++++++++++-------------
 swh/indexer/tests/test_cli.py                |   36 +-
 swh/indexer/tests/test_ctags.py              |   23 +-
 swh/indexer/tests/test_fossology_license.py  |   22 +-
 swh/indexer/tests/test_indexer.py            |    7 +-
 swh/indexer/tests/test_metadata.py           |   69 +-
 swh/indexer/tests/test_mimetype.py           |   44 +-
 swh/indexer/tests/test_origin_metadata.py    |  185 ++--
 swh/indexer/tests/utils.py                   |   64 +-
 30 files changed, 1623 insertions(+), 1516 deletions(-)
 create mode 100644 swh/indexer/storage/api/serializers.py
Changes applied before test
commit c6afd9d798a46039de910e0c968e75765274748f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 17:50:32 2020 +0200

    Properly define the type of index() methods of indexers.

commit 4a7292a8a816a994da28ab92265e264fa3469968
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 17:29:52 2020 +0200

    tests: Clean up compatibility code with endpoints that return dicts.
    
    They are all migrated to attr classes now.

commit 8c1b288b6298509080f6e536d33a87f8ae456276
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 17:21:04 2020 +0200

    use OriginIntrinsicMetadataRow in the storage interface instead of dicts.
    
    also add typing to some test functions

commit f954853a4cd8bfcc17fee17309db8aed4e75f7ad
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 16:01:15 2020 +0200

    use RevisionIntrinsicMetadataRow in the storage interface instead of dicts.
    
    also add typing to some test functions

commit a822ada3a012d1a384ac7950527a44344568cdf1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 15:10:04 2020 +0200

    use ContentMetadataRow in the storage interface instead of dicts.

commit 7566fec49a46abd03c2eb6abced7a7f786da6a4c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 13:39:49 2020 +0200

    use ContentCtagsRow in the storage interface instead of dicts.

commit af3c220b14e22ae270679511045f1f6563febe9b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 12:43:45 2020 +0200

    use ContentLanguageRow in the storage interface instead of dicts.

commit b4d084d675bb89f5d6803ddbf75bbe54c3e91c03
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 11:25:05 2020 +0200

    indexer.storage: Change return types from Iterable to List
    
    For consistency with the main storage.

commit 570816fd87a1451e560fce8881a0da5626cf0ab4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 11:06:15 2020 +0200

    license: use ContentLicenseRow in the storage interface instead of dicts.

commit d346feffffaac7e8cc5c3bc0eedbffc66db1d37e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 11:43:05 2020 +0200

    all indexers: make index() return a list of results instead of a single one.
    
    1. it was wrongfully annotated as '-> TResult' even though some indexers
       can return None
    2. in a future commit, the fossology indexer will need to return multiple
       results.

commit 7fe4a89dbcf221a0125c474640cbcdc8b01b1df2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 10:54:35 2020 +0200

    base indexers: add type annotation for self.{storage,idx_storage}.

commit 486ee085f5ee999da38793815ae462c00ece4efb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 10:48:44 2020 +0200

    indexer.storage: Update docstrings of mimetype-related endpoints.

commit 4ec112337909cc364bec119b2ef6ae047a81b96f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 10:46:07 2020 +0200

    indexer.storage: Change return type annotation from Iterator to Iterable.
    
    When going through the RPC, it's turned into a list.

commit e8e94cf237471636e84ceac5d1740cfa7a982f90
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Oct 7 10:01:26 2020 +0200

    tests: Enable type-checking on storage test functions.
    
    By adding a simple type annotation to the test functions' signature.

commit 44cee8f213f26e01aa63519a8c21962b2fca460b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Oct 6 15:42:54 2020 +0200

    Make base indexers generic, with the result of index() as their type parameter.
    
    So the type of results can be statically checked, instead of needing to
    assert it to please mypy.

commit c3caf300830030f0cd1b3002d715e430582db8f1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Oct 6 15:35:39 2020 +0200

    mimetype: use ContentMimetypeRow in the storage interface instead of dicts.
    
    This temporarily adds mess in the generic tests to support both rows and dicts,
    but I'll remove it once I migrated all endpoints.

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/87/ for more details.