Page MenuHomeSoftware Heritage

storage*: origin_visit_get(...) -> PagedResult[OriginVisit]
ClosedPublic

Authored by ardumont on Jul 27 2020, 10:10 PM.

Details

Summary

Possible improvments for the allowed order (asc, desc) which could be an enum exposed in interface.py (D3629)

Impacts only swh-deposit [1] and swh-web [2]:

grep -r 'origin_visit_get(' ../*/swh/** | grep -v "swh-storage"
../swh-deposit/swh/deposit/migrations/0018_migrate_swhids.py:    all_visits = storage.origin_visit_get(origin)
../swh-web/swh/web/common/service.py:    for visit in storage.origin_visit_get(
../swh-web/swh/web/tests/browse/views/test_origin.py:    visits = archive_data.origin_visit_get(origin["url"])
../swh-web/swh/web/tests/browse/views/test_origin.py:    origin_visits = archive_data.origin_visit_get(origin["url"])
../swh-web/swh/web/tests/browse/views/test_origin.py:    origin_visits = archive_data.origin_visit_get(origin["url"])
../swh-web/swh/web/tests/browse/views/test_origin.py:    origin_visits = archive_data.origin_visit_get(origin["url"])
../swh-web/swh/web/tests/browse/views/test_origin.py:    origin_visits = archive_data.origin_visit_get(origin["url"])
../swh-web/swh/web/tests/browse/views/test_origin.py:    origin_visits = archive_data.origin_visit_get(origin["url"])
../swh-web/swh/web/tests/browse/views/test_content.py:    visits = archive_data.origin_visit_get(origin["url"])
../swh-web/swh/web/tests/browse/views/test_content.py:    visits = archive_data.origin_visit_get(origin["url"])
../swh-web/swh/web/tests/browse/views/test_directory.py:    visits = archive_data.origin_visit_get(origin["url"])
../swh-web/swh/web/tests/browse/views/test_directory.py:    visits = archive_data.origin_visit_get(origin["url"])
../swh-web/swh/web/tests/browse/test_snapshot_context.py:    visits = archive_data.origin_visit_get(origin["url"])
../swh-web/swh/web/tests/common/test_identifiers.py:    visits = archive_data.origin_visit_get(origin["url"])
../swh-web/swh/web/tests/common/test_identifiers.py:    visits = archive_data.origin_visit_get(origin["url"])
../swh-web/swh/web/tests/common/test_service.py:    expected_visits = archive_data.origin_visit_get(new_origin.url)
../swh-web/swh/web/tests/common/test_service.py:    origin_visit = archive_data.origin_visit_get(origin["url"])[-1]
../swh-web/swh/web/tests/conftest.py:    def origin_visit_get(self, origin_url):
../swh-web/swh/web/tests/conftest.py:        visits = list(self.storage.origin_visit_get(origin_url))
../swh-web/swh/web/tests/strategies.py:        visits = list(tests_data["storage"].origin_visit_get(origin["url"]))

[1] D3645

[2] D3647

Related to T645

Test Plan

tox

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
swh/storage/api/serializers.py
34 ↗(On Diff #12774)

make it check __type__is "PagedResult"

50 ↗(On Diff #12774)

"storage" instead of "model_paged_result"

60 ↗(On Diff #12774)

same

Actually, now that I think of it... I think this PagedResult structure should go in swh.core.api. It would also be used by swh-search, and potentially by swh-index-storage.

This revision now requires changes to proceed.Jul 28 2020, 2:04 PM
vlorentz added inline comments.
swh/storage/api/serializers.py
50 ↗(On Diff #12774)

or "storage_class". I can't find a satisfying name.

Actually, now that I think of it... I think this PagedResult structure should go in swh.core.api. It would also be used by swh-search, and potentially by swh-index-storage.

Are we sure that this will be used in those other modules?

Pretty much, yes. swh-search already has pagination very close to swh-storage

swh/storage/api/serializers.py
50 ↗(On Diff #12774)

(I saw that when doing D3629 but waited on you to make a remark ;)

well that's storage_model really.
It's the original model one which may be too broad.

Pretty much, yes. swh-search already has pagination very close to swh-storage

Ok, moving PagedResult up a notch in core then.

Pretty much, yes. swh-search already has pagination very close to swh-storage

Ok, moving PagedResult up a notch in core then.

Wait, why not in swh.model again?

Because it's not part of the data model, it's just used for RPC

swh/storage/api/serializers.py
29 ↗(On Diff #12774)

turns out that can be simplified to

"results": obj.results,

the remaining part of the serialization will do the rest.

36 ↗(On Diff #12774)

turns out that can be simplified to

results=obj["results"]

Actually, now that I think of it... I think this PagedResult structure should go in swh.core.api. It would also be used by swh-search, and potentially by swh-index-storage.

See D3632 ;)

  • Refactor using D3632 code (PagedResult moved in core)

Note:

  • tox will fail until we land and release the new core
  • locally everything is fine

Build has FAILED

Patch application report for D3627 (id=12779)

Rebasing onto 77960ca4b5...

Current branch diff-target is up to date.
Changes applied before test
commit 3df99d4ea1c450bfb8bd6394b3182fb05d7521ca
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Jul 27 22:17:59 2020 +0200

    storage*: origin_visit_get(...) -> PagedResult[OriginVisit]
    
    Related to T645

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/583/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/583/console

Use swh.core PagedResult and alias a storage with token as str

heads up, tox mypy, tests (local) and me are happy with this. I just open a
specific type for the token to be a string to follow the current dev on this
diff.

pre-commit mypy is not ¯\_(ツ)_/¯ though [1]
I don't get the pre-commit mypy hints yet.

Note: this diff is now 2 commits for now as i'm not yet sure of the alias
things (thank you for that pre-commit mypy...)

[1] https://forge.softwareheritage.org/P731

Build has FAILED

Patch application report for D3627 (id=12792)

Rebasing onto 77960ca4b5...

Current branch diff-target is up to date.
Changes applied before test
commit 27618b59d766e3e5a65acdace6424b15549490af
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jul 29 07:41:21 2020 +0200

    Use swh.core PagedResult and alias a storage with token as str
    
    It makes sense to me but pre-commit mypy is unhappy with this and I don't get
    its hints.
    
    Related to T645

commit 3df99d4ea1c450bfb8bd6394b3182fb05d7521ca
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Jul 27 22:17:59 2020 +0200

    storage*: origin_visit_get(...) -> PagedResult[OriginVisit]
    
    Related to T645

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/585/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/585/console

You can remove all your changed from serializers.py now

swh/storage/interface.py
36

PagedResult is not an instance of CorePagedResult[TResult, str], it's CorePagedResult[TResult, str] itself. Therefore:

PagedResult: Type[CorePagedResult[TResult, str]] = CorePagedResult[TResult, str]
40–41

if you want this one to work, don't reuse the same TypeVar:

TResult = TypeVar("TResult")

class PagedResult(CorePagedResult[TResult, str]):
    pass
This revision now requires changes to proceed.Jul 29 2020, 9:15 AM

You can remove all your changed from serializers.py now

Yeah, i'm gonna add those for another diff.

swh/storage/interface.py
36

I tried that as well...

swh/storage/interface.py:36: error: Type variable "swh.core.api.classes.TResult" is unbound
swh/storage/interface.py:36: note: (Hint: Use "Generic[TResult]" or "Protocol[TResult]" base class to bind "TResult" inside a class)
swh/storage/interface.py:36: note: (Hint: Use "TResult" in function signature to bind "TResult" inside a function)
swh/storage/interface.py:810: error: Variable "swh.storage.interface.PagedResult" is not valid as a type
swh/storage/interface.py:810: note: See https://mypy.readthedocs.io/en/latest/common_issues.html#variables-vs-type-aliases
swh/storage/in_memory.py:869: error: Variable "swh.storage.interface.PagedResult" is not valid as a type
swh/storage/in_memory.py:869: note: See https://mypy.readthedocs.io/en/latest/common_issues.html#variables-vs-type-aliases
swh/storage/cassandra/storage.py:851: error: Variable "swh.storage.interface.PagedResult" is not valid as a type
swh/storage/cassandra/storage.py:851: note: See https://mypy.readthedocs.io/en/latest/common_issues.html#variables-vs-type-aliases
swh/storage/storage.py:890: error: Variable "swh.storage.interface.PagedResult" is not valid as a type
swh/storage/storage.py:890: note: See https://mypy.readthedocs.io/en/latest/common_issues.html#variables-vs-type-aliases
Found 5 errors in 4 files (checked 68 source files)
40–41

if you want this one to work, don't reuse the same TypeVar:

why?


I recall having tested that (without redefining TResult though) that would not work...
I'll try again, thanks.

swh/storage/interface.py
36

ah yes, that makes sense.

40–41

I'm guessing it makes mypy glitchy to use the same typevar in both a generic class in a subclass of that class

  • Adapt part of the review
  • tests are failing the comparison [1] for some unknown reason yet (i exploded the comparison for now)

(still need to cleanup the serialization part)

[1] P732

Build has FAILED

Patch application report for D3627 (id=12798)

Rebasing onto 77960ca4b5...

Current branch diff-target is up to date.
Changes applied before test
commit 28d930d4e6676a6040d53e47c74bec807418e6d2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Jul 27 22:17:59 2020 +0200

    storage*: origin_visit_get(...) -> PagedResult[OriginVisit]
    
    Related to T645

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/588/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/588/console

Fix swh.storage.interface.PagedResult type so everyone is happy (all mypys,
tests, flake...)

Drop the serialization changes as it's irrelevant to the diff now

Build is green

Patch application report for D3627 (id=12802)

Rebasing onto 77960ca4b5...

Current branch diff-target is up to date.
Changes applied before test
commit 7593a0368a3fc7c66f4d872ff713cfb0942c116b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Jul 27 22:17:59 2020 +0200

    storage*: origin_visit_get(...) -> PagedResult[OriginVisit]
    
    Related to T645

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/589/ for more details.

Build is green

Patch application report for D3627 (id=12803)

Rebasing onto 77960ca4b5...

Current branch diff-target is up to date.
Changes applied before test
commit 39ed4e85d0ffbf8fc78ce729a12b85ca5a108a47
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Jul 27 22:17:59 2020 +0200

    storage*: origin_visit_get(...) -> PagedResult[OriginVisit]
    
    Related to T645

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/590/ for more details.

mypy.ini
63 ↗(On Diff #12803)

Not needed anymore.
I'll drop it.

swh/storage/in_memory.py
893

??
visits = visits[:extra_limit] should be enough
??

vlorentz added inline comments.
swh/storage/tests/test_storage.py
1315

nope (should be opaque)

1329

same

1356

get the next page of results and test it

1358–1360

same

1364

assert actual_page.next_page_token is None

1377

you should get the next page of results and test it too

1611–1614

you forgot this comment

This revision now requires changes to proceed.Jul 29 2020, 12:25 PM
swh/storage/in_memory.py
893

yes. I think I commented something about it yesterday

swh/storage/in_memory.py
893

about is None, i thought I dropped it entirely but messed it up.

swh/storage/tests/test_storage.py
1315

well, yes but i also said that i wanted to keep the existing tests comparable with the existing ones...
That test is intended to cover the 8 cases of pagination.

If i'm starting to use the opaque identifier here, it will become unreadable...

vlorentz added inline comments.
swh/storage/tests/test_storage.py
1315

meh.

This revision is now accepted and ready to land.Jul 29 2020, 12:30 PM
ardumont marked an inline comment as done.
  • Fix pagination which was off in some cases
  • Adapt according some more remarks in review

Build has FAILED

Patch application report for D3627 (id=12807)

Rebasing onto 77960ca4b5...

Current branch diff-target is up to date.
Changes applied before test
commit c55ff7444b5f99b4e7163742d3e510c356f92d33
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Jul 27 22:17:59 2020 +0200

    storage*: origin_visit_get(...) -> PagedResult[OriginVisit]
    
    Related to T645

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/592/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/592/console

Build is green

Patch application report for D3627 (id=12809)

Rebasing onto 77960ca4b5...

Current branch diff-target is up to date.
Changes applied before test
commit fd55b818bf44f417595eb4f81aa659a6756a7510
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Jul 27 22:17:59 2020 +0200

    storage*: origin_visit_get(...) -> PagedResult[OriginVisit]
    
    Related to T645

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/594/ for more details.

swh/storage/tests/test_storage.py
1364

I thought it enough already.

Build is green

Patch application report for D3627 (id=12812)

Rebasing onto f543bd53ee...

Current branch diff-target is up to date.
Changes applied before test
commit 643ebc6e7eb9148558d1c692332785f529ffcb0e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Mon Jul 27 22:17:59 2020 +0200

    storage*: origin_visit_get(...) -> PagedResult[OriginVisit]
    
    Related to T645

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/596/ for more details.

ardumont edited the summary of this revision. (Show Details)