Page MenuHomeSoftware Heritage

storage*: Type content_find(...) -> List[Content]
ClosedPublic

Authored by ardumont on Aug 4 2020, 10:11 AM.

Details

Summary

Note that it drops unreachable paths [1] in internal storages use (cassandra, in-memory) as well

Impacts swh-web [2]:

$ grep -r "content_find(" $SWH_ENVIRONMENT_HOME/**/swh/* | grep -v swh-storage
/home/tony/work/inria/repo/swh/swh-environment/swh-web/swh/web/common/service.py:    found = _first_element(storage.content_find({algo: hash}))
/home/tony/work/inria/repo/swh/swh-environment/swh-web/swh/web/common/service.py:    found = _first_element(storage.content_find({algo: hash}))
/home/tony/work/inria/repo/swh/swh-environment/swh-web/swh/web/common/service.py:        hashes = _first_element(storage.content_find({algo: hash}))
/home/tony/work/inria/repo/swh/swh-environment/swh-web/swh/web/common/service.py:        content = _first_element(storage.content_find({"sha1_git": entity["target"]}))
/home/tony/work/inria/repo/swh/swh-environment/swh-web/swh/web/common/service.py:    c = _first_element(storage.content_find({algo: hash}))
/home/tony/work/inria/repo/swh/swh-environment/swh-web/swh/web/tests/common/test_service.py:    expected = archive_data.content_find(content)
/home/tony/work/inria/repo/swh/swh-environment/swh-web/swh/web/tests/conftest.py:    def content_find(self, content):
/home/tony/work/inria/repo/swh/swh-environment/swh-web/swh/web/tests/conftest.py:        cnt = self.storage.content_find(cnt_ids_bytes)

[1] for the table content, column status, the possible statuses are an enum
whose values are ('absent', 'visible', 'hidden') (in the production storage),
'missing' is not part of those values.

[2] D3693

Related to T645

Test Plan

tox

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision.Aug 4 2020, 10:11 AM
ardumont added inline comments.Aug 4 2020, 10:14 AM
swh/storage/cassandra/storage.py
254

impossible path mentioned in the diff description.

swh/storage/db.py
336

Not necessary change but the format was off, there was no type...
So might as well align it with other db implementations.

swh/storage/in_memory.py
328

same alignment with f-strings we starting using.

353

impossible path mentioned in description.

Build is green

Patch application report for D3692 (id=13007)

Rebasing onto 3c2e5a3d7d...

Current branch diff-target is up to date.
Changes applied before test
commit 31f4df17446c03f6119f337d626d959a59c67b0e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Aug 4 10:06:50 2020 +0200

    storage*: Type content_find(...) -> Iterable[Content]
    
    Related to T645

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/657/ for more details.

ardumont edited the summary of this revision. (Show Details)Aug 4 2020, 10:21 AM
ardumont added inline comments.Aug 4 2020, 10:52 AM
swh/storage/in_memory.py
435

Typoed, uncaught because we don't have coverage here, nice!

getattr(content, key).

And why did not mypy complain about swh/storage/in_memory.py:431: error: Value of type "Iterable[Content]" is not indexable earlier?

ardumont added inline comments.Aug 4 2020, 10:53 AM
swh/storage/in_memory.py
435

And why did not mypy complain ...

I gather mypy's check heuristic is basically skipping untyped methods (or something).
I have the error now that i tried to type this currently untyped method.

ardumont updated this revision to Diff 13008.Aug 4 2020, 11:02 AM
  • Add more types on impacted internal methods
  • Fix one typo in one of those
ardumont retitled this revision from storage*: Type content_find(...) -> Iterable[Content] to storage*: Type content_find(...) -> Sequence[Content].Aug 4 2020, 11:05 AM
ardumont updated this revision to Diff 13009.Aug 4 2020, 11:05 AM
ardumont retitled this revision from storage*: Type content_find(...) -> Sequence[Content] to storage*: Type content_find(...) -> Iterable[Content].
  • Fix black concat format
  • Rework commit message

Build is green

Patch application report for D3692 (id=13008)

Rebasing onto 3c2e5a3d7d...

Current branch diff-target is up to date.
Changes applied before test
commit 458d22f6cfd9d5348df0b4dd0993c7074b3d8375
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Aug 4 10:06:50 2020 +0200

    storage*: Type content_find(...) -> Iterable[Content]
    
    Related to T645

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/658/ for more details.

Build is green

Patch application report for D3692 (id=13009)

Rebasing onto 3c2e5a3d7d...

Current branch diff-target is up to date.
Changes applied before test
commit a516d43c6750724d8fd6dc291bc44f9f10fa8f62
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Aug 4 10:06:50 2020 +0200

    storage*: Type content_find(...) -> Sequence[Content]
    
    Related to T645

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/659/ for more details.

anlambert added inline comments.
swh/storage/cassandra/storage.py
218

I think the return type should be List[Content] here.

Precise types should be used for return values in the contrary of input ones.

ardumont added inline comments.Aug 4 2020, 11:20 AM
swh/storage/cassandra/storage.py
218

Oh yeah, that's right.
Thanks!

ardumont edited the summary of this revision. (Show Details)Aug 4 2020, 11:22 AM
anlambert added inline comments.Aug 4 2020, 11:23 AM
swh/storage/cassandra/storage.py
218

Iterable[Content] is even better ;-)

ardumont updated this revision to Diff 13011.Aug 4 2020, 11:25 AM
  • Fix returned type (which must be specific)
  • Rework commit message to reflect the change

Thanks @anlambert for reminding me (I kept on forgetting that ¯\_(ツ)_/¯ ;)

Hopefully, at some point, it will sink in "Be conservative in what you send, be
liberal in what you accept"

ardumont added inline comments.Aug 4 2020, 11:27 AM
swh/storage/cassandra/storage.py
218

That's what i initially did [1] but that does not work with how we use it...

contents = self.content_find(...)
if contents:
    content = contents[0]  # <- mypy says no,

[1] https://forge.softwareheritage.org/D3692#inline-25510

anlambert accepted this revision.Aug 4 2020, 11:29 AM
anlambert added inline comments.
swh/storage/cassandra/storage.py
218

Oh right, iterables are not indexable. Looks good to me then.

This revision is now accepted and ready to land.Aug 4 2020, 11:29 AM

Build is green

Patch application report for D3692 (id=13011)

Rebasing onto 3c2e5a3d7d...

Current branch diff-target is up to date.
Changes applied before test
commit 15e8c996d44108a6557973724f7b25940be85d4f
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Aug 4 10:06:50 2020 +0200

    storage*: Type content_find(...) -> List[Content]
    
    Related to T645

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/660/ for more details.

ardumont added inline comments.Aug 4 2020, 11:30 AM
swh/storage/in_memory.py
435

(webapp tests caught it though)

ardumont added inline comments.Aug 4 2020, 11:39 AM
swh/storage/in_memory.py
435

yeah, ok, the tests testing the directory-ls function only create the directories and not the contents targetted by the directories so we never pass here.

I'll fix it in another diff.

ardumont added inline comments.Aug 4 2020, 1:08 PM
swh/storage/in_memory.py
435
ardumont retitled this revision from storage*: Type content_find(...) -> Iterable[Content] to storage*: Type content_find(...) -> List[Content].Aug 4 2020, 1:17 PM
ardumont edited the summary of this revision. (Show Details)
ardumont added inline comments.Aug 4 2020, 1:23 PM
swh/storage/cassandra/storage.py
354

D3694 fixes coverage.

swh/storage/in_memory.py
431
This revision was automatically updated to reflect the committed changes.