Page MenuHomeSoftware Heritage

Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.
ClosedPublic

Authored by vlorentz on Jul 8 2020, 10:13 AM.

Diff Detail

Repository
rDSTO Storage manager
Branch
model-metadata
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 13760
Build 21085: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 21084: arc lint + arc unit

Unit TestsFailed

TimeTest
7 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_pytest_plugin::test_sample_data
sample_data = {'authority': [MetadataAuthority(type=<MetadataAuthorityType.DEPOSIT: 'deposit'>, url='http://hal.inria.example.com/',...45678901234567890', 'type': 'dir'}), 'id': b'4\xf35\xa7P\x11\x1c\xa0\xa8\xb6M\x804\xfa\xec\x9e\xed\xc3\x96\xbe'}], ...} sample_data_model = {'content': [Content(sha1=b'4\x972t\xcc\xefj\xb4\xdf\xaa\xf8e\x99y/\xa9\xc3\xfeF\x89', sha1_git=b'\xd8\x1c\xc0q\x0e\xb...M\x804\xfa\xec\x9e\xed\xc3\x96\xbe')], 'origin': [Origin(url='file:///dev/null'), Origin(url='file:///dev/zero')], ...}
6 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.fixer::swh.storage.fixer._fix_content
2 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.fixer::swh.storage.fixer._fix_origin
4 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.fixer::swh.storage.fixer._fix_origin_visit
4 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.fixer::swh.storage.fixer._fix_revision
View Full Test Results (1 Failed · 794 Passed · 17 Skipped)

Event Timeline

Build has FAILED

Patch application report for D3456 (id=12233)

Rebasing onto e45ca76db4...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Start using swh-model for ext metadata.
Applying: [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.
Changes applied before test
commit 1c2b98589bfdf231ea20c27800b38a6db7495da9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 10:13:13 2020 +0200

    [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.

commit bc4a8ef0cc58787cd092cae65288ecfbfb39e36e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 7 13:07:33 2020 +0200

    [WIP] Start using swh-model for ext metadata.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/421/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/421/console

  • retry proxy
  • More consistent use of MetadataAuthority/MetadataFetcher.
  • pg storage

Build has FAILED

Patch application report for D3456 (id=12250)

Rebasing onto c21d0e3820...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Start using swh-model for ext metadata.
Applying: [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.
Applying: retry proxy
Applying: More consistent use of MetadataAuthority/MetadataFetcher.
Applying: pg storage
Changes applied before test
commit f7948783e4a411f7c3723f9f6fe149098fe3e25f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:27:48 2020 +0200

    pg storage

commit 4a40b7e31b1567910673c02901a55c092818418a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:25:25 2020 +0200

    More consistent use of MetadataAuthority/MetadataFetcher.

commit 9124403ee23436deb7dc2bad79935fad78b1b05a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:23:41 2020 +0200

    retry proxy

commit e62cf6e57dbfe34319391cb2d17125500b280f77
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 10:13:13 2020 +0200

    [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.

commit e6e099363e266d3b4ee7a14df7e661d8e87932a6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 7 13:07:33 2020 +0200

    [WIP] Start using swh-model for ext metadata.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/423/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/423/console

Build has FAILED

Patch application report for D3456 (id=12256)

Rebasing onto c21d0e3820...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Start using swh-model for ext metadata.
Applying: [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.
Applying: retry proxy
Applying: More consistent use of MetadataAuthority/MetadataFetcher.
Applying: pg storage
Applying: cass storage
Changes applied before test
commit 4d82558c2f4f53125cda2728225628d848d491e1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 15:25:40 2020 +0200

    cass storage

commit e8931e64bf7267529e52a9ae8bd1ae8200f59cc8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:27:48 2020 +0200

    pg storage

commit f931040e91eb82b511b519d3051e01c95d3a12f4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:25:25 2020 +0200

    More consistent use of MetadataAuthority/MetadataFetcher.

commit db920708c38b2c358bd609290e16b7ec41f1a50e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:23:41 2020 +0200

    retry proxy

commit e5d28fe33ad631007f946f01946a84de716ad593
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 10:13:13 2020 +0200

    [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.

commit 7af32152156b74149e5af849e890039421f616d7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 7 13:07:33 2020 +0200

    [WIP] Start using swh-model for ext metadata.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/424/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/424/console

Build has FAILED

Patch application report for D3456 (id=12298)

Rebasing onto c3803ef8f7...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Start using swh-model for ext metadata.
Applying: [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.
Applying: retry proxy
Applying: More consistent use of MetadataAuthority/MetadataFetcher.
Applying: pg storage
Applying: cass storage
Changes applied before test
commit f387be17b9f97db2c0c5dac36c80e059abac2a48
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 15:25:40 2020 +0200

    cass storage

commit 1a74ce2c79f0410ab2bff543decf7a897360002c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:27:48 2020 +0200

    pg storage

commit 10fcc3ae5416278a0b44f5af488c2ab0dac77c34
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:25:25 2020 +0200

    More consistent use of MetadataAuthority/MetadataFetcher.

commit 6e111f8f377fcdf8f33378f9a157b3c2d5e5492d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:23:41 2020 +0200

    retry proxy

commit 738e9af1c73976c40d1bb0ccf644b9b21371cb4b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 10:13:13 2020 +0200

    [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.

commit 585847d786a2d20397ae1d7642b44bd59fff08c1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 7 13:07:33 2020 +0200

    [WIP] Start using swh-model for ext metadata.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/433/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/433/console

vlorentz retitled this revision from [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict. to Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict..Jul 9 2020, 2:50 PM

Build has FAILED

Patch application report for D3456 (id=12304)

Rebasing onto c3803ef8f7...

Current branch diff-target is up to date.
Changes applied before test
commit ab3dc9e641eff381053393aee374cb80e273b1bf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 15:25:40 2020 +0200

    cass storage

commit 9faeea15d29aa33cea22df1dbdfbbd5eb9fc757f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:27:48 2020 +0200

    pg storage

commit e2d78fec8feadbf8748d1fdce80cb93e27cf4d7f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:25:25 2020 +0200

    More consistent use of MetadataAuthority/MetadataFetcher.

commit b32339659c5bd9cad7266f6c504ceb04e668346c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:23:41 2020 +0200

    retry proxy

commit 16756d7799f15e01b455a30f9427caeca4086663
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 10:13:13 2020 +0200

    [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.

commit c1c925e9c0dfea7f533fac88061319fdb3245c57
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 7 13:07:33 2020 +0200

    [WIP] Start using swh-model for ext metadata.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/435/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/435/console

Build is green

Patch application report for D3456 (id=12326)

Rebasing onto de38cd1126...

Current branch diff-target is up to date.
Changes applied before test
commit 76253b003a91cfbdbda7d6f2fbbb63eb962d8d8b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 7 13:07:33 2020 +0200

    Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/438/ for more details.

ardumont added inline comments.
swh/storage/cassandra/cql.py
902–905
, f"Expected {set(kwargs)}"

(or something) as it's less ambiguous to read.

swh/storage/cassandra/storage.py
1066

missing test case.

(Those are interesting to test nonetheless for at least can ensuring the message is correctly readable)

1071

same.

1081

Is that possible?

Won't that be caught earlier in clients code when they try to instantiate the model objects?

And if is possible, can you please add tests to it?

;)

swh/storage/in_memory.py
1044

should be covered by the scenario which you should have already added by now so i'll stop mentioning it :)

1087–1091

why not

elif not isinstance...

?

(picture me curious)

1090

Why aren't those checked on cassandra implem? (did not reach pg-storage yet)

hmm, after reading it again, it seems it is but differently, correct?

If it is, why the different approach? (curious me again)

swh/storage/interface.py
1157

Drop the extra "." ;)

... objects:

thx for the review

swh/storage/cassandra/storage.py
1081

I don't think it's possible.

swh/storage/in_memory.py
1087–1091

Natural evolution of the code

1090

Because I wrote them about a week apart :p

swh/storage/interface.py
1123

content might be a bit overloaded here, why not skip it.

Would

if  there is already extrinsic metadata for the same object, ...

be correct enough?

1197

s/of/for ?

1199
which must exist (could be empty but not None though)

What's empty though ;) ?

swh/storage/storage.py
1143–1144

I only realize now, make it returns its dict stats like most other add endpoints.
yeah, i know it changes the interface as well.

that way, you can only add the @process_metrics to that method and drop the internal calls to send_metrics.

Let's agree it can be done in another diff though as this one is quite large already ;)

1291

same remark about metrics

1319

same remark about metrics.

swh/storage/interface.py
1199

{}

swh/storage/cassandra/storage.py
1066

... for at least we can ensure* ...

swh/storage/interface.py
1199

ack, thx.

swh/storage/utils.py
25

Too bad it's not core python.

I called something similar swh.web fmap (wayback ;)

[1] https://forge.softwareheritage.org/source/swh-web/browse/master/swh/web/common/converters.py$29

vlorentz added inline comments.
swh/storage/in_memory.py
1087–1091

Actually, no, my code is correct. I did:

if A:
    if B:
        # error
else:
    if not B:
        # error

the code you are suggesting is:

if A:
    if B:
        # error
elif not B:
    # error

which is completely different.

Made a first pass.

Looks good, mostly questions on missing tests (or if it's actually possible to have such case ;).

I need to make second pass to read the test adaptations though as i only skimmed through.

There is the question to consistency on the add endpoints which now should return their
dict stats (they mostly do except for origin_visit* i recall):

But i'd agree to make that adaptation as a second diff:

  • this diff is quite large already
  • not returning the dict stats is the current implementation
vlorentz added inline comments.
swh/storage/interface.py
1123

yeah I forgot to replace it (copied from content_metadata_add, not origin_metadata_add like Phabricator believes)

1197

I prefer "of"

vlorentz marked an inline comment as done.

rebase + apply comments

swh/storage/in_memory.py
1087–1091

Indeed, i missed it.

Build is green

Patch application report for D3456 (id=12343)

Rebasing onto 23318c21d1...

Current branch diff-target is up to date.
Changes applied before test
commit f5e44204501755765e7a38de635a9f99cd6ea12f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 7 13:07:33 2020 +0200

    Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/442/ for more details.

douardda added inline comments.
swh/storage/in_memory.py
1087–1091

I am very slow this morning, so bear with me, but I don't see how these 2 snippets of "if" statement differ.

swh/storage/in_memory.py
1087–1091

@douardda ugh, indeed, they don't.

But I find the lack of symmetry confusing, so I'd rather keep it as is

Thanks.

Looks good.

Don't land it yet though ;);)
Let's wait for your return.

Cheers,

This revision is now accepted and ready to land.Jul 10 2020, 5:03 PM

Build has FAILED

Patch application report for D3456 (id=12537)

Rebasing onto 997ec1dbf7...

Current branch diff-target is up to date.
Changes applied before test
commit f76b5dc6dd4601715efc9a0edf85f89d191ecb70
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 7 13:07:33 2020 +0200

    Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/480/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/480/console

Build is green

Patch application report for D3456 (id=12541)

Rebasing onto 997ec1dbf7...

Current branch diff-target is up to date.
Changes applied before test
commit 7131dcb9cfd13ad3b0b58bfd8002e8c830ed47e6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 7 13:07:33 2020 +0200

    Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/482/ for more details.

Awesome, and thanks for aligning with the latest sample-data-model change (I planned to do it after you landed this ;)