Page MenuHomeSoftware Heritage

Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.
ClosedPublic

Authored by vlorentz on Jul 8 2020, 10:13 AM.

Diff Detail

Repository
rDSTO Storage manager
Branch
model-metadata
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 13469
Build 20603: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 20602: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D3456 (id=12233)

Rebasing onto e45ca76db4...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Start using swh-model for ext metadata.
Applying: [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.
Changes applied before test
commit 1c2b98589bfdf231ea20c27800b38a6db7495da9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 10:13:13 2020 +0200

    [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.

commit bc4a8ef0cc58787cd092cae65288ecfbfb39e36e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 7 13:07:33 2020 +0200

    [WIP] Start using swh-model for ext metadata.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/421/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/421/console

  • retry proxy
  • More consistent use of MetadataAuthority/MetadataFetcher.
  • pg storage

Build has FAILED

Patch application report for D3456 (id=12250)

Rebasing onto c21d0e3820...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Start using swh-model for ext metadata.
Applying: [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.
Applying: retry proxy
Applying: More consistent use of MetadataAuthority/MetadataFetcher.
Applying: pg storage
Changes applied before test
commit f7948783e4a411f7c3723f9f6fe149098fe3e25f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:27:48 2020 +0200

    pg storage

commit 4a40b7e31b1567910673c02901a55c092818418a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:25:25 2020 +0200

    More consistent use of MetadataAuthority/MetadataFetcher.

commit 9124403ee23436deb7dc2bad79935fad78b1b05a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:23:41 2020 +0200

    retry proxy

commit e62cf6e57dbfe34319391cb2d17125500b280f77
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 10:13:13 2020 +0200

    [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.

commit e6e099363e266d3b4ee7a14df7e661d8e87932a6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 7 13:07:33 2020 +0200

    [WIP] Start using swh-model for ext metadata.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/423/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/423/console

Build has FAILED

Patch application report for D3456 (id=12256)

Rebasing onto c21d0e3820...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Start using swh-model for ext metadata.
Applying: [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.
Applying: retry proxy
Applying: More consistent use of MetadataAuthority/MetadataFetcher.
Applying: pg storage
Applying: cass storage
Changes applied before test
commit 4d82558c2f4f53125cda2728225628d848d491e1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 15:25:40 2020 +0200

    cass storage

commit e8931e64bf7267529e52a9ae8bd1ae8200f59cc8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:27:48 2020 +0200

    pg storage

commit f931040e91eb82b511b519d3051e01c95d3a12f4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:25:25 2020 +0200

    More consistent use of MetadataAuthority/MetadataFetcher.

commit db920708c38b2c358bd609290e16b7ec41f1a50e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:23:41 2020 +0200

    retry proxy

commit e5d28fe33ad631007f946f01946a84de716ad593
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 10:13:13 2020 +0200

    [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.

commit 7af32152156b74149e5af849e890039421f616d7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 7 13:07:33 2020 +0200

    [WIP] Start using swh-model for ext metadata.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/424/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/424/console

Build has FAILED

Patch application report for D3456 (id=12298)

Rebasing onto c3803ef8f7...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Start using swh-model for ext metadata.
Applying: [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.
Applying: retry proxy
Applying: More consistent use of MetadataAuthority/MetadataFetcher.
Applying: pg storage
Applying: cass storage
Changes applied before test
commit f387be17b9f97db2c0c5dac36c80e059abac2a48
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 15:25:40 2020 +0200

    cass storage

commit 1a74ce2c79f0410ab2bff543decf7a897360002c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:27:48 2020 +0200

    pg storage

commit 10fcc3ae5416278a0b44f5af488c2ab0dac77c34
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:25:25 2020 +0200

    More consistent use of MetadataAuthority/MetadataFetcher.

commit 6e111f8f377fcdf8f33378f9a157b3c2d5e5492d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:23:41 2020 +0200

    retry proxy

commit 738e9af1c73976c40d1bb0ccf644b9b21371cb4b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 10:13:13 2020 +0200

    [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.

commit 585847d786a2d20397ae1d7642b44bd59fff08c1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 7 13:07:33 2020 +0200

    [WIP] Start using swh-model for ext metadata.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/433/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/433/console

vlorentz retitled this revision from [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict. to Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict..Jul 9 2020, 2:50 PM

Build has FAILED

Patch application report for D3456 (id=12304)

Rebasing onto c3803ef8f7...

Current branch diff-target is up to date.
Changes applied before test
commit ab3dc9e641eff381053393aee374cb80e273b1bf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 15:25:40 2020 +0200

    cass storage

commit 9faeea15d29aa33cea22df1dbdfbbd5eb9fc757f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:27:48 2020 +0200

    pg storage

commit e2d78fec8feadbf8748d1fdce80cb93e27cf4d7f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:25:25 2020 +0200

    More consistent use of MetadataAuthority/MetadataFetcher.

commit b32339659c5bd9cad7266f6c504ceb04e668346c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 14:23:41 2020 +0200

    retry proxy

commit 16756d7799f15e01b455a30f9427caeca4086663
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 8 10:13:13 2020 +0200

    [WIP] Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.

commit c1c925e9c0dfea7f533fac88061319fdb3245c57
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 7 13:07:33 2020 +0200

    [WIP] Start using swh-model for ext metadata.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/435/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/435/console

Build is green

Patch application report for D3456 (id=12326)

Rebasing onto de38cd1126...

Current branch diff-target is up to date.
Changes applied before test
commit 76253b003a91cfbdbda7d6f2fbbb63eb962d8d8b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 7 13:07:33 2020 +0200

    Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/438/ for more details.

ardumont added inline comments.
swh/storage/cassandra/cql.py
902 ↗(On Diff #12326)
, f"Expected {set(kwargs)}"

(or something) as it's less ambiguous to read.

swh/storage/cassandra/storage.py
1023 ↗(On Diff #12326)

missing test case.

(Those are interesting to test nonetheless for at least can ensuring the message is correctly readable)

1028 ↗(On Diff #12326)

same.

1052 ↗(On Diff #12326)

Is that possible?

Won't that be caught earlier in clients code when they try to instantiate the model objects?

And if is possible, can you please add tests to it?

;)

swh/storage/in_memory.py
1042

should be covered by the scenario which you should have already added by now so i'll stop mentioning it :)

1085–1089

why not

elif not isinstance...

?

(picture me curious)

1088

Why aren't those checked on cassandra implem? (did not reach pg-storage yet)

hmm, after reading it again, it seems it is but differently, correct?

If it is, why the different approach? (curious me again)

swh/storage/interface.py
1157

Drop the extra "." ;)

... objects:

thx for the review

swh/storage/cassandra/storage.py
1052 ↗(On Diff #12326)

I don't think it's possible.

swh/storage/in_memory.py
1085–1089

Natural evolution of the code

1088

Because I wrote them about a week apart :p

swh/storage/interface.py
1123

content might be a bit overloaded here, why not skip it.

Would

if  there is already extrinsic metadata for the same object, ...

be correct enough?

1197

s/of/for ?

1199
which must exist (could be empty but not None though)

What's empty though ;) ?

swh/storage/storage.py
1156–1157

I only realize now, make it returns its dict stats like most other add endpoints.
yeah, i know it changes the interface as well.

that way, you can only add the @process_metrics to that method and drop the internal calls to send_metrics.

Let's agree it can be done in another diff though as this one is quite large already ;)

1292

same remark about metrics

1320

same remark about metrics.

swh/storage/interface.py
1199

{}

swh/storage/cassandra/storage.py
1023 ↗(On Diff #12326)

... for at least we can ensure* ...

swh/storage/interface.py
1199

ack, thx.

swh/storage/utils.py
26 ↗(On Diff #12326)

Too bad it's not core python.

I called something similar swh.web fmap (wayback ;)

[1] https://forge.softwareheritage.org/source/swh-web/browse/master/swh/web/common/converters.py$29

vlorentz added inline comments.
swh/storage/in_memory.py
1085–1089

Actually, no, my code is correct. I did:

if A:
    if B:
        # error
else:
    if not B:
        # error

the code you are suggesting is:

if A:
    if B:
        # error
elif not B:
    # error

which is completely different.

Made a first pass.

Looks good, mostly questions on missing tests (or if it's actually possible to have such case ;).

I need to make second pass to read the test adaptations though as i only skimmed through.

There is the question to consistency on the add endpoints which now should return their
dict stats (they mostly do except for origin_visit* i recall):

But i'd agree to make that adaptation as a second diff:

  • this diff is quite large already
  • not returning the dict stats is the current implementation
vlorentz added inline comments.
swh/storage/interface.py
1123

yeah I forgot to replace it (copied from content_metadata_add, not origin_metadata_add like Phabricator believes)

1197

I prefer "of"

vlorentz marked an inline comment as done.

rebase + apply comments

swh/storage/in_memory.py
1085–1089

Indeed, i missed it.

Build is green

Patch application report for D3456 (id=12343)

Rebasing onto 23318c21d1...

Current branch diff-target is up to date.
Changes applied before test
commit f5e44204501755765e7a38de635a9f99cd6ea12f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 7 13:07:33 2020 +0200

    Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/442/ for more details.

douardda added inline comments.
swh/storage/in_memory.py
1085–1089

I am very slow this morning, so bear with me, but I don't see how these 2 snippets of "if" statement differ.

swh/storage/in_memory.py
1085–1089

@douardda ugh, indeed, they don't.

But I find the lack of symmetry confusing, so I'd rather keep it as is

Thanks.

Looks good.

Don't land it yet though ;);)
Let's wait for your return.

Cheers,

This revision is now accepted and ready to land.Jul 10 2020, 5:03 PM

Build has FAILED

Patch application report for D3456 (id=12537)

Rebasing onto 997ec1dbf7...

Current branch diff-target is up to date.
Changes applied before test
commit f76b5dc6dd4601715efc9a0edf85f89d191ecb70
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 7 13:07:33 2020 +0200

    Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/480/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/480/console

Build is green

Patch application report for D3456 (id=12541)

Rebasing onto 997ec1dbf7...

Current branch diff-target is up to date.
Changes applied before test
commit 7131dcb9cfd13ad3b0b58bfd8002e8c830ed47e6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jul 7 13:07:33 2020 +0200

    Make metadata-related endpoints consistent with other endpoints by using Iterables of swh-model objects instead of a dict.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/482/ for more details.

Awesome, and thanks for aligning with the latest sample-data-model change (I planned to do it after you landed this ;)