Page MenuHomeSoftware Heritage

swh.storage.filter: Add filtering storage implementation
ClosedPublic

Authored by ardumont on Oct 8 2019, 2:40 PM.

Details

Summary

Also add a sample_data fixture to read default test data from.

Depends on D2083

Test Plan

tox

Diff Detail

Repository
rDSTO Storage manager
Branch
master
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 8176
Build 11791: tox-on-jenkinsJenkins
Build 11790: arc lint + arc unit

Event Timeline

I don't think this is useful anymore. The postgresql storage does the filtering itself now (see D2034). Are you planning on removing this feature from the postgresql storage?

swh/storage/filter.py
13

'This'

I don't think this is useful anymore. The postgresql storage does the filtering itself now (see D2034). Are you planning on removing this feature from the postgresql storage?

It always did that.
But it does it server-side so we send lots of data on the wire (which has a cost).

It's for doing it client side as the current loaders do but it's entangled within its own loading logic (check swh.loader.core.loader.BufferedLoader).

This allows to untangle that part configuration wise.

Maybe the issue is with where i've put the modules.
Maybe elsewhere swh.storage.client (or something) would be clearer...

swh/storage/filter.py
13

;)

BUILD has failed

swh/storage/filter.py:50: error: Argument 2 to "_filter_missing_ids" of "FilteringProxyStorage" has incompatible type "Generator[Any, None, None]"; expected "Sequence[bytes]"

hi ho!

I don't think this is useful anymore. The postgresql storage does the filtering itself now (see D2034). Are you planning on removing this feature from the postgresql storage?

It always did that.

Right, indeed

But it does it server-side so we send lots of data on the wire (which has a cost).

It's for doing it client side as the current loaders do but it's entangled within its own loading logic (check swh.loader.core.loader.BufferedLoader).

This allows to untangle that part configuration wise.

Right!

Accepting the diff, just make sure to fix the typo and the build

This revision is now accepted and ready to land.Oct 8 2019, 3:05 PM
vlorentz requested changes to this revision.Oct 8 2019, 3:07 PM

Actually, I still have some nitpicking to do: instead of having sample_data as a large fixture returning a dict of various value types; could you split it into smaller fixtures, each returning a single type of values?

This revision now requires changes to proceed.Oct 8 2019, 3:07 PM

Accepting, because @ardumont promised to fix it on IRC ;)

This revision is now accepted and ready to land.Oct 8 2019, 3:13 PM

Accepting, because @ardumont promised to fix it on IRC ;)

15:11 <+ardumont> i agree but there is an in-progress from douardda about the storage implem' tests
15:11 <+ardumont> and i think he will come up with something better
15:12 <vlorentz> are you *sure* this large fixture will be removed in the next ~2 weeks?
15:12 <+ardumont> most probably yes because *I*'ll make sure of it
15:12 <vlorentz> okay

Adapt according to review and mypy check