Page MenuHomeSoftware Heritage

swh.storage.filter: Add filtering storage implementation
ClosedPublic

Authored by ardumont on Tue, Oct 8, 2:40 PM.

Details

Summary

Also add a sample_data fixture to read default test data from.

Depends on D2083

Test Plan

tox

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision.Tue, Oct 8, 2:40 PM

I don't think this is useful anymore. The postgresql storage does the filtering itself now (see D2034). Are you planning on removing this feature from the postgresql storage?

swh/storage/filter.py
14

'This'

ardumont added a comment.EditedTue, Oct 8, 2:57 PM

I don't think this is useful anymore. The postgresql storage does the filtering itself now (see D2034). Are you planning on removing this feature from the postgresql storage?

It always did that.
But it does it server-side so we send lots of data on the wire (which has a cost).

It's for doing it client side as the current loaders do but it's entangled within its own loading logic (check swh.loader.core.loader.BufferedLoader).

This allows to untangle that part configuration wise.

Maybe the issue is with where i've put the modules.
Maybe elsewhere swh.storage.client (or something) would be clearer...

swh/storage/filter.py
14

;)

BUILD has failed

swh/storage/filter.py:50: error: Argument 2 to "_filter_missing_ids" of "FilteringProxyStorage" has incompatible type "Generator[Any, None, None]"; expected "Sequence[bytes]"

hi ho!

vlorentz accepted this revision.Tue, Oct 8, 3:05 PM

I don't think this is useful anymore. The postgresql storage does the filtering itself now (see D2034). Are you planning on removing this feature from the postgresql storage?

It always did that.

Right, indeed

But it does it server-side so we send lots of data on the wire (which has a cost).
It's for doing it client side as the current loaders do but it's entangled within its own loading logic (check swh.loader.core.loader.BufferedLoader).
This allows to untangle that part configuration wise.

Right!

Accepting the diff, just make sure to fix the typo and the build

This revision is now accepted and ready to land.Tue, Oct 8, 3:05 PM
vlorentz requested changes to this revision.Tue, Oct 8, 3:07 PM

Actually, I still have some nitpicking to do: instead of having sample_data as a large fixture returning a dict of various value types; could you split it into smaller fixtures, each returning a single type of values?

This revision now requires changes to proceed.Tue, Oct 8, 3:07 PM
vlorentz accepted this revision.Tue, Oct 8, 3:13 PM

Accepting, because @ardumont promised to fix it on IRC ;)

This revision is now accepted and ready to land.Tue, Oct 8, 3:13 PM
ardumont added a comment.EditedTue, Oct 8, 3:13 PM

Accepting, because @ardumont promised to fix it on IRC ;)

15:11 <+ardumont> i agree but there is an in-progress from douardda about the storage implem' tests
15:11 <+ardumont> and i think he will come up with something better
15:12 <vlorentz> are you *sure* this large fixture will be removed in the next ~2 weeks?
15:12 <+ardumont> most probably yes because *I*'ll make sure of it
15:12 <vlorentz> okay
ardumont updated this revision to Diff 7006.Tue, Oct 8, 3:35 PM

Adapt according to review and mypy check

ardumont updated this revision to Diff 7011.Tue, Oct 8, 4:01 PM

Fix commit message

ardumont updated this revision to Diff 7014.Tue, Oct 8, 4:16 PM

Rebase to latest diff