Page MenuHomeSoftware Heritage

storage.buffer: Add a new clear_buffers operation for the buffer proxy
ClosedPublic

Authored by ardumont on Apr 7 2020, 2:57 PM.

Diff Detail

Repository
rDSTO Storage manager
Branch
add-clear-buffers-operation (branched from master)
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 11736
Build 17790: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 17789: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D2966 (id=10553)

Rebasing onto 82b41bac01...

Current branch diff-target is up to date.
Changes applied before test
commit 18b38aec91d3bd281a4e1a9d3eb44e2845644d29
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Tue Apr 7 14:56:46 2020 +0200

    storage.buffer: Add a new clear operation for the buffer proxy
    
    Related to T2352

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/53/ for more details.

olasd requested changes to this revision.EditedApr 7 2020, 3:11 PM
olasd added a subscriber: olasd.

This is missing tests for directories and snapshots :)

To make things clearer, I think a few things can be changed:

  • the method should be renamed clear_buffers (we don't want people to think calling that method will wipe their storage!);
  • the docstring should have a preeminent warning that data that has not been flushed to storage is lost when this method is called; this should only be called when flush() fails and you want to continue your processing;
  • the method should be implemented as a pass-through for other filters (i.e. have a recursive call to self.storage.clear_buffers()) and be a noop for all backends (so we don't need to poke around checking whether the current storage really implements it or not).

But overall, I think this will be very useful!

swh/storage/tests/test_buffer.py
324–328

you can probably add a lower bound 0 < len(XXX) too (i.e. check that we're not testing on empty lists...)

This revision now requires changes to proceed.Apr 7 2020, 3:11 PM

This is missing tests for directories and snapshots :)

directories and skipped_contents!

I thought it was enough to be partial about it but ok, adding them then ;)

the method should be renamed clear_buffers (we don't want people to think
calling that method will wipe their storage!);

done

the docstring should have a preeminent warning that data that has not been
flushed to storage is lost when this method is called; this should only be
called when flush() fails and you want to continue your processing;

indeed

the method should be implemented as a pass-through for other filters (i.e.
have a recursive call to self.storage.clear_buffers()) and be a noop for all
backends (so we don't need to poke around checking whether the current
storage really implements it or not).

I guess you meant the actual "real" backend (pg-storage, cassandra, in-memory).
Ok, then updating them also.

  • Rename method clear to clear_buffers
  • Add clear_buffers on all storage endpoints, passthrough for proxy storage, noop endpoint for main backend implementations
  • Explicit the warning about losing data when calling the clear_buffers endpoint
  • complete tests with directories and skipped_contents
swh/storage/retry.py
183

Wondering if that'd make sense to define it as:

def clear_buffers(
        self, object_types: Optional[Iterable[str]] = None) -> None:
    """Clear objects from current filter

    """
    if object_types is None:
        object_types = self.object_types

    for object_type in object_types:
        self.object_seen[object_type] = set()

    if hasattr(self.storage, 'clear_buffers'):
        return self.storage.clear_buffers(object_types)
    return None

That's not quite the same meaning, isn't it?

Build is green

Patch application report for D2966 (id=10554)

Rebasing onto 82b41bac01...

Current branch diff-target is up to date.
Changes applied before test
commit 3f3925c66bd80ee25a459610b2974a332d43195d
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Tue Apr 7 14:56:46 2020 +0200

    storage*: Add `clear_buffers` operation for proxy storages
    
    This also adds the endpoint as noop for the main backend implementations.
    
    Related to T2352

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/54/ for more details.

If you add clear_buffers to the basic signature of Storage, I think you can skip all the hasattrs.

swh/storage/buffer.py
150

You're missing the recursive call here.

swh/storage/retry.py
183

Put your comment in the wrong place? And yes, I think the filter.py's clear_buffers should do that, or the contents will grow out of bounds.

swh/storage/buffer.py
150

ah yeah, thx.
i saw it after proposing the filter one (where i added it ;)

i was thinking about adding it but got lost in thought ;)

swh/storage/retry.py
183

mmmph, yeah wrong proxy ;)

If you add clear_buffers to the basic signature of Storage, I think you can skip all the hasattrs.

oh right!

  • Remove the check on hasattr as all storage implementations have the clear_buffers
  • Add the clear_buffers implementations on filter proxy storage (+ tests)

Update test-filter docstring

Build is green

Patch application report for D2966 (id=10557)

Rebasing onto 82b41bac01...

Current branch diff-target is up to date.
Changes applied before test
commit ce392bd5db5da1af6f4c718cc7a5cda1824d82bb
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Tue Apr 7 14:56:46 2020 +0200

    storage*: Add `clear_buffers` operation for proxy storages
    
    This also adds the endpoint as noop for the main backend implementations.
    
    Related to T2352

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/55/ for more details.

Build is green

Patch application report for D2966 (id=10558)

Rebasing onto 82b41bac01...

Current branch diff-target is up to date.
Changes applied before test
commit cd33dd6d46a28d99a8c05358ebab8ae4c6299f63
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Tue Apr 7 14:56:46 2020 +0200

    storage*: Add `clear_buffers` operation for proxy storages
    
    This also adds the endpoint as noop for the main backend implementations.
    
    Related to T2352

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/56/ for more details.

olasd requested changes to this revision.EditedApr 8 2020, 11:15 AM

The get_storage_with_filter_config refactoring could have been a new storage fixture (local to this module) instead (and it should be in a separate commit).

You really need to add the clear_buffers method to interface.py as well. The method needs to be available on the rpc client/server, because we're recursively calling it on all backends.

You should add a basic "call clear_buffers" test to the main storage test suite too, that way we make sure that the method works on all backends.

This revision now requires changes to proceed.Apr 8 2020, 11:15 AM

The get_storage_with_filter_config refactoring could have been a new storage fixture (local to this module) instead

i aligned with the filter one but ok.

(and it should be in a separate commit).

ok

You really need to add the clear_buffers method to interface.py as well. The method needs to be available on the rpc client/server, because we're recursively calling it on all backends.

Thanks, i was unsure about that.

You should add a basic "call clear_buffers" test to the main storage test suite too, that way we make sure that the method works on all backends.

right.

  • storage*: Add clear_buffers operation for proxy storages (including interface, rpc/server)
  • test_filter: Extract the filter storage into a fixture

Build is green

Patch application report for D2966 (id=10574)

Rebasing onto 8e8577e8d4...

Current branch diff-target is up to date.
Changes applied before test
commit efe4c02cbb2d495faa5c5f64cde291fd32ddfef8
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Wed Apr 8 11:27:30 2020 +0200

    test_filter: Extract the filter storage into a fixture

commit f755f5a2e39dfd7215d6704ba570d2c00e6709d9
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Tue Apr 7 14:56:46 2020 +0200

    storage*: Add `clear_buffers` operation for proxy storages
    
    This also adds the endpoint as noop for the main backend implementations.
    
    Related to T2352

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/62/ for more details.

ardumont retitled this revision from storage.buffer: Add a new clear operation for the buffer proxy to storage.buffer: Add a new clear_buffers operation for the buffer proxy.Apr 8 2020, 2:00 PM

Heads up.

This needs a rebase.
Which i keep pending for now as the current master is red (due to flake8 issues).
The storage will get blackified which should take care of those.

Rebase on latest master and blackify

Build is green

Patch application report for D2966 (id=10594)

Rebasing onto cd52a03bee...

Current branch diff-target is up to date.
Changes applied before test
commit 8d7909582115255d2d44bcc4805154d2de21ef82
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Wed Apr 8 11:27:30 2020 +0200

    test_filter: Extract the filter storage into a fixture

commit e0349fcf44abccfb3690a80a83a5ba6f70b2f3e8
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Tue Apr 7 14:56:46 2020 +0200

    storage*: Add `clear_buffers` operation for proxy storages
    
    This also adds the endpoint as noop for the main backend implementations.
    
    Related to T2352

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/69/ for more details.

interface: Improve docstring sentence

Build is green

Patch application report for D2966 (id=10614)

Rebasing onto ed4097c761...

Current branch diff-target is up to date.
Changes applied before test
commit cf152edb21ca8f6ba952141aa3e3afeb7ed437b6
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Wed Apr 8 11:27:30 2020 +0200

    test_filter: Extract the filter storage into a fixture

commit 6be5c035175b5d9a4b4d2a0db86c185285d0088b
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Tue Apr 7 14:56:46 2020 +0200

    storage*: Add `clear_buffers` operation for proxy storages
    
    This also adds the endpoint as noop for the main backend implementations.
    
    Related to T2352

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/72/ for more details.

Build has FAILED

Patch application report for D2966 (id=10616)

Rebasing onto ed4097c761...

Current branch diff-target is up to date.
Changes applied before test
commit b0b0313c9645052822800197e15009690e43837a
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Wed Apr 8 11:27:30 2020 +0200

    test_filter: Extract the filter storage into a fixture

commit 566c325cce881a3f73c63a39ac1128db638d4bdb
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Tue Apr 7 14:56:46 2020 +0200

    storage*: Add `clear_buffers` operation for proxy storages
    
    This also adds the endpoint as noop for the main backend implementations.
    
    Related to T2352

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/74/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/74/console

Build is green

Patch application report for D2966 (id=10616)

Rebasing onto ed4097c761...

Current branch diff-target is up to date.
Changes applied before test
commit b0b0313c9645052822800197e15009690e43837a
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Wed Apr 8 11:27:30 2020 +0200

    test_filter: Extract the filter storage into a fixture

commit 566c325cce881a3f73c63a39ac1128db638d4bdb
Author: Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>
Date:   Tue Apr 7 14:56:46 2020 +0200

    storage*: Add `clear_buffers` operation for proxy storages
    
    This also adds the endpoint as noop for the main backend implementations.
    
    Related to T2352

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/75/ for more details.

This revision is now accepted and ready to land.Apr 9 2020, 11:03 AM

olasd: successfully used magit to push as you explain to me \m/

Fixing my magit setup a while back, i finally understood why i could not use the interactive rebase.
I completely forgot about the "exec" command prefix in the line (i just put the command right there...).

Magit's explain docstring within the git-rebase-todo mode showed me my erroneous ways...