Page MenuHomeSoftware Heritage

buffer: ensure objects are added in topological order
ClosedPublic

Authored by olasd on Feb 4 2021, 3:17 PM.

Details

Summary

This new integration test checks that, when flushing the buffer storage,
the addition functions of the underlying storage backend are called in
topological order (content, directory, revision, release then snapshot).

This reduces the probability of "data consistency" regressions caused by
the use of the buffering storage proxy alone.

Depends on D5015

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5017 (id=17889)

Could not rebase; Attempt merge onto 9a9f234e0a...

Updating 9a9f234e..c63ea0d5
Fast-forward
 swh/storage/buffer.py            |   7 +-
 swh/storage/tests/test_buffer.py | 172 +++++++++++++++++++++++++++++++++------
 2 files changed, 152 insertions(+), 27 deletions(-)
Changes applied before test
commit c63ea0d5bcf1138dbbab244ea1a6376591126260
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 4 09:57:50 2021 +0100

    buffer: ensure objects are added in topological order
    
    This new integration test checks that, when flushing the buffer storage,
    the addition functions of the underlying storage backend are called in
    topological order (content, directory, revision, release then snapshot).
    
    This reduces the probability of "data consistency" regressions caused by
    the use of the buffering storage proxy alone.

commit 5b3e6c9f70f94f4ae7440de453fd75cd1d3bbeb6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 4 09:56:03 2021 +0100

    buffer: add support for snapshots
    
    This is mostly a consistency addition, considering that most (if not
    all) loaders will only add a single snapshot.
    
    The common pattern of loading objects in topological order (content >
    directory > revision > release > snapshot), then flushing the storage,
    is now fully consistent; Without this addition, the snapshot addition
    would reach the backend storage before all other objects are added,
    leading to potential inconsistencies if the flush of other object types
    fails.

commit 18967ed4a5f2eb67c57ae34ed913073b45bedc90
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 4 10:14:59 2021 +0100

    buffer: add type annotations for tests

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1124/ for more details.

olasd requested review of this revision.Feb 4 2021, 3:22 PM
ardumont added a subscriber: ardumont.

lgtm

one remark inline

swh/storage/tests/test_buffer.py
521

Surprised a bit for the first iteration on content_add, i would expect an AssertionError on 0 > 0...
I guess the methods_called.index("content_add") returns something strictly positive then (no idea really).

Wouldn't it be "safer" to start prev with -1 or something?

This revision is now accepted and ready to land.Feb 4 2021, 4:54 PM
swh/storage/tests/test_buffer.py
521

So, you're totally right, but, currently, buffer.flush() calls through to buffer.storage.flush() first, /then/ pushes the objects.

I suspect that's an oversight from the original implementation, and I'd argue it's a bug (even if, in practice, the buffering storage is probably the only flushable storage). Let's fix that.

Flip with D5018; Flush in the right order

ardumont added inline comments.
swh/storage/tests/test_buffer.py
521

oh right!

oversight conceded ;)

Build is green

Patch application report for D5017 (id=17902)

Could not rebase; Attempt merge onto 9a9f234e0a...

Updating 9a9f234e..efd8815b
Fast-forward
 swh/storage/buffer.py            |  20 +++-
 swh/storage/tests/test_buffer.py | 205 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 196 insertions(+), 29 deletions(-)
Changes applied before test
commit efd8815b890895ff7208c122a21b01cf5cbfb7f3
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 4 09:57:50 2021 +0100

    buffer: ensure objects are flushed in topological order
    
    This new integration test checks that, when flushing the buffer storage,
    the addition functions of the underlying storage backend are called in
    topological order (content, directory, revision, release then snapshot).
    
    This reduces the probability of "data consistency" regressions caused by
    the use of the buffering storage proxy alone.

commit 1526107b12eb2b3f607edcb3558ee12648d3f154
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 4 14:24:50 2021 +0100

    Return an accurate summary from buffer's flush() method
    
    The earlier implementation would only return summary data from keys that
    existed in the last `_add` backend method run, rather than collating all
    the results.

commit 5b3e6c9f70f94f4ae7440de453fd75cd1d3bbeb6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 4 09:56:03 2021 +0100

    buffer: add support for snapshots
    
    This is mostly a consistency addition, considering that most (if not
    all) loaders will only add a single snapshot.
    
    The common pattern of loading objects in topological order (content >
    directory > revision > release > snapshot), then flushing the storage,
    is now fully consistent; Without this addition, the snapshot addition
    would reach the backend storage before all other objects are added,
    leading to potential inconsistencies if the flush of other object types
    fails.

commit 18967ed4a5f2eb67c57ae34ed913073b45bedc90
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 4 10:14:59 2021 +0100

    buffer: add type annotations for tests

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1130/ for more details.