Page MenuHomeSoftware Heritage

Add a new TenaciousProxyStorage
Needs ReviewPublic

Authored by douardda on Mon, Jun 22, 5:10 PM.

Details

Reviewers
None
Group Reviewers
Reviewers
Summary

This proxy storage attempt to add buckets of objects, but in case of failure,
it splits the bucket in parts so every valid object in the bucket get a chance
to be inserted.

Also provides an error rate-limiting feature.

:Warning: the behavior of this proxy when the rate limit threshold has been
reach is not ok and needs to be discussed and adapted before landing.

Diff Detail

Event Timeline

douardda created this revision.Mon, Jun 22, 5:10 PM

Build has FAILED

Patch application report for D3334 (id=11815)

Could not rebase; Attempt merge onto 2d497ff07c...

Auto-merging swh/storage/tests/algos/test_origin.py
Merge made by the 'recursive' strategy.
 mypy.ini                               |   3 +
 requirements.txt                       |   1 +
 swh/storage/__init__.py                |   3 +
 swh/storage/cassandra/storage.py       |  17 +-
 swh/storage/in_memory.py               |  15 +-
 swh/storage/interface.py               |   7 +-
 swh/storage/storage.py                 |  38 +++--
 swh/storage/tenacious.py               | 124 ++++++++++++++
 swh/storage/tests/algos/test_origin.py |   7 +-
 swh/storage/tests/test_storage.py      |  83 ++++++----
 swh/storage/tests/test_tenacious.py    | 293 +++++++++++++++++++++++++++++++++
 swh/storage/validate.py                |   7 +-
 12 files changed, 531 insertions(+), 67 deletions(-)
 create mode 100644 swh/storage/tenacious.py
 create mode 100644 swh/storage/tests/test_tenacious.py
Changes applied before test
commit 438ccc7c3b4a37efe5afc39d1f695382d0af6b95
Merge: 2d497ff0 3f7075e8
Author: Jenkins user <jenkins@localhost>
Date:   Mon Jun 22 15:19:41 2020 +0000

    Merge branch 'diff-target' into HEAD

commit 3f7075e8a07e1d54b951b76b3575141f5a1c3bb7
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 18 18:41:24 2020 +0200

    Add a new TenaciousProxyStorage
    
    This proxy storage attempt to add buckets of objects, but in case of failure,
    it splits the bucket in parts so every valid object in the bucket get a chance
    to be inserted.
    
    Also provides an error rate-limiting feature.

commit 4f156595f7894a6e346fed1e5f5f3262ea3ffbab
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Jun 22 11:27:54 2020 +0200

    Deprecate the origin_add_one() endpoint
    
    This endpoint is not really useful since the origin_add() can be used
    instead.
    
    Using a single API endpoint would also make the API a bit more consistant
    (most other endpoints only provide a xxx_add endpoint) ; having a single
    endpoint per object_type make is enough and make the whole API simpler.

commit f6aef1c74cb30d9e890ee538baf1010157542967
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 18 18:28:51 2020 +0200

    Make Storage.add_origin() return a sumary dict
    
    make it consistent with other add_xxx methods by making it return a
    summary dict `{"origin:add": int}`.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/333/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/333/console

vlorentz added a subscriber: vlorentz.EditedTue, Jun 23, 12:16 PM

performance nitpick: you're using to_add as a stack, but with the top at the beginning, so every operation copies the whole list. You should flip that.

Stopping after 10 failures over 1000 insertions seems a bit harsh. IMO it should only stop if there is something obviously wrong with the storage, so more something of the order of 10 failures over 50 or 100 insertions

swh/storage/tenacious.py
45

If

58–61

no more args:

63–64

could you document these two numbers? And I think total would be better named window_size or something of the sort.

82

_tenacious_add

94–97

It should raise an exception, else the client will think it went fine and will proceed to adding objects referencing it, so it will create holes in the graph.

105–107

I think it should be an error (so it appears in sentry), and contain the exception's name and args.

109

nice.

122–125

you're reimplementing collections.Counter. Instead, you can make results a Counter and call results.update()

Build has FAILED

Patch application report for D3334 (id=11842)

Could not rebase; Attempt merge onto 2d497ff07c...

Updating 2d497ff0..73f9b659
Fast-forward
 mypy.ini                               |   3 +
 requirements.txt                       |   1 +
 swh/storage/__init__.py                |   3 +
 swh/storage/cassandra/storage.py       |  19 ++-
 swh/storage/in_memory.py               |  15 +-
 swh/storage/interface.py               |   7 +-
 swh/storage/storage.py                 |  38 +++--
 swh/storage/tenacious.py               | 124 ++++++++++++++
 swh/storage/tests/algos/test_origin.py |   7 +-
 swh/storage/tests/test_storage.py      |  83 ++++++----
 swh/storage/tests/test_tenacious.py    | 293 +++++++++++++++++++++++++++++++++
 swh/storage/validate.py                |   7 +-
 12 files changed, 532 insertions(+), 68 deletions(-)
 create mode 100644 swh/storage/tenacious.py
 create mode 100644 swh/storage/tests/test_tenacious.py
Changes applied before test
commit 73f9b6595362f848d002c2f6c4db45c7b5cd15e6
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 18 18:41:24 2020 +0200

    Add a new TenaciousProxyStorage
    
    This proxy storage attempt to add buckets of objects, but in case of failure,
    it splits the bucket in parts so every valid object in the bucket get a chance
    to be inserted.
    
    Also provides an error rate-limiting feature.

commit 621fc8d3779535656e15499464c44acecd7eb8bf
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Jun 22 11:27:54 2020 +0200

    Deprecate the origin_add_one() endpoint
    
    This endpoint is not really useful since the origin_add() can be used
    instead.
    
    Using a single API endpoint would also make the API a bit more consistant
    (most other endpoints only provide a xxx_add endpoint) ; having a single
    endpoint per object_type make is enough and make the whole API simpler.

commit fb603e1b9874a3bbb393f7653ec5004bfa18b251
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 18 18:28:51 2020 +0200

    Make Storage.add_origin() return a sumary dict
    
    make it consistent with other add_xxx methods by making it return a
    summary dict `{"origin:add": int}`.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/339/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/339/console

ardumont edited the summary of this revision. (Show Details)Mon, Jun 29, 10:39 AM