Add counting storage proxy
Closed, Public

Authored by vlorentz on Aug 27 2021, 11:32 AM.

Details

Summary

It will be used in the Cassandra experiment.

Currently we use the built-in counters of the Cassandra backend, but in
addition to being inaccurate, they seem to be a bottleneck.

This proxy will be a lightweight solution for counting object insertions,
without needing to run Kafka on the test cluster.
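
To illustrate the idea, here is a minimal sketch of a counting proxy. It is not the code from this diff: `InMemoryBackend`, `CountingProxy`, and the in-process `Counter` are hypothetical stand-ins, and the sketch assumes that insertion methods are named `<object_type>_add` and take an iterable of objects as their first argument.

```python
from collections import Counter
from functools import partial
from typing import Any, Dict, Iterable, List


class InMemoryBackend:
    """Hypothetical stand-in for a real storage backend."""

    def __init__(self) -> None:
        self.objects: Dict[str, List[Any]] = {}

    def content_add(self, contents: Iterable[Any]) -> Dict[str, int]:
        contents = list(contents)
        self.objects.setdefault("content", []).extend(contents)
        return {"content:add": len(contents)}

    def content_get(self) -> List[Any]:
        return list(self.objects.get("content", []))


class CountingProxy:
    """Forwards every call to the backend and counts objects passed to *_add."""

    def __init__(self, backend: Any) -> None:
        self.backend = backend
        self.counts: Counter = Counter()

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal attribute lookup fails; guard against
        # infinite recursion if the proxy is used before __init__ has run.
        if name in ("backend", "counts"):
            raise AttributeError(name)
        attr = getattr(self.backend, name)
        if name.endswith("_add"):
            object_type = name[: -len("_add")]
            return partial(self._count_and_forward, object_type, attr)
        return attr

    def _count_and_forward(
        self, object_type: str, method: Any, objects: Iterable[Any],
        *args: Any, **kwargs: Any
    ) -> Any:
        objects = list(objects)
        result = method(objects, *args, **kwargs)
        # Count only after the backend call returned without raising.
        # This counts submitted objects, not deduplicated ones.
        self.counts[object_type] += len(objects)
        return result


proxy = CountingProxy(InMemoryBackend())
proxy.content_add([b"a", b"b"])
proxy.content_add([b"c"])
print(proxy.counts["content"])
```

Because the count is taken in the proxy rather than in the backend, the backend's own counters are not touched on the insertion path. A real deployment would presumably send the counts to an external counter service rather than keep them in process memory.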

Diff Detail

Repository: rDSTO Storage manager
Branch: counting-proxy
Lint: No Linters Available
Unit: No Unit Test Coverage
Build Status: Buildable 23212
Build 36218: Phabricator diff pipeline on jenkins
Build 36217: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D6149 (id=22252)

Could not rebase; Attempt merge onto b110d1b69c...

Merge made by the 'recursive' strategy.
 swh/storage/__init__.py             |  5 ++-
 swh/storage/cassandra/cql.py        | 88 ++++++++++++++++++++++++++++++++++---
 swh/storage/cassandra/storage.py    | 24 ++++++++--
 swh/storage/in_memory.py            |  1 +
 swh/storage/proxies/counter.py      | 66 ++++++++++++++++++++++++++++
 swh/storage/tests/test_cassandra.py |  7 +--
 swh/storage/tests/test_counter.py   | 63 ++++++++++++++++++++++++++
 7 files changed, 238 insertions(+), 16 deletions(-)
 create mode 100644 swh/storage/proxies/counter.py
 create mode 100644 swh/storage/tests/test_counter.py
Changes applied before test
commit d14d3815aed40d765d6939d90396299c96a9a727
Merge: b110d1b6 1875046f
Author: Jenkins user <jenkins@localhost>
Date:   Fri Aug 27 09:32:42 2021 +0000

    Merge branch 'diff-target' into HEAD

commit 1875046f31eaa61e3f999e351f86dfba66b58680
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:32:03 2021 +0200

    Add counting storage proxy
    
    It will be used in the Cassandra experiment.
    
    Currently we use the built-in counters of the Cassandra backend; but in
    addition to being inaccurate, they seem to be a bottleneck.
    
    This proxy will be a lightweight solution for counting object insertion,
    without needing to run Kafka on the test cluster.

commit 39c7212deb5b32d2486b39d1498b6636f3c86893
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 12:20:26 2021 +0200

    Update test

commit 459bc9d6656f3764120682218d87af73e881ec4b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:45:22 2021 +0200

    Fix in-mem

commit 6b27a722815e25c4f64ff3f137328728fbcb7518
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:08:15 2021 +0200

    cassandra: Add option to select (hopefully) more efficient batch insertion algos
    
    This adds a new config option for the cassandra backend,
    'directory_entries_insert_algo', with three possible values:
    
    * 'one-per-one' is the default, and preserves the current naive behavior
    * 'concurrent' and 'batch' are attempts at being more efficient

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1376/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1376/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Aug 27 2021, 11:33 AM)
Harbormaster failed remote builds in B23211: Diff 22252!

Build is green

Patch application report for D6149 (id=22253)

Could not rebase; Attempt merge onto b110d1b69c...

Merge made by the 'recursive' strategy.
 requirements-swh.txt                |  1 +
 swh/storage/__init__.py             |  5 ++-
 swh/storage/cassandra/cql.py        | 88 ++++++++++++++++++++++++++++++++++---
 swh/storage/cassandra/storage.py    | 24 ++++++++--
 swh/storage/in_memory.py            |  1 +
 swh/storage/proxies/counter.py      | 66 ++++++++++++++++++++++++++++
 swh/storage/tests/test_cassandra.py |  7 +--
 swh/storage/tests/test_counter.py   | 63 ++++++++++++++++++++++++++
 8 files changed, 239 insertions(+), 16 deletions(-)
 create mode 100644 swh/storage/proxies/counter.py
 create mode 100644 swh/storage/tests/test_counter.py
Changes applied before test
commit 3f67bd62b7a45363aef6d80c608603b0a87c801b
Merge: b110d1b6 b10788d3
Author: Jenkins user <jenkins@localhost>
Date:   Fri Aug 27 09:44:11 2021 +0000

    Merge branch 'diff-target' into HEAD

commit b10788d3789fa1010d45ac57f79a16c8c3627502
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:32:03 2021 +0200

    Add counting storage proxy
    
    It will be used in the Cassandra experiment.
    
    Currently we use the built-in counters of the Cassandra backend; but in
    addition to being inaccurate, they seem to be a bottleneck.
    
    This proxy will be a lightweight solution for counting object insertion,
    without needing to run Kafka on the test cluster.

commit 39c7212deb5b32d2486b39d1498b6636f3c86893
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 12:20:26 2021 +0200

    Update test

commit 459bc9d6656f3764120682218d87af73e881ec4b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:45:22 2021 +0200

    Fix in-mem

commit 6b27a722815e25c4f64ff3f137328728fbcb7518
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:08:15 2021 +0200

    cassandra: Add option to select (hopefully) more efficient batch insertion algos
    
    This adds a new config option for the cassandra backend,
    'directory_entries_insert_algo', with three possible values:
    
    * 'one-per-one' is the default, and preserves the current naive behavior
    * 'concurrent' and 'batch' are attempts at being more efficient

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1377/ for more details.

ardumont added inline comments.
swh/storage/tests/test_counter.py, line 25

same below

This revision is now accepted and ready to land. (Aug 27 2021, 12:25 PM)

Build is green

Patch application report for D6149 (id=22265)

Rebasing onto b110d1b69c...

First, rewinding head to replay your work on top of it...
Applying: Add counting storage proxy
Changes applied before test
commit 2bf29b23ecdfad28345476337eec695aabf26c85
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:32:03 2021 +0200

    Add counting storage proxy
    
    It will be used in the Cassandra experiment.
    
    Currently we use the built-in counters of the Cassandra backend; but in
    addition to being inaccurate, they seem to be a bottleneck.
    
    This proxy will be a lightweight solution for counting object insertion,
    without needing to run Kafka on the test cluster.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1383/ for more details.

Build is green

Patch application report for D6149 (id=22269)

Rebasing onto b110d1b69c...

Current branch diff-target is up to date.
Changes applied before test
commit 47a6919fee499dd51fb0098099e895088a1a7c25
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:32:03 2021 +0200

    Add counting storage proxy
    
    It will be used in the Cassandra experiment.
    
    Currently we use the built-in counters of the Cassandra backend; but in
    addition to being inaccurate, they seem to be a bottleneck.
    
    This proxy will be a lightweight solution for counting object insertion,
    without needing to run Kafka on the test cluster.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1385/ for more details.

This revision was automatically updated to reflect the committed changes.