
cassandra: Remove stat_counters.
ClosedPublic

Authored by vlorentz on Aug 27 2021, 11:45 AM.

Details

Summary

They were inaccurate and a performance bottleneck.

We can/should use swh-counters instead, now.

Depends on D6149.

Diff Detail

Repository
rDSTO Storage manager
Branch
counting-proxy
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 23220
Build 36229: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 36228: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D6150 (id=22254)

Could not rebase; Attempt merge onto b110d1b69c...

Merge made by the 'recursive' strategy.
 requirements-swh.txt                |   1 +
 swh/storage/__init__.py             |   5 +-
 swh/storage/cassandra/cql.py        | 101 ++++++++++++++++++++++++++++--------
 swh/storage/cassandra/model.py      |  11 ----
 swh/storage/cassandra/schema.py     |   8 ---
 swh/storage/cassandra/storage.py    |  39 ++++++++------
 swh/storage/in_memory.py            |  23 +-------
 swh/storage/proxies/counter.py      |  66 +++++++++++++++++++++++
 swh/storage/tests/storage_tests.py  |  48 ++++++++++-------
 swh/storage/tests/test_cassandra.py |   7 +--
 swh/storage/tests/test_counter.py   |  63 ++++++++++++++++++++++
 11 files changed, 266 insertions(+), 106 deletions(-)
 create mode 100644 swh/storage/proxies/counter.py
 create mode 100644 swh/storage/tests/test_counter.py
Changes applied before test
commit f80282152f7fd9820462c40bb8ebcbda83adc907
Merge: b110d1b6 9cc88a4c
Author: Jenkins user <jenkins@localhost>
Date:   Fri Aug 27 09:50:46 2021 +0000

    Merge branch 'diff-target' into HEAD

commit 9cc88a4cbad5f070d320d47c1933078835f0b49f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:45:18 2021 +0200

    cassandra: Remove stat_counters.
    
    They were inaccurate and a performance bottleneck.
    
    We can/should use swh-counters instead, now.

commit b10788d3789fa1010d45ac57f79a16c8c3627502
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:32:03 2021 +0200

    Add counting storage proxy
    
    It will be used in the Cassandra experiment.
    
    Currently we use the built-in counters of the Cassandra backend; but in
    addition to being inaccurate, they seem to be a bottleneck.
    
    This proxy will be a lightweight solution for counting object insertion,
    without needing to run Kafka on the test cluster.

commit 39c7212deb5b32d2486b39d1498b6636f3c86893
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 12:20:26 2021 +0200

    Update test

commit 459bc9d6656f3764120682218d87af73e881ec4b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:45:22 2021 +0200

    Fix in-mem

commit 6b27a722815e25c4f64ff3f137328728fbcb7518
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:08:15 2021 +0200

    cassandra: Add option to select (hopefully) more efficient batch insertion algos
    
    This adds a new config option for the cassandra backend,
    'directory_entries_insert_algo', with three possible values:
    
    * 'one-per-one' is the default, and preserves the current naive behavior
    * 'concurrent' and 'batch' are attempts at being more efficient
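
As a rough illustration of the 'directory_entries_insert_algo' option described in the commit message above, here is a minimal sketch of how such a setting could be read and validated. Only the option name, its three values, and the 'one-per-one' default come from the commit message; the helper name resolve_insert_algo, the constants, and the shape of the config dict are assumptions made for this example and may not match the actual swh-storage code.

```python
from typing import Any, Dict

# Values listed in the commit message; 'one-per-one' is stated to be the default.
ALLOWED_INSERT_ALGOS = ("one-per-one", "concurrent", "batch")
DEFAULT_INSERT_ALGO = "one-per-one"


def resolve_insert_algo(config: Dict[str, Any]) -> str:
    """Return the configured insertion algorithm, or the default if unset.

    Hypothetical helper: it only shows how the option could be validated.
    """
    algo = config.get("directory_entries_insert_algo", DEFAULT_INSERT_ALGO)
    if algo not in ALLOWED_INSERT_ALGOS:
        raise ValueError(
            f"directory_entries_insert_algo must be one of "
            f"{ALLOWED_INSERT_ALGOS}, got {algo!r}"
        )
    return algo


# Hypothetical backend configuration; the surrounding keys are illustrative.
example_config = {
    "cls": "cassandra",
    "directory_entries_insert_algo": "concurrent",
}
selected_algo = resolve_insert_algo(example_config)
default_algo = resolve_insert_algo({"cls": "cassandra"})
```

Leaving the option out keeps the existing behavior, which matches the commit's intent of making the new algorithms opt-in.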

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1378/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1378/console
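
The "Add counting storage proxy" commit in the report above describes a lightweight way to count object insertions without relying on the backend's own counters. The sketch below shows the general idea only: a wrapper that forwards every call to the backend and tallies the objects passed to methods whose names end in "_add". The classes FakeBackend and CountingProxy, and all method names, are hypothetical stand-ins written for this example; they are not the contents of swh/storage/proxies/counter.py.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


class FakeBackend:
    """Hypothetical stand-in for a storage backend (not the swh.storage API)."""

    def __init__(self) -> None:
        self._objects: Dict[str, List[Any]] = {}

    def _add(self, object_type: str, objects: Iterable[Any]) -> Dict[str, int]:
        items = list(objects)
        self._objects.setdefault(object_type, []).extend(items)
        return {f"{object_type}:add": len(items)}

    def content_add(self, contents: Iterable[Any]) -> Dict[str, int]:
        return self._add("content", contents)

    def directory_add(self, directories: Iterable[Any]) -> Dict[str, int]:
        return self._add("directory", directories)

    def stored(self, object_type: str) -> List[Any]:
        return list(self._objects.get(object_type, []))


class CountingProxy:
    """Forward calls to a backend and count objects passed to *_add methods."""

    def __init__(self, backend: Any) -> None:
        self.backend = backend
        self.counts: Counter = Counter()

    def __getattr__(self, name: str) -> Any:
        # Only reached for attributes not defined on the proxy itself.
        attr = getattr(self.backend, name)
        if not (name.endswith("_add") and callable(attr)):
            return attr  # reads and other calls pass through unchanged
        object_type = name[: -len("_add")]

        def counting_add(objects: Iterable[Any]) -> Any:
            items = list(objects)  # materialize so iterators can be counted
            result = attr(items)
            # Counts insertion attempts; duplicates are not filtered out.
            self.counts[object_type] += len(items)
            return result

        return counting_add


proxy = CountingProxy(FakeBackend())
proxy.content_add(["a", "b"])
proxy.content_add(iter(["c"]))
proxy.directory_add(["d"])
```

Because the tally lives in the proxy process, this kind of counter needs no extra service on the test cluster, at the cost of counting attempted insertions rather than distinct stored objects.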

Harbormaster returned this revision to the author for changes because remote builds failed. (Aug 27 2021, 11:57 AM)
Harbormaster failed remote builds in B23213: Diff 22254!

Build has FAILED

Patch application report for D6150 (id=22256)

Rebasing onto b110d1b69c...

First, rewinding head to replay your work on top of it...
Applying: cassandra: Add option to select (hopefully) more efficient batch insertion algos
Applying: Fix in-mem
Applying: Update test
Applying: Add counting storage proxy
Applying: cassandra: Remove stat_counters.
Changes applied before test
commit 7736ba3c0da5792f912b113a5247e85d1474f9b1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:45:18 2021 +0200

    cassandra: Remove stat_counters.
    
    They were inaccurate and a performance bottleneck.
    
    We can/should use swh-counters instead, now.

commit eeeac6332b42a1b93f4b189bdd14838c4eae4fa1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:32:03 2021 +0200

    Add counting storage proxy
    
    It will be used in the Cassandra experiment.
    
    Currently we use the built-in counters of the Cassandra backend; but in
    addition to being inaccurate, they seem to be a bottleneck.
    
    This proxy will be a lightweight solution for counting object insertion,
    without needing to run Kafka on the test cluster.

commit ee3ff9c187a17226814195e493f39eaa059675cd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 12:20:26 2021 +0200

    Update test

commit 7626e1d1ef78863b7de4fac99803fb84e90f8274
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:45:22 2021 +0200

    Fix in-mem

commit 9c8d086010edfff649f15b873a762742e7047325
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:08:15 2021 +0200

    cassandra: Add option to select (hopefully) more efficient batch insertion algos
    
    This adds a new config option for the cassandra backend,
    'directory_entries_insert_algo', with three possible values:
    
    * 'one-per-one' is the default, and preserves the current naive behavior
    * 'concurrent' and 'batch' are attempts at being more efficient

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1379/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1379/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Aug 27 2021, 12:11 PM)
Harbormaster failed remote builds in B23215: Diff 22256!

Re-enable for the in-memory backend; the proxy tests need it.

Build has FAILED

Patch application report for D6150 (id=22263)

Rebasing onto b110d1b69c...

First, rewinding head to replay your work on top of it...
Applying: cassandra: Add option to select (hopefully) more efficient batch insertion algos
Applying: Fix in-mem
Applying: Update test
Applying: Add counting storage proxy
Applying: cassandra: Remove stat_counters.
Changes applied before test
commit 278c7652ce960e7f9f7d85d70812b2b61778f4a7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:45:18 2021 +0200

    cassandra: Remove stat_counters.
    
    They were inaccurate and a performance bottleneck.
    
    We can/should use swh-counters instead, now.

commit db5411c7962916880d2f9b7497ce0fb7c4038d01
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:32:03 2021 +0200

    Add counting storage proxy
    
    It will be used in the Cassandra experiment.
    
    Currently we use the built-in counters of the Cassandra backend; but in
    addition to being inaccurate, they seem to be a bottleneck.
    
    This proxy will be a lightweight solution for counting object insertion,
    without needing to run Kafka on the test cluster.

commit 199e49bbb841f6faeee923c7bbef29a3bf7e86b5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 12:20:26 2021 +0200

    Update test

commit 1f3523b86b8bf860daeb0542ab25f68b1f766309
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:45:22 2021 +0200

    Fix in-mem

commit ff5e10d66232fad8e247c8afaa74ba0d64917cda
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:08:15 2021 +0200

    cassandra: Add option to select (hopefully) more efficient batch insertion algos
    
    This adds a new config option for the cassandra backend,
    'directory_entries_insert_algo', with three possible values:
    
    * 'one-per-one' is the default, and preserves the current naive behavior
    * 'concurrent' and 'batch' are attempts at being more efficient

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1380/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1380/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Aug 27 2021, 12:45 PM)
Harbormaster failed remote builds in B23220: Diff 22263!

Build has FAILED

Patch application report for D6150 (id=22264)

Could not rebase; Attempt merge onto b110d1b69c...

Merge made by the 'recursive' strategy.
 requirements-swh.txt                |   1 +
 swh/storage/__init__.py             |   5 +-
 swh/storage/cassandra/cql.py        | 105 ++++++++++++++++++++++++++++--------
 swh/storage/cassandra/schema.py     |   8 ---
 swh/storage/cassandra/storage.py    |  24 +++++++--
 swh/storage/in_memory.py            |   1 +
 swh/storage/proxies/counter.py      |  66 +++++++++++++++++++++++
 swh/storage/tests/storage_tests.py  |  67 ++++++++++++++++-------
 swh/storage/tests/test_cassandra.py |   7 +--
 swh/storage/tests/test_counter.py   |  63 ++++++++++++++++++++++
 10 files changed, 289 insertions(+), 58 deletions(-)
 create mode 100644 swh/storage/proxies/counter.py
 create mode 100644 swh/storage/tests/test_counter.py
Changes applied before test
commit 72e778f16e2a86240233e04da4369ea60441b7ec
Merge: b110d1b6 0f4deb11
Author: Jenkins user <jenkins@localhost>
Date:   Fri Aug 27 10:45:36 2021 +0000

    Merge branch 'diff-target' into HEAD

commit 0f4deb1134673bb76eaa020c18da2e7c74e62cc4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:45:18 2021 +0200

    cassandra: Remove stat_counters.
    
    They were inaccurate and a performance bottleneck.
    
    We can/should use swh-counters instead, now.

commit b10788d3789fa1010d45ac57f79a16c8c3627502
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:32:03 2021 +0200

    Add counting storage proxy
    
    It will be used in the Cassandra experiment.
    
    Currently we use the built-in counters of the Cassandra backend; but in
    addition to being inaccurate, they seem to be a bottleneck.
    
    This proxy will be a lightweight solution for counting object insertion,
    without needing to run Kafka on the test cluster.

commit 39c7212deb5b32d2486b39d1498b6636f3c86893
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 12:20:26 2021 +0200

    Update test

commit 459bc9d6656f3764120682218d87af73e881ec4b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:45:22 2021 +0200

    Fix in-mem

commit 6b27a722815e25c4f64ff3f137328728fbcb7518
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:08:15 2021 +0200

    cassandra: Add option to select (hopefully) more efficient batch insertion algos
    
    This adds a new config option for the cassandra backend,
    'directory_entries_insert_algo', with three possible values:
    
    * 'one-per-one' is the default, and preserves the current naive behavior
    * 'concurrent' and 'batch' are attempts at being more efficient

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1381/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1381/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Aug 27 2021, 12:45 PM)
Harbormaster failed remote builds in B23221: Diff 22264!

Build is green

Patch application report for D6150 (id=22264)

Could not rebase; Attempt merge onto b110d1b69c...

Merge made by the 'recursive' strategy.
 requirements-swh.txt                |   1 +
 swh/storage/__init__.py             |   5 +-
 swh/storage/cassandra/cql.py        | 105 ++++++++++++++++++++++++++++--------
 swh/storage/cassandra/schema.py     |   8 ---
 swh/storage/cassandra/storage.py    |  24 +++++++--
 swh/storage/in_memory.py            |   1 +
 swh/storage/proxies/counter.py      |  66 +++++++++++++++++++++++
 swh/storage/tests/storage_tests.py  |  67 ++++++++++++++++-------
 swh/storage/tests/test_cassandra.py |   7 +--
 swh/storage/tests/test_counter.py   |  63 ++++++++++++++++++++++
 10 files changed, 289 insertions(+), 58 deletions(-)
 create mode 100644 swh/storage/proxies/counter.py
 create mode 100644 swh/storage/tests/test_counter.py
Changes applied before test
commit 54c025b8f05e66de4f6968aee628aafb04487057
Merge: b110d1b6 0f4deb11
Author: Jenkins user <jenkins@localhost>
Date:   Fri Aug 27 11:15:38 2021 +0000

    Merge branch 'diff-target' into HEAD

commit 0f4deb1134673bb76eaa020c18da2e7c74e62cc4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:45:18 2021 +0200

    cassandra: Remove stat_counters.
    
    They were inaccurate and a performance bottleneck.
    
    We can/should use swh-counters instead, now.

commit b10788d3789fa1010d45ac57f79a16c8c3627502
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:32:03 2021 +0200

    Add counting storage proxy
    
    It will be used in the Cassandra experiment.
    
    Currently we use the built-in counters of the Cassandra backend; but in
    addition to being inaccurate, they seem to be a bottleneck.
    
    This proxy will be a lightweight solution for counting object insertion,
    without needing to run Kafka on the test cluster.

commit 39c7212deb5b32d2486b39d1498b6636f3c86893
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 12:20:26 2021 +0200

    Update test

commit 459bc9d6656f3764120682218d87af73e881ec4b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:45:22 2021 +0200

    Fix in-mem

commit 6b27a722815e25c4f64ff3f137328728fbcb7518
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:08:15 2021 +0200

    cassandra: Add option to select (hopefully) more efficient batch insertion algos
    
    This adds a new config option for the cassandra backend,
    'directory_entries_insert_algo', with three possible values:
    
    * 'one-per-one' is the default, and preserves the current naive behavior
    * 'concurrent' and 'batch' are attempts at being more efficient

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1382/ for more details.

Build is green

Patch application report for D6150 (id=22266)

Could not rebase; Attempt merge onto b110d1b69c...

Merge made by the 'recursive' strategy.
 requirements-swh.txt               |  1 +
 swh/storage/__init__.py            |  5 +--
 swh/storage/cassandra/cql.py       | 22 +++----------
 swh/storage/cassandra/schema.py    |  8 -----
 swh/storage/proxies/counter.py     | 66 +++++++++++++++++++++++++++++++++++++
 swh/storage/tests/storage_tests.py | 67 +++++++++++++++++++++++++++-----------
 swh/storage/tests/test_counter.py  | 63 +++++++++++++++++++++++++++++++++++
 7 files changed, 186 insertions(+), 46 deletions(-)
 create mode 100644 swh/storage/proxies/counter.py
 create mode 100644 swh/storage/tests/test_counter.py
Changes applied before test
commit 7bd210eb11c007e52b9901cb61a2e24d2f14736e
Merge: b110d1b6 bf2d9b1b
Author: Jenkins user <jenkins@localhost>
Date:   Fri Aug 27 11:28:40 2021 +0000

    Merge branch 'diff-target' into HEAD

commit bf2d9b1b68cee1058e3e8d71dd73d4aab550d9ed
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:45:18 2021 +0200

    cassandra: Remove stat_counters.
    
    They were inaccurate and a performance bottleneck.
    
    We can/should use swh-counters instead, now.

commit 2c3a9d0f34aa942cdfeaac5522220e5ce430eef0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:32:03 2021 +0200

    Add counting storage proxy
    
    It will be used in the Cassandra experiment.
    
    Currently we use the built-in counters of the Cassandra backend; but in
    addition to being inaccurate, they seem to be a bottleneck.
    
    This proxy will be a lightweight solution for counting object insertion,
    without needing to run Kafka on the test cluster.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1384/ for more details.

Build is green

Patch application report for D6150 (id=22270)

Could not rebase; Attempt merge onto b110d1b69c...

Updating b110d1b6..c8a1ed7a
Fast-forward
 requirements-swh.txt               |  1 +
 swh/storage/__init__.py            |  5 +--
 swh/storage/cassandra/cql.py       | 22 +++----------
 swh/storage/cassandra/schema.py    |  8 -----
 swh/storage/proxies/counter.py     | 66 +++++++++++++++++++++++++++++++++++++
 swh/storage/tests/storage_tests.py | 67 +++++++++++++++++++++++++++-----------
 swh/storage/tests/test_counter.py  | 63 +++++++++++++++++++++++++++++++++++
 7 files changed, 186 insertions(+), 46 deletions(-)
 create mode 100644 swh/storage/proxies/counter.py
 create mode 100644 swh/storage/tests/test_counter.py
Changes applied before test
commit c8a1ed7ac4adb7ca4f0bda9b12ace14b2fc521ce
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:45:18 2021 +0200

    cassandra: Remove stat_counters.
    
    They were inaccurate and a performance bottleneck.
    
    We can/should use swh-counters instead, now.

commit 47a6919fee499dd51fb0098099e895088a1a7c25
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:32:03 2021 +0200

    Add counting storage proxy
    
    It will be used in the Cassandra experiment.
    
    Currently we use the built-in counters of the Cassandra backend; but in
    addition to being inaccurate, they seem to be a bottleneck.
    
    This proxy will be a lightweight solution for counting object insertion,
    without needing to run Kafka on the test cluster.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1386/ for more details.

ardumont added a subscriber: ardumont.

ok

Can you please add back the task it's related to?

TIA

This revision is now accepted and ready to land. (Aug 30 2021, 4:52 PM)

Build is green

Patch application report for D6150 (id=22327)

Rebasing onto 3ad1bec113...

Current branch diff-target is up to date.
Changes applied before test
commit e8aad0fffc5341a78f98cb3b4c4845b6f4e95527
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:45:18 2021 +0200

    cassandra: Remove stat_counters.
    
    They were inaccurate and a performance bottleneck.
    
    We can/should use swh-counters instead, now.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1393/ for more details.

This revision was automatically updated to reflect the committed changes.