They were inaccurate and a performance bottleneck.
We can, and should, use swh-counters instead now.
Depends on D6149.
Differential D6150
cassandra: Remove stat_counters.
Authored by vlorentz on Aug 27 2021, 11:45 AM.
Event Timeline

Build has FAILED
Patch application report for D6150 (id=22254)
Could not rebase; Attempt merge onto b110d1b69c...
Merge made by the 'recursive' strategy.
 requirements-swh.txt | 1 +
 swh/storage/__init__.py | 5 +-
 swh/storage/cassandra/cql.py | 101 ++++++++++++++++++++++++++++--------
 swh/storage/cassandra/model.py | 11 ----
 swh/storage/cassandra/schema.py | 8 ---
 swh/storage/cassandra/storage.py | 39 ++++++++------
 swh/storage/in_memory.py | 23 +-------
 swh/storage/proxies/counter.py | 66 +++++++++++++++++++++++
 swh/storage/tests/storage_tests.py | 48 ++++++++++-------
 swh/storage/tests/test_cassandra.py | 7 +--
 swh/storage/tests/test_counter.py | 63 ++++++++++++++++++++++
 11 files changed, 266 insertions(+), 106 deletions(-)
 create mode 100644 swh/storage/proxies/counter.py
 create mode 100644 swh/storage/tests/test_counter.py
Changes applied before test
commit f80282152f7fd9820462c40bb8ebcbda83adc907
Merge: b110d1b6 9cc88a4c
Author: Jenkins user <jenkins@localhost>
Date: Fri Aug 27 09:50:46 2021 +0000
    Merge branch 'diff-target' into HEAD
commit 9cc88a4cbad5f070d320d47c1933078835f0b49f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 27 11:45:18 2021 +0200
    cassandra: Remove stat_counters.
    They were inaccurate and a performance bottleneck.
    We can/should use swh-counters instead, now.
commit b10788d3789fa1010d45ac57f79a16c8c3627502
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 27 11:32:03 2021 +0200
    Add counting storage proxy
    It will be used in the Cassandra experiment. Currently we use the built-in counters of the Cassandra backend; but in addition to being inaccurate, they seem to be a bottleneck.
    This proxy will be a lightweight solution for counting object insertion, without needing to run Kafka on the test cluster.
commit 39c7212deb5b32d2486b39d1498b6636f3c86893
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 26 12:20:26 2021 +0200
    Update test
commit 459bc9d6656f3764120682218d87af73e881ec4b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 26 11:45:22 2021 +0200
    Fix in-mem
commit 6b27a722815e25c4f64ff3f137328728fbcb7518
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 26 11:08:15 2021 +0200
    cassandra: Add option to select (hopefully) more efficient batch insertion algos
    This adds a new config option for the cassandra backend, 'directory_entries_insert_algo', with three possible values:
    * 'one-per-one' is the default, and preserves the current naive behavior
    * 'concurrent' and 'batch' are attempts at being more efficient
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1378/

Build has FAILED
Patch application report for D6150 (id=22256)
Rebasing onto b110d1b69c...
First, rewinding head to replay your work on top of it...
Applying: cassandra: Add option to select (hopefully) more efficient batch insertion algos
Applying: Fix in-mem
Applying: Update test
Applying: Add counting storage proxy
Applying: cassandra: Remove stat_counters.
Changes applied before test
commit 7736ba3c0da5792f912b113a5247e85d1474f9b1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 27 11:45:18 2021 +0200
    cassandra: Remove stat_counters.
commit eeeac6332b42a1b93f4b189bdd14838c4eae4fa1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 27 11:32:03 2021 +0200
    Add counting storage proxy
commit ee3ff9c187a17226814195e493f39eaa059675cd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 26 12:20:26 2021 +0200
    Update test
commit 7626e1d1ef78863b7de4fac99803fb84e90f8274
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 26 11:45:22 2021 +0200
    Fix in-mem
commit 9c8d086010edfff649f15b873a762742e7047325
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 26 11:08:15 2021 +0200
    cassandra: Add option to select (hopefully) more efficient batch insertion algos
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1379/

Build has FAILED
Patch application report for D6150 (id=22263)
Rebasing onto b110d1b69c...
First, rewinding head to replay your work on top of it...
Applying: cassandra: Add option to select (hopefully) more efficient batch insertion algos
Applying: Fix in-mem
Applying: Update test
Applying: Add counting storage proxy
Applying: cassandra: Remove stat_counters.
Changes applied before test
commit 278c7652ce960e7f9f7d85d70812b2b61778f4a7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 27 11:45:18 2021 +0200
    cassandra: Remove stat_counters.
commit db5411c7962916880d2f9b7497ce0fb7c4038d01
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 27 11:32:03 2021 +0200
    Add counting storage proxy
commit 199e49bbb841f6faeee923c7bbef29a3bf7e86b5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 26 12:20:26 2021 +0200
    Update test
commit 1f3523b86b8bf860daeb0542ab25f68b1f766309
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 26 11:45:22 2021 +0200
    Fix in-mem
commit ff5e10d66232fad8e247c8afaa74ba0d64917cda
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 26 11:08:15 2021 +0200
    cassandra: Add option to select (hopefully) more efficient batch insertion algos
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1380/

Build has FAILED
Patch application report for D6150 (id=22264)
Could not rebase; Attempt merge onto b110d1b69c...
Merge made by the 'recursive' strategy.
 requirements-swh.txt | 1 +
 swh/storage/__init__.py | 5 +-
 swh/storage/cassandra/cql.py | 105 ++++++++++++++++++++++++++++--------
 swh/storage/cassandra/schema.py | 8 ---
 swh/storage/cassandra/storage.py | 24 +++++++--
 swh/storage/in_memory.py | 1 +
 swh/storage/proxies/counter.py | 66 +++++++++++++++++++++++
 swh/storage/tests/storage_tests.py | 67 ++++++++++++++++-------
 swh/storage/tests/test_cassandra.py | 7 +--
 swh/storage/tests/test_counter.py | 63 ++++++++++++++++++++++
 10 files changed, 289 insertions(+), 58 deletions(-)
 create mode 100644 swh/storage/proxies/counter.py
 create mode 100644 swh/storage/tests/test_counter.py
Changes applied before test
commit 72e778f16e2a86240233e04da4369ea60441b7ec
Merge: b110d1b6 0f4deb11
Author: Jenkins user <jenkins@localhost>
Date: Fri Aug 27 10:45:36 2021 +0000
    Merge branch 'diff-target' into HEAD
commit 0f4deb1134673bb76eaa020c18da2e7c74e62cc4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 27 11:45:18 2021 +0200
    cassandra: Remove stat_counters.
commit b10788d3789fa1010d45ac57f79a16c8c3627502
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 27 11:32:03 2021 +0200
    Add counting storage proxy
commit 39c7212deb5b32d2486b39d1498b6636f3c86893
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 26 12:20:26 2021 +0200
    Update test
commit 459bc9d6656f3764120682218d87af73e881ec4b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 26 11:45:22 2021 +0200
    Fix in-mem
commit 6b27a722815e25c4f64ff3f137328728fbcb7518
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 26 11:08:15 2021 +0200
    cassandra: Add option to select (hopefully) more efficient batch insertion algos
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1381/

Build is green
Patch application report for D6150 (id=22264)
Could not rebase; Attempt merge onto b110d1b69c...
Merge made by the 'recursive' strategy.
 requirements-swh.txt | 1 +
 swh/storage/__init__.py | 5 +-
 swh/storage/cassandra/cql.py | 105 ++++++++++++++++++++++++++++--------
 swh/storage/cassandra/schema.py | 8 ---
 swh/storage/cassandra/storage.py | 24 +++++++--
 swh/storage/in_memory.py | 1 +
 swh/storage/proxies/counter.py | 66 +++++++++++++++++++++++
 swh/storage/tests/storage_tests.py | 67 ++++++++++++++++-------
 swh/storage/tests/test_cassandra.py | 7 +--
 swh/storage/tests/test_counter.py | 63 ++++++++++++++++++++++
 10 files changed, 289 insertions(+), 58 deletions(-)
 create mode 100644 swh/storage/proxies/counter.py
 create mode 100644 swh/storage/tests/test_counter.py
Changes applied before test
commit 54c025b8f05e66de4f6968aee628aafb04487057
Merge: b110d1b6 0f4deb11
Author: Jenkins user <jenkins@localhost>
Date: Fri Aug 27 11:15:38 2021 +0000
    Merge branch 'diff-target' into HEAD
commit 0f4deb1134673bb76eaa020c18da2e7c74e62cc4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 27 11:45:18 2021 +0200
    cassandra: Remove stat_counters.
commit b10788d3789fa1010d45ac57f79a16c8c3627502
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 27 11:32:03 2021 +0200
    Add counting storage proxy
commit 39c7212deb5b32d2486b39d1498b6636f3c86893
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 26 12:20:26 2021 +0200
    Update test
commit 459bc9d6656f3764120682218d87af73e881ec4b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 26 11:45:22 2021 +0200
    Fix in-mem
commit 6b27a722815e25c4f64ff3f137328728fbcb7518
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Aug 26 11:08:15 2021 +0200
    cassandra: Add option to select (hopefully) more efficient batch insertion algos
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1382/ for more details.

Build is green
Patch application report for D6150 (id=22266)
Could not rebase; Attempt merge onto b110d1b69c...
Merge made by the 'recursive' strategy.
 requirements-swh.txt | 1 +
 swh/storage/__init__.py | 5 +--
 swh/storage/cassandra/cql.py | 22 +++----------
 swh/storage/cassandra/schema.py | 8 -----
 swh/storage/proxies/counter.py | 66 +++++++++++++++++++++++++++++++++++++
 swh/storage/tests/storage_tests.py | 67 +++++++++++++++++++++++++++-----------
 swh/storage/tests/test_counter.py | 63 +++++++++++++++++++++++++++++++++++
 7 files changed, 186 insertions(+), 46 deletions(-)
 create mode 100644 swh/storage/proxies/counter.py
 create mode 100644 swh/storage/tests/test_counter.py
Changes applied before test
commit 7bd210eb11c007e52b9901cb61a2e24d2f14736e
Merge: b110d1b6 bf2d9b1b
Author: Jenkins user <jenkins@localhost>
Date: Fri Aug 27 11:28:40 2021 +0000
    Merge branch 'diff-target' into HEAD
commit bf2d9b1b68cee1058e3e8d71dd73d4aab550d9ed
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 27 11:45:18 2021 +0200
    cassandra: Remove stat_counters.
commit 2c3a9d0f34aa942cdfeaac5522220e5ce430eef0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 27 11:32:03 2021 +0200
    Add counting storage proxy
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1384/ for more details.

Build is green
Patch application report for D6150 (id=22270)
Could not rebase; Attempt merge onto b110d1b69c...
Updating b110d1b6..c8a1ed7a
Fast-forward
 requirements-swh.txt | 1 +
 swh/storage/__init__.py | 5 +--
 swh/storage/cassandra/cql.py | 22 +++----------
 swh/storage/cassandra/schema.py | 8 -----
 swh/storage/proxies/counter.py | 66 +++++++++++++++++++++++++++++++++++++
 swh/storage/tests/storage_tests.py | 67 +++++++++++++++++++++++++++-----------
 swh/storage/tests/test_counter.py | 63 +++++++++++++++++++++++++++++++++++
 7 files changed, 186 insertions(+), 46 deletions(-)
 create mode 100644 swh/storage/proxies/counter.py
 create mode 100644 swh/storage/tests/test_counter.py
Changes applied before test
commit c8a1ed7ac4adb7ca4f0bda9b12ace14b2fc521ce
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 27 11:45:18 2021 +0200
    cassandra: Remove stat_counters.
commit 47a6919fee499dd51fb0098099e895088a1a7c25
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 27 11:32:03 2021 +0200
    Add counting storage proxy
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1386/ for more details.

Build is green
Patch application report for D6150 (id=22327)
Rebasing onto 3ad1bec113...
Current branch diff-target is up to date.
Changes applied before test
commit e8aad0fffc5341a78f98cb3b4c4845b6f4e95527
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 27 11:45:18 2021 +0200
    cassandra: Remove stat_counters.
    They were inaccurate and a performance bottleneck.
    We can/should use swh-counters instead, now.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1393/ for more details.
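
The "Add counting storage proxy" commit message describes a lightweight way to count object insertions outside the Cassandra backend. The following is only a rough sketch of that idea, not the contents of swh/storage/proxies/counter.py. The `CountingProxy` and `InMemoryStorage` classes are hypothetical, and the sketch assumes that each `*_add` method returns a summary dict with a `"<type>:add"` key giving the number of newly inserted objects.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Set


class InMemoryStorage:
    """Hypothetical stand-in backend: stores ids and reports how many were new."""

    def __init__(self) -> None:
        self._objects: Dict[str, Set[str]] = {}

    def content_add(self, contents: Iterable[str]) -> Dict[str, int]:
        return self._add("content", contents)

    def directory_add(self, directories: Iterable[str]) -> Dict[str, int]:
        return self._add("directory", directories)

    def _add(self, object_type: str, objects: Iterable[str]) -> Dict[str, int]:
        known = self._objects.setdefault(object_type, set())
        new = set(objects) - known
        known |= new  # in-place update of the stored set
        return {f"{object_type}:add": len(new)}


class CountingProxy:
    """Hypothetical proxy: forwards calls and tallies what ``*_add`` reports."""

    def __init__(self, storage: Any) -> None:
        self.storage = storage
        self.counts: Counter = Counter()

    def __getattr__(self, name: str) -> Any:
        # Only reached for attributes that are not defined on the proxy itself.
        attr = getattr(self.storage, name)
        if not name.endswith("_add") or not callable(attr):
            return attr
        object_type = name[: -len("_add")]

        def counting_add(*args: Any, **kwargs: Any) -> Dict[str, int]:
            summary = attr(*args, **kwargs)
            # Assumption: the backend reports new objects under "<type>:add".
            self.counts[object_type] += summary.get(f"{object_type}:add", 0)
            return summary

        return counting_add

    def stat_counters(self) -> Dict[str, int]:
        # Served from the proxy's own tallies, so the backend keeps no counters.
        return dict(self.counts)


proxy = CountingProxy(InMemoryStorage())
proxy.content_add(["a", "b"])
proxy.content_add(["b", "c"])  # "b" is already known, so only "c" counts
proxy.directory_add(["d"])
print(proxy.stat_counters())  # {'content': 3, 'directory': 1}
```

Keeping the tally in a wrapper like this means the backend itself does no counter writes on insertion, which is the bottleneck the commit messages attribute to the built-in Cassandra counters.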