Page MenuHomeSoftware Heritage

cassandra: Add option to select (hopefully) more efficient batch insertion algos
ClosedPublic

Authored by vlorentz on Aug 26 2021, 11:10 AM.

Details

Summary

This adds a new config option for the cassandra backend,
'directory_entries_insert_algo', with three possible values:

  • 'one-per-one' is the default, and preserves the current naive behavior
  • 'concurrent' and 'batch' are attempts at being more efficient

Addresses the current performance bottleneck, according to T3493#68979

Diff Detail

Repository
rDSTO Storage manager
Branch
cassandra-dir-concurrent
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 23180
Build 36167: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 36166: arc lint + arc unit

Unit TestsFailed

TimeTest
37 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_api_client.TestStorageApi::test_directory_add
self = <swh.storage.tests.test_api_client.TestStorageApi object at 0x7fb96d105128> swh_storage = <RemoteStorage url=mock://example.com/> sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7fb96d577fd0>
35 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_api_client.TestStorageApi::test_directory_add_twice
self = <swh.storage.tests.test_api_client.TestStorageApi object at 0x7fb96d807f60> swh_storage = <RemoteStorage url=mock://example.com/> sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7fb96dadc400>
32 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_api_client.TestStorageApi::test_directory_entry_get_by_path
self = <swh.storage.tests.test_api_client.TestStorageApi object at 0x7fb96c545470> swh_storage = <RemoteStorage url=mock://example.com/> sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7fb96c545518>
37 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_api_client.TestStorageApi::test_directory_get_entries[1]
self = <swh.storage.tests.test_api_client.TestStorageApi object at 0x7fb96d563fd0> swh_storage = <RemoteStorage url=mock://example.com/> sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7fb96c510cc0>
35 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_api_client.TestStorageApi::test_directory_get_entries[2]
self = <swh.storage.tests.test_api_client.TestStorageApi object at 0x7fb96d3fd4a8> swh_storage = <RemoteStorage url=mock://example.com/> sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7fb96d08e748>
View Full Test Results (65 Failed · 1,012 Passed · 39 Skipped)

Event Timeline

Build has FAILED

Patch application report for D6139 (id=22222)

Rebasing onto b110d1b69c...

First, rewinding head to replay your work on top of it...
Applying: cassandra: Add option to select (hopefully) more efficient batch insertion algos
Changes applied before test
commit cc8c2d10d4d1d36c0fdc0a3fa815162a6c74ed28
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:08:15 2021 +0200

    cassandra: Add option to select (hopefully) more efficient batch insertion algos
    
    This adds a new config option for the cassandra backend,
    'directory_entries_insert_algo', with three possible values:
    
    * 'one-per-one' is the default, and preserves the current naive behavior
    * 'concurrent' and 'batch' are attempts at being more efficient

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1373/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1373/console

Harbormaster returned this revision to the author for changes because remote builds failed.Aug 26 2021, 11:17 AM
Harbormaster failed remote builds in B23180: Diff 22222!

Thanks I will test this on g5k with the directory replayer and the git loader.
I will keep you in touch

Build has FAILED

Patch application report for D6139 (id=22223)

Rebasing onto b110d1b69c...

First, rewinding head to replay your work on top of it...
Applying: cassandra: Add option to select (hopefully) more efficient batch insertion algos
Applying: Fix in-mem
Changes applied before test
commit febe31d66e35063fe7330427bec648c2f9f8d070
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:45:22 2021 +0200

    Fix in-mem

commit 58070f87089745fbca3526e0b35dcc533c05caa9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:08:15 2021 +0200

    cassandra: Add option to select (hopefully) more efficient batch insertion algos
    
    This adds a new config option for the cassandra backend,
    'directory_entries_insert_algo', with three possible values:
    
    * 'one-per-one' is the default, and preserves the current naive behavior
    * 'concurrent' and 'batch' are attempts at being more efficient

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1374/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1374/console

Harbormaster returned this revision to the author for changes because remote builds failed.Aug 26 2021, 11:53 AM
Harbormaster failed remote builds in B23181: Diff 22223!

Build is green

Patch application report for D6139 (id=22232)

Rebasing onto b110d1b69c...

First, rewinding head to replay your work on top of it...
Applying: cassandra: Add option to select (hopefully) more efficient batch insertion algos
Applying: Fix in-mem
Applying: Update test
Changes applied before test
commit a9f95df65ddfe9085ca89eef2fd0841b806fc685
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 12:20:26 2021 +0200

    Update test

commit 749b46ae5a80b870b5ef090a8652203383ea9533
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:45:22 2021 +0200

    Fix in-mem

commit cd0898b85b7e3a537c13a2b9495a688e834b3ff7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:08:15 2021 +0200

    cassandra: Add option to select (hopefully) more efficient batch insertion algos
    
    This adds a new config option for the cassandra backend,
    'directory_entries_insert_algo', with three possible values:
    
    * 'one-per-one' is the default, and preserves the current naive behavior
    * 'concurrent' and 'batch' are attempts at being more efficient

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1375/ for more details.

The patch was test in a loader and in the replayers.
The difference was not really significant on the loader but I'm not really confident in the tests as the cluster had a pretty high load (running replayers + second datacenter synchronization).
I will retry with a more quieter environment to be able to isolate the loader behavior.

It was different with the directory replayers. With the concurrent and batch option, the performance seems to be really better:

The first plateau is with the unpatched version.
The peak is with the concurrent parameter
The problem is it seems there is a lock contention on the counter and all the replayers stopped with this error:

Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: Traceback (most recent call last):
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/bin/swh", line 11, in <module>
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     load_entry_point('swh.core==0.14.4', 'console_scripts', 'swh')()
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/swh/core/cli/__init__.py", line 185, in main
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     return swh(auto_envvar_prefix="SWH")
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/click/core.py", line 764, in __call__
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     return self.main(*args, **kwargs)
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/click/core.py", line 717, in main
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     rv = self.invoke(ctx)
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     return _process_result(sub_ctx.command.invoke(sub_ctx))
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     return _process_result(sub_ctx.command.invoke(sub_ctx))
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/click/core.py", line 956, in invoke
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     return ctx.invoke(self.callback, **ctx.params)
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/click/core.py", line 555, in invoke
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     return callback(*args, **kwargs)
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/click/decorators.py", line 17, in new_func
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     return f(get_current_context(), *args, **kwargs)
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/swh/storage/cli.py", line 194, in replay
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     client.process(worker_fn)
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/swh/journal/client.py", line 265, in process
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     batch_processed, at_eof = self.handle_messages(messages, worker_fn)
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/swh/journal/client.py", line 292, in handle_messages
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     worker_fn(dict(objects))
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/swh/storage/replay.py", line 62, in process_replay_objects
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     _insert_objects(object_type, objects, storage)
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/swh/storage/replay.py", line 144, in _insert_objects
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     storage.origin_add(origins)
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/swh/storage/cassandra/storage.py", line 1021, in origin_add
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     OriginRow(sha1=hash_url(origin.url), url=origin.url, next_visit_id=1)
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/swh/storage/cassandra/cql.py", line 162, in newf
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     self, *args, **kwargs, statement=self._prepared_statements[f.__name__]
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/swh/storage/cassandra/cql.py", line 913, in origin_add_one
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     self._add_one(statement, origin)
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/swh/storage/cassandra/cql.py", line 315, in _add_one
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     self._increment_counter(obj.TABLE, 1)
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/swh/storage/cassandra/cql.py", line 162, in newf
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     self, *args, **kwargs, statement=self._prepared_statements[f.__name__]
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/swh/storage/cassandra/cql.py", line 312, in _increment_counter                                                                                                                                                             
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     self._execute_with_retries(statement, [nb, object_type])
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 329, in wrapped_f
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     return self.call(f, *args, **kw)
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 409, in call
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     do = self.iter(retry_state=retry_state)
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 356, in iter
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     return fut.result()
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 425, in result
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     return self.__get_result()
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     raise self._exception
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 412, in call
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     result = fn(*args, **kwargs)
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "/usr/lib/python3/dist-packages/swh/storage/cassandra/cql.py", line 293, in _execute_with_retries                                                                                                                                                          
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:     return self._session.execute(statement, args, timeout=1000.0)
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "cassandra/cluster.py", line 2345, in cassandra.cluster.Session.execute
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]:   File "cassandra/cluster.py", line 4304, in cassandra.cluster.ResponseFuture.result
Aug 26 18:36:59 paravance-70.rennes.grid5000.fr swh[9885]: cassandra.WriteTimeout: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'ONE', 'required_responses': 1, 'received_responses': 0, 'write_type': 'COUNTER'}

Tested with 40 and 10 directory replayers

rebase + squash + add tests

Build is green

Patch application report for D6139 (id=22276)

Could not rebase; Attempt merge onto 47a6919fee...

Updating 47a6919f..d272e70b
Fast-forward
 swh/storage/cassandra/cql.py        | 105 ++++++++++++++++++++++++++++--------
 swh/storage/cassandra/schema.py     |   8 ---
 swh/storage/cassandra/storage.py    |  32 +++++++++--
 swh/storage/in_memory.py            |   1 +
 swh/storage/tests/storage_tests.py  |  67 ++++++++++++++++-------
 swh/storage/tests/test_cassandra.py |  31 ++++++++---
 6 files changed, 187 insertions(+), 57 deletions(-)
Changes applied before test
commit d272e70b9c2a2ceec484ffb7511eb6e742255319
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:08:15 2021 +0200

    cassandra: Add option to select (hopefully) more efficient batch insertion algos
    
    This adds a new config option for the cassandra backend,
    'directory_entries_insert_algo', with three possible values:
    
    * 'one-per-one' is the default, and preserves the current naive behavior
    * 'concurrent' and 'batch' are attempts at being more efficient

commit c8a1ed7ac4adb7ca4f0bda9b12ace14b2fc521ce
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:45:18 2021 +0200

    cassandra: Remove stat_counters.
    
    They were inaccurate and a performance bottleneck.
    
    We can/should use swh-counters instead, now.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1387/ for more details.

thanks. I will test that once the monitoring is updated to use the statsd statistics instead of the object_count table content.

Build is green

Patch application report for D6139 (id=22328)

Could not rebase; Attempt merge onto 3ad1bec113...

Updating 3ad1bec1..df045d45
Fast-forward
 swh/storage/cassandra/cql.py        | 105 ++++++++++++++++++++++++++++--------
 swh/storage/cassandra/schema.py     |   8 ---
 swh/storage/cassandra/storage.py    |  32 +++++++++--
 swh/storage/in_memory.py            |   1 +
 swh/storage/tests/storage_tests.py  |  67 ++++++++++++++++-------
 swh/storage/tests/test_cassandra.py |  31 ++++++++---
 6 files changed, 187 insertions(+), 57 deletions(-)
Changes applied before test
commit df045d45cb90d9ba7f1fb87cdf20945ac42fa800
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:08:15 2021 +0200

    cassandra: Add option to select (hopefully) more efficient batch insertion algos
    
    This adds a new config option for the cassandra backend,
    'directory_entries_insert_algo', with three possible values:
    
    * 'one-per-one' is the default, and preserves the current naive behavior
    * 'concurrent' and 'batch' are attempts at being more efficient

commit e8aad0fffc5341a78f98cb3b4c4845b6f4e95527
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 27 11:45:18 2021 +0200

    cassandra: Remove stat_counters.
    
    They were inaccurate and a performance bottleneck.
    
    We can/should use swh-counters instead, now.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1394/ for more details.

Test with 10 replayers with the 3 kind of algorithm:

  • first interval: one-by-one
  • second interval: concurremt
  • third interval: batch:

The stats are based on the directory_add statsd metrics

concurrent and batch seem to have almost the same performances. Both are faster than the one-by-one algo but batch looks more stable.

The concurrent algo have a higher impact on the cpu consumption.

I will test with more replayers in parallel to confirm the tendency

Some feedback, I had to delay the benchmarks because the servers was almost full and the cluster needed to be expanded to 7 nodes. The cluster is in stabilization phase (rebuild/repair of the new node and cleanup of the old one)
When it will be done, I will be able to finalize the tests Hopefully at the beginning of the next week

These are more results with different number of replayers.
Each line represents a server with 20 directory replayers, the renages are for one-by-one, concurrent, batch

  • 1 node

  • 2 nodes

  • 3 nodes

  • 4 nodes

  • 5 nodes

For 4 nodes, and only for 4 nodes, the batch version collapse and is doing nothing visible and nothing is logged.
There is some communication with cassandra but nothing is ingested.

After analyze, the batch mode uses huge prepared statements (PS). It seems with the concurrency of the replayers, they are evicted from the PS cache before they can be executed. The client silently reprepare the statement and retry fall in a prepared/evicted loop.
Some interesting read:
https://docs.datastax.com/en/developer/java-driver/3.0/manual/statements/prepared/#preparing-on-multiple-nodes

finally, if the driver tries to execute a statement and finds out that the coordinator doesn’t know about it, it will re-prepare the statement on the fly (this is transparent for the client, but will cost two extra roundtrips):

From a tcpdump:

...
Prepared query with ID a6a4d5fc60c93bd33e1df45d36a045b5 not found (either the query was not prepared on this host (maybe the host has been restarted?) or you have prepared too many queries and it has been evicted from the internal cache)
BEGIN UNLOGGED BATCH
Prepared query with ID a6a4d5fc60c93bd33e1df45d36a045b5 not found (either the query was not prepared on this host (maybe the host has been restarted?) or you have prepared too many queries and it has been evicted from the internal cache)
BEGIN UNLOGGED BATCH
Prepared query with ID a6a4d5fc60c93bd33e1df45d36a045b5 not found (either the query was not prepared on this host (maybe the host has been restarted?) or you have prepared too many queries and it has been evicted from the internal cache)
BEGIN UNLOGGED BATCH
Prepared query with ID a6a4d5fc60c93bd33e1df45d36a045b5 not found (either the query was not prepared on this host (maybe the host has been restarted?) or you have prepared too many queries and it has been evicted from the internal cache)
BEGIN UNLOGGED BATCH
Prepared query with ID a6a4d5fc60c93bd33e1df45d36a045b5 not found (either the query was not prepared on this host (maybe the host has been restarted?) or you have prepared too many queries and it has been evicted from the internal cache)
....

This example if for a big PS with 904

INSERT INTO directory_entry (directory_id, name, target, perms, type) VALUES (?, ?, ?, ?, ?);

In cassandra, there are a lot of logs like this one:

WARN  [ScheduledTasks:1] 2021-09-07 07:45:41,884 QueryProcessor.java:101 - 576 prepared statements discarded in the last minute because cache limit reached (63 MB)

According to the documentation of the cassandra concurrent api[1], it seems the concurrency can by specified as an argument of the execute_concurrent_with_args method. The default is 100, but it could be interesting to check with higher or lower values

The concurrency parameter controls how many statements will be executed concurrently. When Cluster.protocol_version is set to 1 or 2, it is recommended that this be kept below 100 times the number of core connections per host times the number of connected hosts (see Cluster.set_core_connections_per_host()). If that amount is exceeded, the event loop thread may attempt to block on new connection creation, substantially impacting throughput. If protocol_version is 3 or higher, you can safely experiment with higher levels of concurrency.

[1] https://docs.datastax.com/en/developer/python-driver/3.25/api/cassandra/concurrent/

I will probably dedicate a benchmark session for the concurrent algo with different concurrencies

After analyze, the batch mode uses huge prepared statements (PS). It seems with the concurrency of the replayers, they are evicted from the PS cache before they can be executed. The client silently reprepare the statement and retry fall in a prepared/evicted loop.

Ooooh, so that's what it is! Good to know.

Can you try with lower values for BATCH_INSERT_MAX_SIZE? (eg. 100)

  • rebase
  • directory_entry_add_batch: Reduce churn of prepared statements

Build is green

Patch application report for D6139 (id=22479)

Could not rebase; Attempt merge onto 834a49d067...

Updating 834a49d0..da7e63ea
Fast-forward
 swh/storage/cassandra/cql.py              | 86 ++++++++++++++++++++++++++++---
 swh/storage/cassandra/storage.py          | 32 ++++++++++--
 swh/storage/in_memory.py                  |  1 +
 swh/storage/migrate_extrinsic_metadata.py | 23 +++++++--
 swh/storage/tests/storage_tests.py        |  2 +-
 swh/storage/tests/test_cassandra.py       | 55 +++++++++++++++++---
 6 files changed, 178 insertions(+), 21 deletions(-)
Changes applied before test
commit da7e63ea683c5dad0da64e83615a7df24156c6a2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 8 11:56:49 2021 +0200

    directory_entry_add_batch: Reduce churn of prepared statements
    
    By reusing the 'steady state' main statement (which is quite large)
    across calls.

commit fc950deb20b4a2ea1f40a2617f51e3eb73f78612
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:08:15 2021 +0200

    cassandra: Add option to select (hopefully) more efficient batch insertion algos
    
    This adds a new config option for the cassandra backend,
    'directory_entries_insert_algo', with three possible values:
    
    * 'one-per-one' is the default, and preserves the current naive behavior
    * 'concurrent' and 'batch' are attempts at being more efficient

commit 7dc2863ef1fa0701e0cf632c0ce68db8e2965817
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Sep 6 14:45:40 2021 +0200

    migrate_extrinsic_metadata: Add an option to limit the number of revisions
    
    This will be used as a second pass on objects that failed with
    older versions of the script.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1397/ for more details.

Thanks for the last fix, it looks better with a smaller batch size:
5 nodes:


The ingestion is ~7500 ops/s in batch compared to ~6500 before

6 nodes:


It seems the maximum rate we can afford is ~ 7500 ops

4 nodes:
4 nodes run still looks problematics, 3 runs were launched, 2 succeed and 1 failed.


I can't find anything indicating a particular load on the servers that can explain this behavior.
There is no particular cpu memory or I/O load, the compaction jobs looks stable, no client timeout.
The only difference is the PS eviction behavior

  • directory_entry_add_batch: Remove the temporary prepared statement entirely

Build is green

Patch application report for D6139 (id=22516)

Could not rebase; Attempt merge onto 834a49d067...

Updating 834a49d0..50fb54f2
Fast-forward
 swh/storage/cassandra/cql.py              | 80 ++++++++++++++++++++++++++++---
 swh/storage/cassandra/storage.py          | 32 +++++++++++--
 swh/storage/in_memory.py                  |  1 +
 swh/storage/migrate_extrinsic_metadata.py | 23 +++++++--
 swh/storage/tests/storage_tests.py        |  2 +-
 swh/storage/tests/test_cassandra.py       | 55 ++++++++++++++++++---
 6 files changed, 172 insertions(+), 21 deletions(-)
Changes applied before test
commit 50fb54f212bc36c1d28d80749cb435bbddaf2db8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 9 11:35:22 2021 +0200

    directory_entry_add_batch: Remove the temporary prepared statement entirely
    
    And fall back to concurrent insertion.

commit da7e63ea683c5dad0da64e83615a7df24156c6a2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Sep 8 11:56:49 2021 +0200

    directory_entry_add_batch: Reduce churn of prepared statements
    
    By reusing the 'steady state' main statement (which is quite large)
    across calls.

commit fc950deb20b4a2ea1f40a2617f51e3eb73f78612
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Aug 26 11:08:15 2021 +0200

    cassandra: Add option to select (hopefully) more efficient batch insertion algos
    
    This adds a new config option for the cassandra backend,
    'directory_entries_insert_algo', with three possible values:
    
    * 'one-per-one' is the default, and preserves the current naive behavior
    * 'concurrent' and 'batch' are attempts at being more efficient

commit 7dc2863ef1fa0701e0cf632c0ce68db8e2965817
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Sep 6 14:45:40 2021 +0200

    migrate_extrinsic_metadata: Add an option to limit the number of revisions
    
    This will be used as a second pass on objects that failed with
    older versions of the script.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1398/ for more details.

This revision was not accepted when it landed; it landed in state Needs Review.Sep 9 2021, 3:27 PM
This revision was not accepted when it landed; it landed in state Needs Review.
This revision was automatically updated to reflect the committed changes.

final tests with the last version, everything looks good with almost the same performances a better ingestion rate in batch:
4 nodes before (batch only):

4 nodes after:

5 nodes before:
(don't have the measure yet due to the reservation expiration)

5 nodes after:

The performance is a little better for all the tests because there is one more server on the cluster.