Diffusion Storage manager 4a24505049d5

cassandra: Use concurrent queries in *_missing() instead of naive grouping

Description

cassandra: Use concurrent queries in *_missing() instead of naive grouping

Instead of grouping ids into arbitrary batches, one query per batch
(which forces the node receiving the query to coordinate with other
nodes to complete it), this sends one query per id, directly to the
right node.

This is the 'concurrent' algorithm from https://forge.softwareheritage.org/T3577#72791
which gives a >=2x speed-up on directories, and a >=8x speed-up on revisions.
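
As a rough illustration of the idea (not the code from this commit), the sketch below runs one existence check per id concurrently and collects the ids that were not found. The names `ids_missing` and `id_exists` are hypothetical; `id_exists` stands in for a single-id query, and a standard-library thread pool is used here in place of any Cassandra driver API. With the DataStax Python driver, `cassandra.concurrent.execute_concurrent_with_args` is one way to issue such queries; whether the commit uses it is not stated here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def ids_missing(
    ids: Iterable[bytes],
    id_exists: Callable[[bytes], bool],
    concurrency: int = 8,
) -> List[bytes]:
    """Return the ids for which id_exists() is False, in input order.

    id_exists is a hypothetical stand-in for a query that looks up
    exactly one id, so that each lookup can be routed on its own.
    """
    id_list = list(ids)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map runs the lookups concurrently and yields the
        # results in the same order as the inputs.
        found = list(pool.map(id_exists, id_list))
    return [id_ for id_, present in zip(id_list, found) if not present]


# Demo: an in-memory set stands in for the table being queried.
stored = {b"\x01", b"\x03"}
result = ids_missing([b"\x01", b"\x02", b"\x03", b"\x04"], stored.__contains__)
```

In this demo, `result` holds the two ids that are absent from `stored`. The point of the design is that no query mixes ids owned by different nodes, so no single node has to gather rows on behalf of a whole batch.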

Details

Provenance

vlorentz	Authored on Jan 6 2022, 12:41 PM
vlorentz	Pushed on Jan 6 2022, 5:29 PM

Differential Revision

D6885: cassandra: Use concurrent queries in *_missing() instead of naive grouping

Parents

rDSTO259bf6fe1e3b: Improve documentation of the replay command

Build Status

Buildable 25854
Build 40404: test-and-build	Jenkins console · Jenkins

Event Timeline

vlorentz committed rDSTO4a24505049d5: cassandra: Use concurrent queries in *_missing() instead of naive grouping (authored by vlorentz). (Jan 6 2022, 12:43 PM)

vlorentz added an edge: D6885: cassandra: Use concurrent queries in *_missing() instead of naive grouping. (Jan 6 2022, 5:29 PM)

Harbormaster failed to build B25854: rDSTO4a24505049d5: cassandra: Use concurrent queries in *_missing() instead of naive grouping! (Jan 6 2022, 5:37 PM)

swh-public-ci mentioned this in D6888: cassandra: Rewrite content_missing to run queries concurrently. (Jan 6 2022, 5:39 PM)

swh-public-ci mentioned this in D6889: cassandra: Make content_missing run in linear time instead of quadratic. (Jan 7 2022, 1:12 PM)

swh-public-ci mentioned this in D6922: cassandra: Clarify use of '<=' on set-like dict views. (Jan 12 2022, 11:38 AM)

Changes (1)

swh/storage/cassandra/cql.py