Diffusion Staging repository 4a24505049d5

cassandra: Use concurrent queries in *_missing() instead of naive grouping
4a24505049d5 (Unpublished)

Unpublished Commit

Not On Permanent Ref: This commit is not an ancestor of any permanent ref.

Description


Instead of grouping ids into arbitrary batches, with one multi-id query
per batch (which forces the node receiving the query to coordinate with
other nodes to complete it), this sends one query per id, directly to
the node that holds that id.
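A minimal sketch contrasting the two strategies follows. It is not the code from cql.py. It assumes a hypothetical `fetch_many(ids)` callable that runs one multi-id query and returns the ids it found, and a hypothetical `fetch_one(id)` callable that runs a single-id query and returns whether the id exists. A plain thread pool stands in for whatever concurrency mechanism the real backend uses, and an in-memory set stands in for the table.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def missing_batched(
    ids: Iterable[bytes],
    fetch_many: Callable[[List[bytes]], Iterable[bytes]],  # hypothetical
    batch_size: int = 100,
) -> List[bytes]:
    """Naive grouping: one multi-id query per arbitrary batch of ids.

    When the ids in a batch live in different partitions, the node that
    receives the query has to coordinate with other nodes to answer it.
    """
    ids = list(ids)
    missing: List[bytes] = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start : start + batch_size]
        found = set(fetch_many(batch))  # one round trip per batch
        missing.extend(id_ for id_ in batch if id_ not in found)
    return missing


def missing_concurrent(
    ids: Iterable[bytes],
    fetch_one: Callable[[bytes], bool],  # hypothetical single-id lookup
    concurrency: int = 100,
) -> List[bytes]:
    """Concurrent approach: one single-id query per id, run in parallel.

    Each query touches a single partition, so it can go straight to a node
    that holds that id. Input order is preserved in the result.
    """
    ids = list(ids)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map returns results in the same order as `ids`
        present = list(pool.map(fetch_one, ids))
    return [id_ for id_, ok in zip(ids, present) if not ok]


# Usage with an in-memory stand-in for the table (hypothetical data).
stored = {b"a", b"c"}
print(missing_batched([b"a", b"b", b"c", b"d"],
                      lambda batch: [i for i in batch if i in stored],
                      batch_size=2))  # [b'b', b'd']
print(missing_concurrent([b"a", b"b", b"c", b"d"],
                         lambda i: i in stored))  # [b'b', b'd']
```

Both functions return the same answer; the difference is in how much work each query asks a single node to coordinate, which is what the speed-up below comes from.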

This is the 'concurrent' algorithm from https://forge.softwareheritage.org/T3577#72791,
which gives a >=2x speed-up on directories and a >=8x speed-up on revisions.

Details

Provenance

vlorentz	Authored on Jan 6 2022, 12:41 PM
vlorentz	Pushed on Jan 6 2022, 12:44 PM

Parents

R65:259bf6fe1e3b: Improve documentation of the replay command

Branches

Unknown

Tags

Unknown

References

tag: phabricator/diff/24967, tag: phabricator/base/25080, tag: phabricator/base/24991, tag: phabricator/base/24984


Changes (1)

swh/storage/cassandra/cql.py
