Differential D6885

cassandra: Use concurrent queries in *_missing() instead of naive grouping
Closed · Public

Authored by vlorentz on Jan 6 2022, 12:44 PM.

Details

Reviewers

ardumont

Group Reviewers

Reviewers

Maniphest Tasks

T3577: Parallel loaders performances

Commits

rDSTO4a24505049d5: cassandra: Use concurrent queries in *_missing() instead of naive grouping

Summary

Instead of grouping ids in queries in arbitrary batches (which forces
the server node to coordinate with other nodes to complete the query),
this sends queries with one id each, directly to the right node.

This is the 'concurrent' algorithm described in https://forge.softwareheritage.org/T3577#72791,
which gives a >=2x speed-up on directories and a >=8x speed-up on revisions.

This is essentially D6423, without the option to select other algorithms.
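As a rough illustration of the idea in the summary (one lookup per id, issued concurrently, instead of one query per arbitrary batch), here is a minimal sketch using only the Python standard library. It is not the code from this diff: `missing_ids_concurrent`, `id_exists`, `fake_id_exists`, and `max_workers` are hypothetical names, and the in-memory set stands in for a single-partition Cassandra query. The Python Cassandra driver ships a helper for this pattern (`cassandra.concurrent.execute_concurrent_with_args`); whether the commit uses it is not stated on this page.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator


def missing_ids_concurrent(
    ids: Iterable[bytes],
    id_exists: Callable[[bytes], bool],
    max_workers: int = 16,
) -> Iterator[bytes]:
    """Yield the ids for which id_exists() returns False.

    One lookup is issued per id, and lookups run concurrently in a
    thread pool. With a token-aware driver, each single-id query could be
    routed straight to a node owning that id, so no coordinator has to
    fan a batched query out to other nodes.
    """
    ids = list(ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with ids.
        for id_, found in zip(ids, pool.map(id_exists, ids)):
            if not found:
                yield id_


# Hypothetical stand-in for a one-id query against the database;
# the real table and column names are not shown here.
_known = {b"a", b"c"}


def fake_id_exists(id_: bytes) -> bool:
    return id_ in _known


result = list(missing_ids_concurrent([b"a", b"b", b"c", b"d"], fake_id_exists))
```

The trade-off is more round trips in exchange for less server-side coordination; the speed-ups quoted above are the author's measurements from T3577, not something this sketch reproduces.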

Diff Detail

Repository

rDSTO Storage manager

Branch

concurrent-missing

Lint

No Linters Available

Unit

No Unit Test Coverage

Build Status

Buildable 25838
Build 40384: Phabricator diff pipeline on jenkins
Build 40383: arc lint + arc unit

Event Timeline

vlorentz created this revision. · Jan 6 2022, 12:44 PM

Herald added a reviewer: Reviewers. · Jan 6 2022, 12:44 PM

Build is green

Patch application report for D6885 (id=24967)

Rebasing onto 259bf6fe1e...

Current branch diff-target is up to date.

Changes applied before test

commit 4a24505049d5c34c264d2b27e5feb24719b9e674
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jan 6 12:41:45 2022 +0100

    cassandra: Use concurrent queries in *_missing() instead of naive grouping
    
    Instead of grouping ids in queries in arbitrary batches (which forces
    the server node to coordinate with other nodes to complete the query),
    this sends queries with one id each, directly to the right node.
    
    This is the 'concurrent' algorithm from https://forge.softwareheritage.org/T3577#72791
    which gives a >=2x speed-up on directories, and a >=8x speed-up on revisions.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1517/ for more details.

Harbormaster completed remote builds in B25838: Diff 24967. · Jan 6 2022, 12:51 PM

vlorentz requested review of this revision. · Jan 6 2022, 12:51 PM

ardumont accepted this revision. · Jan 6 2022, 4:01 PM

This revision is now accepted and ready to land. · Jan 6 2022, 4:01 PM

Closed by commit rDSTO4a24505049d5: cassandra: Use concurrent queries in *_missing() instead of naive grouping (authored by vlorentz). · Jan 6 2022, 5:29 PM

This revision was automatically updated to reflect the committed changes.

vlorentz added a commit: rDSTO4a24505049d5: cassandra: Use concurrent queries in *_missing() instead of naive grouping.

vlorentz mentioned this in D6888: cassandra: Rewrite content_missing to run queries concurrently. · Jan 6 2022, 5:31 PM

vlorentz added a task: T3577: Parallel loaders performances. · Jan 6 2022, 5:32 PM

vlorentz mentioned this in D6423: cassandra: Add alternative algorithms to list missing objects. · Feb 4 2022, 3:11 PM