
cassandra: Make directory_ls fetch contents in batch instead of one-by-one
Closed, Public

Authored by vlorentz on Sep 9 2021, 3:45 PM.

Details

Summary

This should make `directory_ls` run up to 100 times faster, even on average-sized directories.

Depends on D6228.
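
To illustrate the idea in the summary, here is a minimal sketch of the difference between looking up each file entry's content one-by-one and fetching them all in a single batch. It is not the code from this diff: `FakeContentStore`, `ls_one_by_one` and `ls_batched` are hypothetical names, the store is an in-memory dict that only counts round trips, and treating a batch as exactly one round trip is a simplifying assumption.

```python
from typing import Dict, Iterable, List, Optional


class FakeContentStore:
    """Hypothetical stand-in for a remote content table; counts round trips."""

    def __init__(self, rows: Dict[bytes, dict]) -> None:
        self._rows = rows
        self.round_trips = 0

    def get_one(self, target: bytes) -> Optional[dict]:
        # One round trip per key.
        self.round_trips += 1
        return self._rows.get(target)

    def get_many(self, targets: Iterable[bytes]) -> Dict[bytes, dict]:
        # One round trip for the whole batch (simplifying assumption).
        self.round_trips += 1
        return {t: self._rows[t] for t in targets if t in self._rows}


def ls_one_by_one(store: FakeContentStore, entries: List[dict]) -> List[dict]:
    """List entries, querying the store once per file entry."""
    result = []
    for entry in entries:
        content = store.get_one(entry["target"]) if entry["type"] == "file" else None
        result.append({**entry, "length": content["length"] if content else None})
    return result


def ls_batched(store: FakeContentStore, entries: List[dict]) -> List[dict]:
    """List entries, querying the store once for all file entries."""
    targets = [e["target"] for e in entries if e["type"] == "file"]
    contents = store.get_many(targets)
    result = []
    for entry in entries:
        content = contents.get(entry["target"]) if entry["type"] == "file" else None
        result.append({**entry, "length": content["length"] if content else None})
    return result


if __name__ == "__main__":
    rows = {bytes([i]): {"length": i * 10} for i in range(100)}
    entries = [
        {"name": b"file%d" % i, "type": "file", "target": bytes([i])}
        for i in range(100)
    ]
    entries.append({"name": b"subdir", "type": "dir", "target": b"\xff"})

    slow, fast = FakeContentStore(rows), FakeContentStore(rows)
    assert ls_one_by_one(slow, entries) == ls_batched(fast, entries)
    print(slow.round_trips, fast.round_trips)
```

With 100 file entries, the one-by-one variant makes 100 round trips in this toy model and the batched variant makes one. The real speedup depends on network latency and on how the backend executes the batch.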

Diff Detail

Repository
rDSTO Storage manager
Branch
cassandra-perfs
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 23485
Build 36640: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 36639: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D6229 (id=22540)

Could not rebase; Attempt merge onto 50fb54f212...

Updating 50fb54f2..5facf661
Fast-forward
 swh/storage/cassandra/cql.py        | 20 +++++----
 swh/storage/cassandra/storage.py    | 81 ++++++++++++++++++++++++-------------
 swh/storage/in_memory.py            |  7 ++--
 swh/storage/tests/test_cassandra.py | 12 +++---
 4 files changed, 75 insertions(+), 45 deletions(-)
Changes applied before test
commit 5facf661bf70973bde09367706141990de77638d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 9 15:45:30 2021 +0200

    cassandra: Make directory_ls fetch contents in batch instead of one-by-one
    
    This should make it run up to 100 times faster, even on average directories.

commit 0570a426efa5c89f85b407b5c3c53d6e9615cd06
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 9 15:30:09 2021 +0200

    content_get: Fetch rows concurrently
    
    Instead of fetching them one-by-one, with the very high latency this entails.
    
    This is preliminary work to make `directory_ls` less painfully slow.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1401/ for more details.
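
The `content_get` commit message above describes fetching rows concurrently rather than one-by-one. A rough sketch of that pattern, assuming a generic blocking `fetch_row` callable and a thread pool as a stand-in for whatever concurrency mechanism the actual change uses (`fetch_concurrently` and `fake_fetch_row` are hypothetical names, not part of swh-storage):

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def fetch_concurrently(
    fetch_row: Callable[[bytes], Optional[dict]],
    keys: Sequence[bytes],
    max_workers: int = 16,
) -> List[Optional[dict]]:
    """Call fetch_row for every key in parallel; results keep the order of keys."""
    if not keys:
        return []
    workers = min(max_workers, len(keys))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, whatever the completion order.
        return list(pool.map(fetch_row, keys))


def fake_fetch_row(key: bytes) -> Optional[dict]:
    """Hypothetical high-latency lookup: sleeps to simulate a network round trip."""
    time.sleep(0.01)
    return {"sha1": key, "length": len(key)}


if __name__ == "__main__":
    keys = [bytes([i]) * (i + 1) for i in range(16)]

    start = time.monotonic()
    sequential = [fake_fetch_row(k) for k in keys]
    sequential_time = time.monotonic() - start

    start = time.monotonic()
    concurrent = fetch_concurrently(fake_fetch_row, keys)
    concurrent_time = time.monotonic() - start

    assert sequential == concurrent
    print(f"sequential: {sequential_time:.3f}s, concurrent: {concurrent_time:.3f}s")
```

When per-row latency dominates, the total time approaches that of a single round trip instead of the sum of all of them, which is the effect the commit message is after.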

olasd added a subscriber: olasd.

Good one, thanks.

This revision is now accepted and ready to land. Sep 10 2021, 4:51 PM