cassandra: Make directory_ls fetch contents in batch instead of one-by-one
Closed, Public

Authored by vlorentz on Sep 9 2021, 3:45 PM.

Details

Summary

This should make `directory_ls` run up to 100 times faster, even on average-sized directories, because contents are fetched in batch rather than with one query per entry.

Depends on D6228.
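
To illustrate the idea in the summary, here is a minimal sketch of one-by-one versus batched lookups. It is not the code from this diff: `FakeContentStore`, `get_one`, `get_many`, `ls_one_by_one`, `ls_batched`, and the entry fields are all hypothetical names chosen for the example, and the store is an in-memory stand-in that only counts round trips.

```python
from typing import Dict, Iterable, List, Optional


class FakeContentStore:
    """Hypothetical in-memory backend that counts round trips (not swh code)."""

    def __init__(self, rows: Dict[bytes, dict]) -> None:
        self._rows = rows
        self.round_trips = 0

    def get_one(self, target: bytes) -> Optional[dict]:
        # One simulated network round trip per content.
        self.round_trips += 1
        return self._rows.get(target)

    def get_many(self, targets: Iterable[bytes]) -> Dict[bytes, dict]:
        # One simulated round trip for the whole batch.
        self.round_trips += 1
        return {t: self._rows[t] for t in targets if t in self._rows}


def ls_one_by_one(store: FakeContentStore, entries: List[dict]) -> List[dict]:
    """Look up each file entry's content with its own query."""
    out = []
    for entry in entries:
        content = store.get_one(entry["target"]) if entry["type"] == "file" else None
        out.append({**entry, "length": content["length"] if content else None})
    return out


def ls_batched(store: FakeContentStore, entries: List[dict]) -> List[dict]:
    """Collect all file targets first, then look them up in a single call."""
    targets = [e["target"] for e in entries if e["type"] == "file"]
    contents = store.get_many(targets)
    out = []
    for entry in entries:
        content = contents.get(entry["target"]) if entry["type"] == "file" else None
        out.append({**entry, "length": content["length"] if content else None})
    return out


if __name__ == "__main__":
    rows = {b"h1": {"length": 10}, b"h2": {"length": 20}}
    entries = [
        {"name": b"a.txt", "type": "file", "target": b"h1"},
        {"name": b"b.txt", "type": "file", "target": b"h2"},
        {"name": b"sub", "type": "dir", "target": b"d1"},
    ]
    slow, fast = FakeContentStore(rows), FakeContentStore(rows)
    assert ls_one_by_one(slow, entries) == ls_batched(fast, entries)
    print(slow.round_trips, fast.round_trips)
```

Both functions return the same listing; only the number of round trips differs. How much that helps in practice depends on the per-query latency of the real backend, which this sketch does not model.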

Diff Detail

Repository: rDSTO Storage manager
Lint: Automatic diff as part of commit; lint not applicable.
Unit: Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6229 (id=22540)

Could not rebase; Attempt merge onto 50fb54f212...

Updating 50fb54f2..5facf661
Fast-forward
 swh/storage/cassandra/cql.py        | 20 +++++----
 swh/storage/cassandra/storage.py    | 81 ++++++++++++++++++++++++-------------
 swh/storage/in_memory.py            |  7 ++--
 swh/storage/tests/test_cassandra.py | 12 +++---
 4 files changed, 75 insertions(+), 45 deletions(-)
Changes applied before test
commit 5facf661bf70973bde09367706141990de77638d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 9 15:45:30 2021 +0200

    cassandra: Make directory_ls fetch contents in batch instead of one-by-one
    
    This should make it run up to 100 times faster, even on average directories.

commit 0570a426efa5c89f85b407b5c3c53d6e9615cd06
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 9 15:30:09 2021 +0200

    content_get: Fetch rows concurrently
    
    Instead of fetching them one-by-one, with the very high latency this entails.
    
    This is preliminary work to make `directory_ls` less painfully slow.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1401/ for more details.
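
The `content_get` commit message above mentions fetching rows concurrently to avoid paying the query latency once per row. A minimal sketch of that general technique follows, using only the standard library. It is an assumption-laden illustration, not the approach taken in the patch: `fetch_row`, `ROWS`, `LATENCY`, and the thread pool are all hypothetical stand-ins for a real Cassandra query path.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional

LATENCY = 0.01  # assumed per-query latency in seconds (illustrative only)
ROWS = {i: f"row-{i}" for i in range(20)}  # hypothetical table contents


def fetch_row(key: int) -> Optional[str]:
    """Simulate one high-latency query for a single row."""
    time.sleep(LATENCY)
    return ROWS.get(key)


def fetch_sequential(keys: List[int]) -> List[Optional[str]]:
    """Total time grows as len(keys) * LATENCY."""
    return [fetch_row(k) for k in keys]


def fetch_concurrent(keys: List[int], max_workers: int = 10) -> List[Optional[str]]:
    """Issue queries in parallel; Executor.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_row, keys))


if __name__ == "__main__":
    keys = list(range(20))
    start = time.perf_counter()
    sequential = fetch_sequential(keys)
    mid = time.perf_counter()
    concurrent = fetch_concurrent(keys)
    end = time.perf_counter()
    assert sequential == concurrent
    print(f"sequential: {mid - start:.3f}s, concurrent: {end - mid:.3f}s")
```

With latency-bound queries like these, the concurrent version waits on many requests at once instead of one at a time, which is the effect the commit message describes.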

olasd added a subscriber: olasd.

Good one, thanks.

This revision is now accepted and ready to land. Sep 10 2021, 4:51 PM