
content_get: Fetch rows concurrently
ClosedPublic

Authored by vlorentz on Sep 9 2021, 3:31 PM.

Details

Summary

Fetch rows concurrently instead of one by one, which incurs very high latency.

This is preliminary work to make directory_ls less painfully slow.
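
To illustrate the idea in the summary, here is a minimal sketch of sequential versus concurrent fetching. It is not the code from this diff, which is not shown on this page. `fetch_row`, `fetch_sequential`, and `fetch_concurrent` are hypothetical names, and the `time.sleep` call is an assumed stand-in for one database round trip.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_row(key, latency=0.05):
    """Hypothetical stand-in for a single high-latency database query."""
    time.sleep(latency)  # simulated network round trip (assumption)
    return {"key": key, "value": key * 2}


def fetch_sequential(keys):
    """Issue one query at a time; total time grows as len(keys) * latency."""
    return [fetch_row(key) for key in keys]


def fetch_concurrent(keys, max_workers=8):
    """Issue queries from a thread pool so their latencies overlap.

    Executor.map() yields results in the same order as the input keys,
    so callers see the same list as in the sequential version.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_row, keys))
```

With eight keys and eight workers, the concurrent version waits for roughly one round trip instead of eight, while returning the rows in the same order.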

Diff Detail

Repository: rDSTO Storage manager
Branch: cassandra-perfs
Lint: No Linters Available
Unit: No Unit Test Coverage
Build Status: Buildable 23484
Build 36638: Phabricator diff pipeline on jenkins
Build 36637: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D6228 (id=22538)

Rebasing onto 50fb54f212...

Current branch diff-target is up to date.
Changes applied before test
commit 680d12149bb4e6f2c6c824c4f2684f80cadba38f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 9 15:30:09 2021 +0200

    content_get: Fetch rows concurrently
    
    Instead of fetching them one-by-one, with the very high latency this entails.
    
    This is preliminary work to make `directory_ls` less painfully slow.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1399/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1399/console

Harbormaster returned this revision to the author for changes because remote builds failed. Sep 9 2021, 3:38 PM
Harbormaster failed remote builds in B23483: Diff 22538!

Build is green

Patch application report for D6228 (id=22539)

Rebasing onto 50fb54f212...

Current branch diff-target is up to date.
Changes applied before test
commit 0570a426efa5c89f85b407b5c3c53d6e9615cd06
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 9 15:30:09 2021 +0200

    content_get: Fetch rows concurrently
    
    Instead of fetching them one-by-one, with the very high latency this entails.
    
    This is preliminary work to make `directory_ls` less painfully slow.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1400/ for more details.

olasd added a subscriber: olasd.

Conditionally accepted, depending on whether the inconsistency between _execute_with_retries and _execute_many_with_retries is worth it.

swh/storage/cassandra/cql.py
292–293

Shouldn't this be turned into a generator for consistency? Or the other method kept eager? Is there a point where we do _execute_many_with_retries but don't want to keep all results?

312

Use a list comprehension?

314–315

*ugh* generators.

This revision is now accepted and ready to land. Sep 10 2021, 4:48 PM
swh/storage/cassandra/cql.py
292–293

> Shouldn't this be turned into a generator for consistency? Or the other method kept eager?

ResultSet is lazy. https://docs.datastax.com/en/developer/python-driver/3.24/api/cassandra/cluster/#cassandra.cluster.ResultSet

> Is there a point where we do _execute_many_with_retries but don't want to keep all results?

In _add_many.

314–315

ikr... as usual it took me some time to figure it out
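
For readers following the eager-versus-lazy discussion above, here is a small generic Python sketch of the distinction. It is not taken from cql.py. `run_query`, `execute_many_eager`, and `execute_many_lazy` are hypothetical names, and the `calls` list only records which queries actually ran.

```python
calls = []  # records every argument that was actually "executed"


def run_query(arg):
    """Hypothetical stand-in for executing one statement."""
    calls.append(arg)
    return arg * 10


def execute_many_eager(args):
    """Eager: every query runs before the function returns."""
    return [run_query(arg) for arg in args]


def execute_many_lazy(args):
    """Lazy: the body does not start until the caller iterates.

    Because this function contains `yield`, calling it only creates a
    generator object; no query runs until a result is requested.
    """
    for arg in args:
        yield run_query(arg)


if __name__ == "__main__":
    gen = execute_many_lazy([1, 2, 3])
    print(calls)      # nothing has run yet
    print(list(gen))  # consuming the generator runs the queries
    print(calls)
```

One consequence of the lazy form is that a caller who discards the return value never triggers the work, which is one reason to weigh consistency against eagerness case by case.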

This revision was automatically updated to reflect the committed changes.