
content_get: Fetch rows concurrently
Closed · Public

Authored by vlorentz on Sep 9 2021, 3:31 PM.

Details

Summary

Fetch rows concurrently instead of one-by-one, which entails very high latency.

This is preliminary work to make directory_ls less painfully slow.
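
To illustrate the idea in the summary, here is a minimal sketch of fetching rows concurrently rather than one at a time. It is not the code from this diff: `FAKE_TABLE`, `fetch_row`, `fetch_rows_sequentially` and `fetch_rows_concurrently` are hypothetical names, and the Cassandra round trip is simulated with `time.sleep` so the example stays self-contained.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Optional

# Hypothetical stand-in for a table; this is NOT the swh.storage schema.
FAKE_TABLE: Dict[str, Dict[str, object]] = {
    "sha1-a": {"sha1": "sha1-a", "length": 3},
    "sha1-b": {"sha1": "sha1-b", "length": 5},
}


def fetch_row(key: str, latency: float = 0.05) -> Optional[Dict[str, object]]:
    """Simulate a single-row query that pays one network round trip."""
    time.sleep(latency)
    return FAKE_TABLE.get(key)


def fetch_rows_sequentially(keys: Iterable[str]) -> List[Optional[Dict[str, object]]]:
    """One query after another: total time grows with len(keys) * latency."""
    return [fetch_row(key) for key in keys]


def fetch_rows_concurrently(
    keys: Iterable[str], max_workers: int = 8
) -> List[Optional[Dict[str, object]]]:
    """Issue the queries in parallel; results come back in input order."""
    keys = list(keys)
    if not keys:
        return []
    with ThreadPoolExecutor(max_workers=min(max_workers, len(keys))) as pool:
        # Executor.map preserves the order of its input iterable.
        return list(pool.map(fetch_row, keys))


if __name__ == "__main__":
    wanted = ["sha1-b", "missing", "sha1-a"]
    print(fetch_rows_concurrently(wanted))
```

Both helpers return the same rows in the same order; the concurrent one overlaps the waits instead of adding them up. A real Cassandra client would more likely rely on the driver's own asynchronous or concurrent execution helpers than on a thread pool.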

Diff Detail

Repository
rDSTO Storage manager
Branch
cassandra-perfs
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 23483
Build 36636: Phabricator diff pipeline on jenkins
Build 36635: arc lint + arc unit

Unit Tests: Failed

Time   Test
52 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_api_client.TestStorageApi::test_content_add
       self = <swh.storage.tests.test_api_client.TestStorageApi object at 0x7f6df0389748> swh_storage = <RemoteStorage url=mock://example.com/> sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7f6df090ad68>
57 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_api_client.TestStorageApi::test_content_add_collision
       self = <swh.storage.tests.test_api_client.TestStorageApi object at 0x7f6def836a90> swh_storage = <RemoteStorage url=mock://example.com/> sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7f6df08df550>
39 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_api_client.TestStorageApi::test_content_add_different_input
       self = <swh.storage.tests.test_api_client.TestStorageApi object at 0x7f6df086c9b0> swh_storage = <RemoteStorage url=mock://example.com/> sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7f6df0389198>
41 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_api_client.TestStorageApi::test_content_add_duplicate
       self = <swh.storage.tests.test_api_client.TestStorageApi object at 0x7f6def836ac8> swh_storage = <RemoteStorage url=mock://example.com/> sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7f6def8362b0>
52 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_api_client.TestStorageApi::test_content_add_from_lazy_content
       self = <swh.storage.tests.test_api_client.TestStorageApi object at 0x7f6defe13e10> swh_storage = <RemoteStorage url=mock://example.com/> sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7f6df01d4cc0>

Full test results: 149 Failed · 939 Passed · 40 Skipped

Event Timeline

Build has FAILED

Patch application report for D6228 (id=22538)

Rebasing onto 50fb54f212...

Current branch diff-target is up to date.
Changes applied before test
commit 680d12149bb4e6f2c6c824c4f2684f80cadba38f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 9 15:30:09 2021 +0200

    content_get: Fetch rows concurrently
    
    Instead of fetching them one-by-one, with the very high latency this entails.
    
    This is preliminary work to make `directory_ls` less painfully slow.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1399/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1399/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Sep 9 2021, 3:38 PM)
Harbormaster failed remote builds in B23483: Diff 22538!

Build is green

Patch application report for D6228 (id=22539)

Rebasing onto 50fb54f212...

Current branch diff-target is up to date.
Changes applied before test
commit 0570a426efa5c89f85b407b5c3c53d6e9615cd06
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 9 15:30:09 2021 +0200

    content_get: Fetch rows concurrently
    
    Instead of fetching them one-by-one, with the very high latency this entails.
    
    This is preliminary work to make `directory_ls` less painfully slow.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1400/ for more details.

olasd added a subscriber: olasd.

Accepted, on the condition that the inconsistency between _execute_with_retries and _execute_many_with_retries is worth it.

swh/storage/cassandra/cql.py
Lines 292–293

Shouldn't this be turned into a generator for consistency? Or the other method kept eager? Is there a point where we do _execute_many_with_retries but don't want to keep all results?

Lines 312–315

Use a list comprehension?

Lines 313–314

*ugh* generators.

This revision is now accepted and ready to land. (Sep 10 2021, 4:48 PM)
swh/storage/cassandra/cql.py
Lines 292–293

> Shouldn't this be turned into a generator for consistency? Or the other method kept eager?

ResultSet is lazy. https://docs.datastax.com/en/developer/python-driver/3.24/api/cassandra/cluster/#cassandra.cluster.ResultSet

> Is there a point where we do _execute_many_with_retries but don't want to keep all results?

In _add_many.

Lines 313–314

ikr... as usual it took me some time to figure it out
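
For readers following the eager-versus-lazy exchange above, a small sketch of the difference may help. It does not reproduce the code in cql.py: `execute_lazily`, `execute_eagerly` and `make_query` are hypothetical names, and the "queries" are plain callables standing in for statements sent to the database.

```python
from typing import Callable, Iterator, List


def execute_lazily(queries: List[Callable[[], int]]) -> Iterator[int]:
    """Generator: a query only runs when the caller asks for its result."""
    for query in queries:
        yield query()


def execute_eagerly(queries: List[Callable[[], int]]) -> List[int]:
    """List comprehension: every query runs before the function returns."""
    return [query() for query in queries]


def make_query(value: int, log: List[int]) -> Callable[[], int]:
    """Build a fake query that records when it actually runs."""
    def run() -> int:
        log.append(value)
        return value
    return run


if __name__ == "__main__":
    log: List[int] = []
    lazy = execute_lazily([make_query(1, log), make_query(2, log)])
    print(log)         # nothing has run yet
    print(next(lazy))  # runs only the first query
    print(execute_eagerly([make_query(3, log), make_query(4, log)]))
    print(log)
```

The lazy form lets a caller skip or discard results it does not need, while the eager form guarantees that all work has happened by the time the call returns. A generator body that never gets iterated never runs at all, which is the usual source of surprises with this pattern.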

This revision was automatically updated to reflect the committed changes.