
cassandra: Rewrite content_missing to run queries concurrently.
Closed · Public

Authored by vlorentz on Jan 6 2022, 5:31 PM.

Details

Summary

This is twice as fast, according to
https://forge.softwareheritage.org/T3577#72791

This is the same commit as D6495, rebased on D6885 instead of D6423.
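
The general idea of issuing the lookups concurrently instead of one after another can be sketched as follows. This is only an illustration, not the code in this diff: `lookup`, `max_workers`, and both function names are hypothetical, and a standard-library thread pool stands in for whatever concurrency mechanism the Cassandra backend actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Optional


def content_missing_sequential(
    hashes: Iterable[bytes], lookup: Callable[[bytes], Optional[dict]]
) -> Iterator[bytes]:
    """Baseline: one blocking lookup per hash, in order."""
    for h in hashes:
        if lookup(h) is None:
            yield h


def content_missing_concurrent(
    hashes: Iterable[bytes],
    lookup: Callable[[bytes], Optional[dict]],
    max_workers: int = 8,  # hypothetical default, tune for the real backend
) -> List[bytes]:
    """Issue all lookups through a thread pool, then keep the misses."""
    hashes = list(hashes)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so zip() pairs
        # each hash with its own lookup result.
        results = list(pool.map(lookup, hashes))
    return [h for h, row in zip(hashes, results) if row is None]
```

With a lookup that blocks on network I/O, the concurrent variant overlaps the round trips; the speedup actually measured for this change is the one reported in T3577.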

Diff Detail

Repository
rDSTO Storage manager
Branch
concurrent-missing
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 25861
Build 40413: Phabricator diff pipeline on jenkins
Build 40412: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D6888 (id=24984)

Rebasing onto 4a24505049...

Current branch diff-target is up to date.
Changes applied before test
commit 2a38ccc4d43fefcc0652da780dd76ad403468323
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Oct 18 13:25:20 2021 +0200

    cassandra: Rewrite content_missing to run queries concurrently.
    
    This is twice as fast, according to
    https://forge.softwareheritage.org/T3577#72791

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1518/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1518/console

Harbormaster returned this revision to the author for changes because remote builds failed. Jan 6 2022, 5:39 PM
Harbormaster failed remote builds in B25855: Diff 24984!

Build is green

Patch application report for D6888 (id=24991)

Rebasing onto 4a24505049...

Current branch diff-target is up to date.
Changes applied before test
commit 55141ff2d57ca147efc2235eba2b006814c03817
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Oct 18 13:25:20 2021 +0200

    cassandra: Rewrite content_missing to run queries concurrently.
    
    This is twice as fast, according to
    https://forge.softwareheritage.org/T3577#72791

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1519/ for more details.

anlambert added a subscriber: anlambert.

Looks good to me.

This revision is now accepted and ready to land. Jan 12 2022, 11:27 AM
douardda added a subscriber: douardda.

Fine with me (but please give a bit more insight).

swh/storage/cassandra/storage.py
415

It would be nice to have a comment explaining why this more convoluted code is better (i.e. remind the reader of the concurrency gained by using content_find_many).

419

Not a big fan of the double for loop, but meh (an alternative implementation would probably be much worse).

swh/storage/cassandra/storage.py
419

The alternative implementation is in D6889.

Build is green

Patch application report for D6888 (id=25080)

Rebasing onto 4a24505049...

Current branch diff-target is up to date.
Changes applied before test
commit d5f1f0ec055477461a000f7eeaece974fa1265b1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Oct 18 13:25:20 2021 +0200

    cassandra: Rewrite content_missing to run queries concurrently.
    
    This is twice as fast, according to
    https://forge.softwareheritage.org/T3577#72791

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1522/ for more details.