Software Heritage

Add endpoint directory_get_entries, to quickly list a directory's entries
Closed, Public

Authored by vlorentz on May 10 2021, 4:12 PM.

Details

Summary

It avoids a join with the content table, which should make the vault
(and possibly other clients) faster when they do not need the data that
join provides.
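To illustrate the saving described in the summary, here is a minimal, hypothetical in-memory sketch. It contrasts listing a directory's entries joined with content metadata against listing the raw entries only. The `Entry` class, the `directories`/`contents` dictionaries and both function names are illustrative assumptions, not the swh-storage API. The real endpoint is implemented separately for each backend (PostgreSQL, Cassandra, in-memory).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical minimal directory entry (not the swh.model class)."""
    name: bytes
    type: str      # "file", "dir" or "rev"
    target: bytes  # identifier of the object the entry points to
    perms: int


# Hypothetical in-memory "tables" standing in for a storage backend.
directories: Dict[bytes, List[Entry]] = {}
contents: Dict[bytes, dict] = {}  # target -> content metadata (length, status, ...)


def list_with_content_join(directory_id: bytes) -> Optional[List[dict]]:
    """Entries enriched with content metadata: one extra lookup per file entry."""
    if directory_id not in directories:
        return None
    rows = []
    for entry in directories[directory_id]:
        row = {"name": entry.name, "type": entry.type,
               "target": entry.target, "perms": entry.perms}
        if entry.type == "file":
            # Stands in for the join with the content table.
            row.update(contents.get(entry.target, {}))
        rows.append(row)
    return rows


def list_entries_only(directory_id: bytes) -> Optional[List[Entry]]:
    """Raw entries only: no content lookups at all."""
    if directory_id not in directories:
        return None
    return list(directories[directory_id])
```

Only `list_with_content_join` reads from `contents`. A caller such as the vault, which only needs names, types and targets, can use the cheaper variant and skip that lookup entirely.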

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5723 (id=20446)

Could not rebase; Attempt merge onto 761709957a...

Updating 76170995..474a94c4
Fast-forward
 swh/storage/cassandra/cql.py         | 11 +++++
 swh/storage/cassandra/storage.py     | 24 +++++++++++
 swh/storage/in_memory.py             | 11 +++++
 swh/storage/interface.py             | 28 +++++++++++++
 swh/storage/postgresql/db.py         | 11 ++++-
 swh/storage/postgresql/storage.py    | 27 ++++++++++++
 swh/storage/retry.py                 |  3 +-
 swh/storage/sql/40-funcs.sql         | 28 +++++++++++++
 swh/storage/tests/storage_tests.py   | 38 +++++++++++++++++
 swh/storage/tests/test_cassandra.py  | 80 +++++++++++++++++++++++++++++++++++-
 swh/storage/tests/test_postgresql.py | 10 ++++-
 11 files changed, 267 insertions(+), 4 deletions(-)
Changes applied before test
commit 474a94c40b3a944ce832e05c3a5444b6b8b987f4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 10 16:12:05 2021 +0200

    Add endpoint directory_get_entries, to quickly list a directory's entries
    
    It spares a join with the content table, which should hopefully make
    the vault (and possibly other users) faster when they don't need this
    join.

commit 4d3eeb2edd5b1413a968a30b1b0f585be4dcf4e0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 10 14:13:20 2021 +0200

    cassandra: Add tests checking directory_add and snapshot_add are atomic.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1320/ for more details.

Build is green

Patch application report for D5723 (id=20461)

Could not rebase; Attempt merge onto b487a21f27...

Merge made by the 'recursive' strategy.
 swh/storage/cassandra/cql.py         | 11 +++++
 swh/storage/cassandra/storage.py     | 24 +++++++++++
 swh/storage/in_memory.py             | 11 +++++
 swh/storage/interface.py             | 28 +++++++++++++
 swh/storage/postgresql/db.py         | 11 ++++-
 swh/storage/postgresql/storage.py    | 26 ++++++++++++
 swh/storage/proxies/retry.py         |  3 +-
 swh/storage/sql/40-funcs.sql         | 28 +++++++++++++
 swh/storage/tests/storage_tests.py   | 38 +++++++++++++++++
 swh/storage/tests/test_cassandra.py  | 80 +++++++++++++++++++++++++++++++++++-
 swh/storage/tests/test_postgresql.py | 10 ++++-
 11 files changed, 266 insertions(+), 4 deletions(-)
Changes applied before test
commit e8fb21269afdbe84db50236c940e37b15be9c47c
Merge: b487a21f a6a28319
Author: Jenkins user <jenkins@localhost>
Date:   Mon May 10 18:57:11 2021 +0000

    Merge branch 'diff-target' into HEAD

commit a6a283195782b7f8d2c33b24e8328b1cbbdd599b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 10 16:12:05 2021 +0200

    Add endpoint directory_get_entries, to quickly list a directory's entries
    
    It spares a join with the content table, which should hopefully make
    the vault (and possibly other users) faster when they don't need this
    join.

commit 4d3eeb2edd5b1413a968a30b1b0f585be4dcf4e0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 10 14:13:20 2021 +0200

    cassandra: Add tests checking directory_add and snapshot_add are atomic.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1321/ for more details.

olasd added a subscriber: olasd.
olasd added inline comments.
swh/storage/interface.py
445–448

copy and waste of the wrong docs here :)

swh/storage/retry.py
43–44 (On Diff #20461)

Yes! But please put this in a separate commit (no need for review)

swh/storage/tests/storage_tests.py
912

Maybe deserves a _pagination suffix to make the intent clearer?

934

And this one would get the suffix dropped.

938–947

Don't we already have a helper function to do that? :)

This revision is now accepted and ready to land. May 11 2021, 11:46 AM
swh/storage/retry.py
43–44 (On Diff #20461)

oops

swh/storage/interface.py
445–448

nah, copied from the right docs, but I just forgot to run the regexp on it

445–448

hmm but I did change the interface to page_token after writing the doc tho
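The switch to a `page_token` parameter follows a common cursor-style pagination contract: each call returns one page of results plus a token to pass back for the next page, or no token when the listing is exhausted. The sketch below is hypothetical. `Page`, `get_page` and the offset-as-string token are illustrative assumptions; the review does not show how the real endpoint encodes its token.

```python
from dataclasses import dataclass
from typing import Generic, List, Optional, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class Page(Generic[T]):
    """Hypothetical paged result: one page plus a token for the next one."""
    results: List[T]
    next_page_token: Optional[str]


def get_page(items: Sequence[T], page_token: Optional[str] = None,
             limit: int = 1000) -> Page[T]:
    """Return up to `limit` items, starting at the position encoded in `page_token`.

    Here the token is simply the decimal offset of the next item; a real
    backend would more likely use something opaque (e.g. the last key seen).
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    start = int(page_token) if page_token is not None else 0
    end = start + limit
    page = list(items[start:end])
    # No token once the end of the listing has been reached.
    next_token = str(end) if end < len(items) else None
    return Page(results=page, next_page_token=next_token)
```

Keeping the token opaque to callers lets each backend choose its own cursor representation without changing the interface.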

apply comments:

  • update docstring
  • remove change to retry proxy
  • rename tests
  • dedup code using stream_results
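The last bullet replaces hand-written pagination loops with `stream_results`, which iterates over a paginated endpoint. The following is a self-contained re-implementation of that idea for illustration only. `Page`, `stream_all`, `LETTERS` and `fetch_letters` are hypothetical names, and the real helper's signature and behaviour may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator, List, Optional


@dataclass
class Page:
    """Hypothetical page type: one batch of results plus the next-page token."""
    results: List[Any]
    next_page_token: Optional[str]


def stream_all(fetch_page: Callable[..., Page], *args: Any, **kwargs: Any) -> Iterator[Any]:
    """Yield every result of a paginated endpoint, following next_page_token.

    The caller must not pass page_token: this helper manages it.
    """
    if "page_token" in kwargs:
        raise TypeError("stream_all manages page_token itself")
    page_token: Optional[str] = None
    while True:
        page = fetch_page(*args, page_token=page_token, **kwargs)
        yield from page.results
        page_token = page.next_page_token
        if page_token is None:
            break


# Hypothetical paginated endpoint, used only to exercise the helper.
LETTERS = ["a", "b", "c", "d", "e"]


def fetch_letters(page_token: Optional[str] = None, limit: int = 2) -> Page:
    start = int(page_token) if page_token is not None else 0
    end = start + limit
    next_token = str(end) if end < len(LETTERS) else None
    return Page(results=LETTERS[start:end], next_page_token=next_token)
```

With the loop centralized in one helper, a test only has to compare `list(stream_all(endpoint, ...))` against the expected entries, whatever page size it uses.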

Build is green

Patch application report for D5723 (id=20490)

Rebasing onto f140f634b6...

Current branch diff-target is up to date.
Changes applied before test
commit e3cbd5ee425cefa1e290a34cd889256036a06db0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 10 16:12:05 2021 +0200

    Add endpoint directory_get_entries, to quickly list a directory's entries
    
    It spares a join with the content table, which should hopefully make
    the vault (and possibly other users) faster when they don't need this
    join.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1324/ for more details.