
Add endpoint directory_get_entries, to quickly list a directory's entries
Closed, Public

Authored by vlorentz on May 10 2021, 4:12 PM.

Details

Summary

It spares a join with the content table, which should hopefully make
the vault (and possibly other users) faster when they don't need this
join.
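
To illustrate what "sparing a join" means here, the following is a minimal sketch using an in-memory SQLite database. The two-table schema, the column names, and the functions `list_with_content` and `list_entries_only` are assumptions made up for this example; the real swh-storage schema and SQL functions are different and more involved.

```python
import sqlite3

# Hypothetical, heavily simplified schema (NOT the real swh-storage tables).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE directory_entry (
        dir_id TEXT, name TEXT, type TEXT, target TEXT, perms INTEGER
    );
    CREATE TABLE content (
        sha1_git TEXT PRIMARY KEY, length INTEGER, status TEXT
    );
    INSERT INTO directory_entry VALUES
        ('d1', 'README', 'file', 'c1', 33188),
        ('d1', 'src', 'dir', 'd2', 16384);
    INSERT INTO content VALUES ('c1', 42, 'visible');
    """
)


def list_with_content(dir_id):
    """List entries enriched with content metadata; this needs a join."""
    query = """
        SELECT e.name, e.type, e.target, c.length, c.status
        FROM directory_entry AS e
        LEFT JOIN content AS c
            ON e.type = 'file' AND e.target = c.sha1_git
        WHERE e.dir_id = ?
        ORDER BY e.name
    """
    return conn.execute(query, (dir_id,)).fetchall()


def list_entries_only(dir_id):
    """List bare entries; the content table is never touched."""
    query = """
        SELECT name, type, target, perms
        FROM directory_entry
        WHERE dir_id = ?
        ORDER BY name
    """
    return conn.execute(query, (dir_id,)).fetchall()
```

A caller that only needs names, types, targets and permissions (as the vault may, per the summary) can use the second form and skip the lookup into the content table entirely.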

Diff Detail

Repository
rDSTO Storage manager
Branch
dir-get-entries
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 21423
Build 33279: Phabricator diff pipeline on jenkins
Build 33278: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D5723 (id=20446)

Could not rebase; Attempt merge onto 761709957a...

Updating 76170995..474a94c4
Fast-forward
 swh/storage/cassandra/cql.py         | 11 +++++
 swh/storage/cassandra/storage.py     | 24 +++++++++++
 swh/storage/in_memory.py             | 11 +++++
 swh/storage/interface.py             | 28 +++++++++++++
 swh/storage/postgresql/db.py         | 11 ++++-
 swh/storage/postgresql/storage.py    | 27 ++++++++++++
 swh/storage/retry.py                 |  3 +-
 swh/storage/sql/40-funcs.sql         | 28 +++++++++++++
 swh/storage/tests/storage_tests.py   | 38 +++++++++++++++++
 swh/storage/tests/test_cassandra.py  | 80 +++++++++++++++++++++++++++++++++++-
 swh/storage/tests/test_postgresql.py | 10 ++++-
 11 files changed, 267 insertions(+), 4 deletions(-)
Changes applied before test
commit 474a94c40b3a944ce832e05c3a5444b6b8b987f4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 10 16:12:05 2021 +0200

    Add endpoint directory_get_entries, to quickly list a directory's entries
    
    It spares a join with the content table, which should hopefully make
    the vault (and possibly other users) faster when they don't need this
    join.

commit 4d3eeb2edd5b1413a968a30b1b0f585be4dcf4e0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 10 14:13:20 2021 +0200

    cassandra: Add tests checking directory_add and snapshot_add are atomic.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1320/ for more details.

Build is green

Patch application report for D5723 (id=20461)

Could not rebase; Attempt merge onto b487a21f27...

Merge made by the 'recursive' strategy.
 swh/storage/cassandra/cql.py         | 11 +++++
 swh/storage/cassandra/storage.py     | 24 +++++++++++
 swh/storage/in_memory.py             | 11 +++++
 swh/storage/interface.py             | 28 +++++++++++++
 swh/storage/postgresql/db.py         | 11 ++++-
 swh/storage/postgresql/storage.py    | 26 ++++++++++++
 swh/storage/proxies/retry.py         |  3 +-
 swh/storage/sql/40-funcs.sql         | 28 +++++++++++++
 swh/storage/tests/storage_tests.py   | 38 +++++++++++++++++
 swh/storage/tests/test_cassandra.py  | 80 +++++++++++++++++++++++++++++++++++-
 swh/storage/tests/test_postgresql.py | 10 ++++-
 11 files changed, 266 insertions(+), 4 deletions(-)
Changes applied before test
commit e8fb21269afdbe84db50236c940e37b15be9c47c
Merge: b487a21f a6a28319
Author: Jenkins user <jenkins@localhost>
Date:   Mon May 10 18:57:11 2021 +0000

    Merge branch 'diff-target' into HEAD

commit a6a283195782b7f8d2c33b24e8328b1cbbdd599b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 10 16:12:05 2021 +0200

    Add endpoint directory_get_entries, to quickly list a directory's entries
    
    It spares a join with the content table, which should hopefully make
    the vault (and possibly other users) faster when they don't need this
    join.

commit 4d3eeb2edd5b1413a968a30b1b0f585be4dcf4e0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 10 14:13:20 2021 +0200

    cassandra: Add tests checking directory_add and snapshot_add are atomic.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1321/ for more details.

olasd added a subscriber: olasd.
olasd added inline comments.
swh/storage/interface.py
445–448

copy and waste of the wrong docs here :)

swh/storage/retry.py
43–45

Yes! But please put this in a separate commit (no need for review)

swh/storage/tests/storage_tests.py
911

Maybe deserves a _pagination suffix to make the intent clearer?

933

And this one would get the suffix dropped.

937–946

Don't we already have a helper function to do that? :)

This revision is now accepted and ready to land. May 11 2021, 11:46 AM
swh/storage/retry.py
43–45

oops

swh/storage/interface.py
445–448

nah, copied from the right docs, but I just forgot to run the regexp on it

445–448

hmm but I did change the interface to page_token after writing the doc tho

apply comments:

  • update docstring
  • remove change to retry proxy
  • rename tests
  • dedup code using stream_results
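
The last item refers to replacing a hand-written pagination loop in the tests with a generic helper. As a rough sketch of that pattern, and assuming a `page_token`-based interface as mentioned in the comments above, it could look like the following. `Page`, `get_entries` and `stream_pages` are hypothetical stand-ins written for this example; they are not the actual `stream_results` helper or the real storage endpoint.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class Page:
    """One page of results plus an opaque token for the next page."""

    results: List[str]
    next_page_token: Optional[str]


# Fake data standing in for the entries of one directory.
ENTRIES = [f"entry-{i}" for i in range(7)]


def get_entries(
    directory_id: str, page_token: Optional[str] = None, limit: int = 3
) -> Page:
    """Hypothetical paginated endpoint: the token encodes the start offset."""
    start = int(page_token) if page_token is not None else 0
    end = start + limit
    next_token = str(end) if end < len(ENTRIES) else None
    return Page(ENTRIES[start:end], next_token)


def stream_pages(fetch: Callable[..., Page], *args, **kwargs) -> Iterator[str]:
    """Call `fetch` repeatedly, following page tokens until none is left."""
    page_token = None
    while True:
        page = fetch(*args, page_token=page_token, **kwargs)
        yield from page.results
        page_token = page.next_page_token
        if page_token is None:
            return
```

With such a helper, a test can compare `list(stream_pages(get_entries, "d1", limit=3))` against the expected entries in one line instead of repeating the token-following loop.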

Build is green

Patch application report for D5723 (id=20490)

Rebasing onto f140f634b6...

Current branch diff-target is up to date.
Changes applied before test
commit e3cbd5ee425cefa1e290a34cd889256036a06db0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 10 16:12:05 2021 +0200

    Add endpoint directory_get_entries, to quickly list a directory's entries
    
    It spares a join with the content table, which should hopefully make
    the vault (and possibly other users) faster when they don't need this
    join.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1324/ for more details.