Page MenuHomeSoftware Heritage

storage*: Open order parameter to origin-visit-get endpoint
ClosedPublic

Authored by ardumont on Jun 25 2020, 6:59 PM.

Details

Summary

This allows clients to search from most recent to oldest visit when calling
the endpoint with order parameter "desc" (visit id desc). [1]

This keeps (and explicits) the existing sorting order as "asc" (visit id asc).

[1] See D3350 for such use case

Test Plan

tox

Diff Detail

Repository
rDSTO Storage manager
Branch
open-order-parameter
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 13141
Build 20054: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 20053: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D3359 (id=11910)

Rebasing onto b991e69707...

Current branch diff-target is up to date.
Changes applied before test
commit 82e6070c0712ba47bfbedcc2c3329f1828c14fd0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 24 18:04:29 2020 +0200

    storage: Open order parameter to origin-visit-get endpoint
    
    This allows clients to search from most recent to oldest calling the endpoint
    with order parameter to "desc" (visit id desc)
    
    This keeps and explicits the existing sorting order as "asc" (visit id asc)

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/364/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/364/console

Build is green

Patch application report for D3359 (id=11910)

Rebasing onto b991e69707...

Current branch diff-target is up to date.
Changes applied before test
commit 82e6070c0712ba47bfbedcc2c3329f1828c14fd0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 24 18:04:29 2020 +0200

    storage: Open order parameter to origin-visit-get endpoint
    
    This allows clients to search from most recent to oldest calling the endpoint
    with order parameter to "desc" (visit id desc)
    
    This keeps and explicits the existing sorting order as "asc" (visit id asc)

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/366/ for more details.

douardda added inline comments.
swh/storage/cassandra/cql.py
679

shouldn't this be a visit < ? for pagination to work?

swh/storage/db.py
596–605

same as for cassandra, how does this deal with pagination when order is desc?

swh/storage/cassandra/cql.py
679

Hmm it seems you are right!
So it's missing a check or something since nothing complained.

swh/storage/cassandra/cql.py
679

Yes, it's missing a check like:

# pagination support in desc order
all_visits5 = list(
    swh_storage.origin_visit_get(
        origin.url, last_visit=ov3["visit"], order="desc", limit=2)
)
assert all_visits5 == [ov2, ov1]

or something.

Add missing pagination corner cases (desc order) both in tests and
implementations. This is now fixed \o/

@douardda thanks for raising the question ;)

Build has FAILED

Patch application report for D3359 (id=11921)

Rebasing onto 8620519891...

Current branch diff-target is up to date.
Changes applied before test
commit 1d5717fb5e34831fc002f9b625156bfd8d7384af
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 24 18:04:29 2020 +0200

    storage*: Open order parameter to origin-visit-get endpoint
    
    This allows clients to search from most recent to oldest visit when calling the
    endpoint with the "order" parameter set to "desc" (visit id desc).
    
    This keeps and explicits the existing sorting order as visit id "asc".
    
    Related to T2310

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/368/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/368/console

Build is green

Patch application report for D3359 (id=11921)

Rebasing onto 8620519891...

Current branch diff-target is up to date.
Changes applied before test
commit 1d5717fb5e34831fc002f9b625156bfd8d7384af
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 24 18:04:29 2020 +0200

    storage*: Open order parameter to origin-visit-get endpoint
    
    This allows clients to search from most recent to oldest visit when calling the
    endpoint with the "order" parameter set to "desc" (visit id desc).
    
    This keeps and explicits the existing sorting order as visit id "asc".
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/370/ for more details.

ardumont added a subscriber: vlorentz.
ardumont added inline comments.
swh/storage/cassandra/cql.py
679

Actually it was missing multiple scenarios around pagination in "desc" order.
And yes, once adding those, that failed as you remarked.

This is now fixed in every implementations and tested appropriately (the 8 combinations of {order, pagination, limit}).

755

It's that way because @vlorentz convinced me that using prepared statement is faster than building the query.
So there instead we define the 8 prepared statements and use them accordingly.
It's also more secure against cql injection i guess.

[1] https://www.scylladb.com/2017/12/13/prepared-statements-scylla/

swh/storage/db.py
596–605

now it does ;)

ardumont marked an inline comment as done.

Rebase on latest master

Build is green

Patch application report for D3359 (id=11926)

Rebasing onto f75cd41ab8...

Current branch diff-target is up to date.
Changes applied before test
commit 3b1695b2e7b1f5e227410b41953921e5a5806554
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 24 18:04:29 2020 +0200

    storage*: Open order parameter to origin-visit-get endpoint
    
    This allows clients to search from most recent to oldest visit when calling the
    endpoint with the "order" parameter set to "desc" (visit id desc).
    
    This keeps and explicits the existing sorting order as visit id "asc".
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/373/ for more details.

swh/storage/cassandra/cql.py
698–723

black folded the newlines but not the quotes in the prepared statement queries

swh/storage/db.py
594

don't use assertions for sanitization.

Running Python in optimized mode (-O) doesn't run assertions, so callers could inject SQL via that parameter.

swh/storage/db.py
594

well it's not a big deal because we're unlikely to ever do that, but I really don't like substituting a parameter in the query.

I'd feel better with something like:

if order == 'asc':
    query = "ORDER BY ov.visit ASC"
elif order == 'desc':
    query = "ORDER BY ov.visit DESC"
else:
    assert False

even if it's redundant

swh/storage/db.py
594

Damn... thanks.

What should i replace it with then?

if order.lower() not in ALLOWED_ORDER:
    raise ValueError(f"Order parameter must be one of {','.join(ALLOWED_ORDER)}")

or something?

Adapt according to @vlorentz's remarks (thx)

ardumont added inline comments.
swh/storage/cassandra/cql.py
698–723

ah yeah, thanks.
I saw and forgot to fix those (also the flakiness in tests annoys me a bit...).

Build is green

Patch application report for D3359 (id=11930)

Rebasing onto f75cd41ab8...

Current branch diff-target is up to date.
Changes applied before test
commit 182ee497328a55d99609e5230146268a108508e3
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 24 18:04:29 2020 +0200

    storage*: Open order parameter to origin-visit-get endpoint
    
    This allows clients to search from most recent to oldest visit when calling the
    endpoint with the "order" parameter set to "desc" (visit id desc).
    
    This keeps and explicits the existing sorting order as visit id "asc".
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/376/ for more details.

swh/storage/db.py
594

no need for a constant, but yeah that's fine

swh/storage/db.py
594

I did as your first proposal in the end ;)

This revision is now accepted and ready to land.Jun 26 2020, 3:30 PM