Page MenuHomeSoftware Heritage

storage*: Open order parameter to origin-visit-get endpoint
ClosedPublic

Authored by ardumont on Jun 25 2020, 6:59 PM.

Details

Summary

This allows clients to search from most recent to oldest visit when calling
the endpoint with order parameter "desc" (visit id desc). [1]

This keeps (and explicits) the existing sorting order as "asc" (visit id asc).

[1] See D3350 for such use case

Test Plan

tox

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision.Jun 25 2020, 6:59 PM
ardumont edited the summary of this revision. (Show Details)Jun 25 2020, 7:06 PM

Build has FAILED

Patch application report for D3359 (id=11910)

Rebasing onto b991e69707...

Current branch diff-target is up to date.
Changes applied before test
commit 82e6070c0712ba47bfbedcc2c3329f1828c14fd0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 24 18:04:29 2020 +0200

    storage: Open order parameter to origin-visit-get endpoint
    
    This allows clients to search from most recent to oldest calling the endpoint
    with order parameter to "desc" (visit id desc)
    
    This keeps and explicits the existing sorting order as "asc" (visit id asc)

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/364/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/364/console

Build is green

Patch application report for D3359 (id=11910)

Rebasing onto b991e69707...

Current branch diff-target is up to date.
Changes applied before test
commit 82e6070c0712ba47bfbedcc2c3329f1828c14fd0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 24 18:04:29 2020 +0200

    storage: Open order parameter to origin-visit-get endpoint
    
    This allows clients to search from most recent to oldest calling the endpoint
    with order parameter to "desc" (visit id desc)
    
    This keeps and explicits the existing sorting order as "asc" (visit id asc)

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/366/ for more details.

douardda added inline comments.
swh/storage/cassandra/cql.py
680

shouldn't this be a visit < ? for pagination to work?

swh/storage/db.py
596–605

same as for cassandra, how does this deal with pagination when order is desc?

ardumont added inline comments.Jun 26 2020, 10:14 AM
swh/storage/cassandra/cql.py
680

Hmm it seems you are right!
So it's missing a check or something since nothing complained.

ardumont added inline comments.Jun 26 2020, 10:41 AM
swh/storage/cassandra/cql.py
680

Yes, it's missing a check like:

# pagination support in desc order
all_visits5 = list(
    swh_storage.origin_visit_get(
        origin.url, last_visit=ov3["visit"], order="desc", limit=2)
)
assert all_visits5 == [ov2, ov1]

or something.

ardumont updated this revision to Diff 11921.Jun 26 2020, 11:46 AM

Add missing pagination corner cases (desc order) both in tests and
implementations. This is now fixed \o/

@douardda thanks for raising the question ;)

Build has FAILED

Patch application report for D3359 (id=11921)

Rebasing onto 8620519891...

Current branch diff-target is up to date.
Changes applied before test
commit 1d5717fb5e34831fc002f9b625156bfd8d7384af
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 24 18:04:29 2020 +0200

    storage*: Open order parameter to origin-visit-get endpoint
    
    This allows clients to search from most recent to oldest visit when calling the
    endpoint with the "order" parameter set to "desc" (visit id desc).
    
    This keeps and explicits the existing sorting order as visit id "asc".
    
    Related to T2310

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/368/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/368/console

Build is green

Patch application report for D3359 (id=11921)

Rebasing onto 8620519891...

Current branch diff-target is up to date.
Changes applied before test
commit 1d5717fb5e34831fc002f9b625156bfd8d7384af
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 24 18:04:29 2020 +0200

    storage*: Open order parameter to origin-visit-get endpoint
    
    This allows clients to search from most recent to oldest visit when calling the
    endpoint with the "order" parameter set to "desc" (visit id desc).
    
    This keeps and explicits the existing sorting order as visit id "asc".
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/370/ for more details.

ardumont marked an inline comment as done.Jun 26 2020, 12:14 PM
ardumont added a subscriber: vlorentz.
ardumont added inline comments.
swh/storage/cassandra/cql.py
680

Actually it was missing multiple scenarios around pagination in "desc" order.
And yes, once adding those, that failed as you remarked.

This is now fixed in every implementations and tested appropriately (the 8 combinations of {order, pagination, limit}).

756

It's that way because @vlorentz convinced me that using prepared statement is faster than building the query.
So there instead we define the 8 prepared statements and use them accordingly.
It's also more secure against cql injection i guess.

[1] https://www.scylladb.com/2017/12/13/prepared-statements-scylla/

swh/storage/db.py
596–605

now it does ;)

ardumont updated this revision to Diff 11926.Jun 26 2020, 12:38 PM
ardumont marked an inline comment as done.

Rebase on latest master

Build is green

Patch application report for D3359 (id=11926)

Rebasing onto f75cd41ab8...

Current branch diff-target is up to date.
Changes applied before test
commit 3b1695b2e7b1f5e227410b41953921e5a5806554
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 24 18:04:29 2020 +0200

    storage*: Open order parameter to origin-visit-get endpoint
    
    This allows clients to search from most recent to oldest visit when calling the
    endpoint with the "order" parameter set to "desc" (visit id desc).
    
    This keeps and explicits the existing sorting order as visit id "asc".
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/373/ for more details.

vlorentz added inline comments.Jun 26 2020, 1:07 PM
swh/storage/cassandra/cql.py
698–723

black folded the newlines but not the quotes in the prepared statement queries

swh/storage/db.py
594

don't use assertions for sanitization.

Running Python in optimized mode (-O) doesn't run assertions, so callers could inject SQL via that parameter.

vlorentz added inline comments.Jun 26 2020, 1:11 PM
swh/storage/db.py
594

well it's not a big deal because we're unlikely to ever do that, but I really don't like substituting a parameter in the query.

I'd feel better with something like:

if order == 'asc':
    query = "ORDER BY ov.visit ASC"
elif order == 'desc':
    query = "ORDER BY ov.visit DESC"
else:
    assert False

even if it's redundant

ardumont added inline comments.Jun 26 2020, 1:12 PM
swh/storage/db.py
594

Damn... thanks.

What should i replace it with then?

if order.lower() not in ALLOWED_ORDER:
    raise ValueError(f"Order parameter must be one of {','.join(ALLOWED_ORDER)}")

or something?

ardumont updated this revision to Diff 11930.Jun 26 2020, 1:22 PM

Adapt according to @vlorentz's remarks (thx)

ardumont marked 2 inline comments as done.Jun 26 2020, 1:22 PM
ardumont added inline comments.
swh/storage/cassandra/cql.py
698–723

ah yeah, thanks.
I saw and forgot to fix those (also the flakiness in tests annoys me a bit...).

Build is green

Patch application report for D3359 (id=11930)

Rebasing onto f75cd41ab8...

Current branch diff-target is up to date.
Changes applied before test
commit 182ee497328a55d99609e5230146268a108508e3
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Jun 24 18:04:29 2020 +0200

    storage*: Open order parameter to origin-visit-get endpoint
    
    This allows clients to search from most recent to oldest visit when calling the
    endpoint with the "order" parameter set to "desc" (visit id desc).
    
    This keeps and explicits the existing sorting order as visit id "asc".
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/376/ for more details.

vlorentz added inline comments.Jun 26 2020, 3:01 PM
swh/storage/db.py
594

no need for a constant, but yeah that's fine

ardumont added inline comments.Jun 26 2020, 3:27 PM
swh/storage/db.py
594

I did as your first proposal in the end ;)

vlorentz accepted this revision.Jun 26 2020, 3:30 PM
This revision is now accepted and ready to land.Jun 26 2020, 3:30 PM