Add test for origin_visit_get_latest in presence of mismatched id and date orders
Closed, Public

Authored by vlorentz on Aug 20 2021, 8:12 PM.

Details

Summary

It was unclear whether this actually worked; I had to write this test to
confirm that the code was not buggy.

Also replaced a conditional that is always False (because Cassandra
always returns results in the order of the clustering key) with an
assertion, so the code is less confusing.
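
As a rough illustration of the situation described above, here is a minimal sketch and not the actual swh-storage code. The `Visit` class and `latest_visit` function are hypothetical names. It assumes that "latest" means the most recent date, with ties broken by the higher visit id, and that rows arrive in ascending visit id order (the clustering key order). The ordering is then an invariant to assert rather than a case to branch on.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class Visit:
    """Hypothetical stand-in for an origin visit row."""

    visit: int  # visit id, assumed to be the clustering key
    date: datetime


def latest_visit(rows: Iterable[Visit]) -> Optional[Visit]:
    """Return the visit with the most recent date.

    Assumes rows are yielded in ascending visit id order, so a date tie
    is resolved in favour of the higher id simply by using ``>=``.
    """
    latest: Optional[Visit] = None
    previous_id: Optional[int] = None
    for row in rows:
        # The ordering is guaranteed by the database in this sketch, so a
        # violation is a programming error, not a case to handle.
        assert previous_id is None or row.visit > previous_id, (
            "rows must be in clustering key order"
        )
        previous_id = row.visit
        if latest is None or row.date >= latest.date:
            latest = row
    return latest


# Mismatched id and date orders: visit 1 is more recent than visit 2.
rows = [
    Visit(visit=1, date=datetime(2021, 8, 20, tzinfo=timezone.utc)),
    Visit(visit=2, date=datetime(2021, 8, 10, tzinfo=timezone.utc)),
]
result = latest_visit(rows)
```

In this sketch the date decides which visit wins, so a higher visit id with an older date does not displace the more recent visit.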

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6121 (id=22143)

Could not rebase; Attempt merge onto 9f00eb9dba...

Updating 9f00eb9d..e291c74b
Fast-forward
 swh/storage/cassandra/cql.py       | 45 ++++++++++++++++++++++++
 swh/storage/cassandra/model.py     |  4 +--
 swh/storage/cassandra/schema.py    |  2 +-
 swh/storage/cassandra/storage.py   | 26 ++++++++++++--
 swh/storage/in_memory.py           | 11 ++++++
 swh/storage/tests/storage_tests.py | 70 ++++++++++++++++++++++++++++++++++++++
 6 files changed, 152 insertions(+), 6 deletions(-)
Changes applied before test
commit e291c74b04b8e7501f4e41ea237591038ff2d9b8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 20:11:51 2021 +0200

    Add test for origin_visit_get_latest in presence of mismatched id and date orders
    
    It was unclear this actually worked; I had to write this test to realize
    the code wasn't buggy.
    
    Also replaced a conditional that is always False (because Cassandra
    always returns results in the order of the clustering key) with an
    assertion, so the code is less confusing.

commit 724a67e06fd6e6c9ed93c28dae79db43239e7fc9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 18:12:26 2021 +0200

    cassandra: Bump next_visit_id when origin_visit_add is called by a replayer
    
    When called by a replayer, the visit.visit field is set; but
    origin.next_visit_id was never incremented, so on the next loader
    run, the visit id would be 1 even if there is already a visit
    with that id.

commit a3cc0dc7b104bc8b7f05988a7e0e26fae462ac7f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 13:52:17 2021 +0200

    cassandra: Make content_missing query in batches
    
    Instead of calling content_find() for each object, which needs to make
    two queries for each.
    
    Given the latency of Cassandra queries, this should be a significant
    speed-up (possibly up to 100 times faster, as this is the value of
    PARTITION_KEY_RESTRICTION_MAX_SIZE).
    
    This also changes the schema, because CQL does not allow doing `IN`
    queries on compound partition keys.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1366/ for more details.
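
The batching idea from the content_missing commit in the report above can be sketched as follows. This is an illustration only: `batches`, `content_missing_batched` and the `find_present` callback are hypothetical names, not the swh-storage API. The batch size of 100 is the value the commit message gives for PARTITION_KEY_RESTRICTION_MAX_SIZE.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Set

# Value quoted in the commit message above.
PARTITION_KEY_RESTRICTION_MAX_SIZE = 100


def batches(items: Iterable[bytes], size: int) -> Iterator[List[bytes]]:
    """Yield consecutive lists of at most ``size`` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def content_missing_batched(
    hashes: Iterable[bytes],
    find_present: Callable[[List[bytes]], Set[bytes]],
    batch_size: int = PARTITION_KEY_RESTRICTION_MAX_SIZE,
) -> Iterator[bytes]:
    """Yield the hashes that ``find_present`` does not report as present.

    ``find_present`` is a hypothetical callback standing in for one
    ``IN`` query per batch, instead of one lookup per hash.
    """
    for batch in batches(hashes, batch_size):
        present = find_present(batch)
        for h in batch:
            if h not in present:
                yield h
```

With 250 hashes this issues three calls to `find_present` (100, 100 and 50 items) rather than 250 individual lookups.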

This revision is now accepted and ready to land. Aug 23 2021, 3:26 PM

Build is green

Patch application report for D6121 (id=22164)

Could not rebase; Attempt merge onto 7113198fd6...

Updating 7113198f..8f1cdf65
Fast-forward
 swh/storage/cassandra/cql.py       | 45 ++++++++++++++++++++++++
 swh/storage/cassandra/model.py     |  4 +--
 swh/storage/cassandra/schema.py    |  2 +-
 swh/storage/cassandra/storage.py   | 26 ++++++++++++--
 swh/storage/in_memory.py           | 11 ++++++
 swh/storage/tests/storage_tests.py | 70 ++++++++++++++++++++++++++++++++++++++
 6 files changed, 152 insertions(+), 6 deletions(-)
Changes applied before test
commit 8f1cdf65a1056dac42755e8c70ae38f3d34aa459
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 20:11:51 2021 +0200

    Add test for origin_visit_get_latest in presence of mismatched id and date orders
    
    It was unclear this actually worked; I had to write this test to realize
    the code wasn't buggy.
    
    Also replaced a conditional that is always False (because Cassandra
    always returns results in the order of the clustering key) with an
    assertion, so the code is less confusing.

commit cf880db30bb549ccbdbb2cdd05b61d124ed90be7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 18:12:26 2021 +0200

    cassandra: Bump next_visit_id when origin_visit_add is called by a replayer
    
    When called by a replayer, the visit.visit field is set; but
    origin.next_visit_id was never incremented, so on the next loader
    run, the visit id would be 1 even if there is already a visit
    with that id.

commit 54b5abfb26267ad56a67ad9fa2dd9d5d075e30f0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 13:52:17 2021 +0200

    cassandra: Make content_missing query in batches
    
    Instead of calling content_find() for each object, which needs to make
    two queries for each.
    
    Given the latency of Cassandra queries, this should be a significant
    speed-up (possibly up to 100 times faster, as this is the value of
    PARTITION_KEY_RESTRICTION_MAX_SIZE).
    
    This also changes the schema, because CQL does not allow doing `IN`
    queries on compound partition keys.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1370/ for more details.
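
The next_visit_id commit listed in the reports above describes a counter that was not advanced when a replayer supplied an explicit visit id. A minimal in-memory sketch of the intended behaviour follows. The `VisitIdAllocator` class is hypothetical and does not reflect how the Cassandra backend stores or updates the counter.

```python
from typing import Dict, Optional


class VisitIdAllocator:
    """Hypothetical per-origin visit id counter, starting at 1."""

    def __init__(self) -> None:
        self._next_visit_id: Dict[str, int] = {}

    def add_visit(self, origin: str, visit: Optional[int] = None) -> int:
        """Register a visit and return its id.

        A loader passes ``visit=None`` and gets the next free id.
        A replayer passes an explicit id; the counter is bumped past it
        so that a later loader run does not reuse that id.
        """
        next_id = self._next_visit_id.get(origin, 1)
        if visit is None:
            visit = next_id
        # Never move the counter backwards, and always move it past the
        # id that was just used.
        self._next_visit_id[origin] = max(next_id, visit + 1)
        return visit


allocator = VisitIdAllocator()
replayed = allocator.add_visit("https://example.org/repo", visit=1)
loaded = allocator.add_visit("https://example.org/repo")
```

Without the bump, the second call would hand out id 1 again even though a visit with that id already exists, which is the symptom the commit message describes.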