When called by a replayer, the visit.visit field is set; but
origin.next_visit_id was never incremented, so on the next loader
run, the visit id would be 1 even if there is already a visit
with that id.
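The behaviour described above can be illustrated with a minimal in-memory sketch. `ToyStorage` and this `OriginVisit` class are hypothetical stand-ins written for illustration only; they are not the swh.storage API, and the actual change in swh/storage/cassandra may differ (for instance in how the counter update is made safe against concurrent writers). The sketch assumes the counter should be moved past any explicitly supplied visit id.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class OriginVisit:
    origin: str
    # None when a loader lets the storage pick the id; set when a replayer
    # copies an existing visit.
    visit: Optional[int] = None


@dataclass
class ToyStorage:
    """Hypothetical in-memory stand-in, not the real Cassandra backend."""

    next_visit_id: Dict[str, int] = field(default_factory=dict)
    visits: Dict[Tuple[str, int], OriginVisit] = field(default_factory=dict)

    def origin_visit_add(self, visit: OriginVisit) -> OriginVisit:
        next_id = self.next_visit_id.get(visit.origin, 1)
        if visit.visit is None:
            # Loader path: allocate the next free id and advance the counter.
            visit = OriginVisit(origin=visit.origin, visit=next_id)
            self.next_visit_id[visit.origin] = next_id + 1
        else:
            # Replayer path: the id is given. Without this bump, a later
            # loader run would allocate an id that is already taken.
            self.next_visit_id[visit.origin] = max(next_id, visit.visit + 1)
        # Upsert the visit under its (origin, visit id) key.
        self.visits[(visit.origin, visit.visit)] = visit
        return visit


storage = ToyStorage()
# A replayer inserts visit 1, then a loader asks for a fresh id.
storage.origin_visit_add(OriginVisit("https://example.org/repo", visit=1))
loaded = storage.origin_visit_add(OriginVisit("https://example.org/repo"))
print(loaded.visit)
```

In this sketch the loader's visit gets id 2 rather than colliding with the replayed visit 1. Using `max` keeps the counter from moving backwards when a replayer inserts visits out of order.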
Details
- Reviewers: vsellier
- Group Reviewers: Reviewers
- Maniphest Tasks: T3492: cassandra: origin_visit_add should increase next_visit_id even when upserting
- Commits: rDSTOcf880db30bb5: cassandra: Bump next_visit_id when origin_visit_add is called by a replayer
Diff Detail
- Repository: rDSTO Storage manager
- Lint: Automatic diff as part of commit; lint not applicable.
- Unit: Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Build is green
Patch application report for D6120 (id=22142)
Could not rebase; Attempt merge onto 9f00eb9dba...
Updating 9f00eb9d..724a67e0
Fast-forward
 swh/storage/cassandra/cql.py       | 45 +++++++++++++++++++++++++++++++++++
 swh/storage/cassandra/model.py     |  4 ++--
 swh/storage/cassandra/schema.py    |  2 +-
 swh/storage/cassandra/storage.py   | 21 ++++++++++++++++++++-
 swh/storage/in_memory.py           | 11 +++++++++
 swh/storage/tests/storage_tests.py | 48 ++++++++++++++++++++++++++++++++++++++
 6 files changed, 127 insertions(+), 4 deletions(-)
Changes applied before test
commit 724a67e06fd6e6c9ed93c28dae79db43239e7fc9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 18:12:26 2021 +0200

    cassandra: Bump next_visit_id when origin_visit_add is called by a replayer

    When called by a replayer, the visit.visit field is set; but
    origin.next_visit_id was never incremented, so on the next loader
    run, the visit id would be 1 even if there is already a visit
    with that id.

commit a3cc0dc7b104bc8b7f05988a7e0e26fae462ac7f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 13:52:17 2021 +0200

    cassandra: Make content_missing query in batches

    Instead of calling content_find() for each object, which needs to make
    two queries for each.

    Given the latency of Cassandra queries, this should be a significant
    speed-up (possibly up to 100 times faster, as this is the value of
    PARTITION_KEY_RESTRICTION_MAX_SIZE).

    This also changes the schema, because CQL does not allow doing
    `IN` queries on compound partition keys.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1365/ for more details.
Build is green
Patch application report for D6120 (id=22163)
Could not rebase; Attempt merge onto 7113198fd6...
Updating 7113198f..cf880db3
Fast-forward
 swh/storage/cassandra/cql.py       | 45 +++++++++++++++++++++++++++++++++++
 swh/storage/cassandra/model.py     |  4 ++--
 swh/storage/cassandra/schema.py    |  2 +-
 swh/storage/cassandra/storage.py   | 21 ++++++++++++++++++++-
 swh/storage/in_memory.py           | 11 +++++++++
 swh/storage/tests/storage_tests.py | 48 ++++++++++++++++++++++++++++++++++++++
 6 files changed, 127 insertions(+), 4 deletions(-)
Changes applied before test
commit cf880db30bb549ccbdbb2cdd05b61d124ed90be7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 18:12:26 2021 +0200

    cassandra: Bump next_visit_id when origin_visit_add is called by a replayer

    When called by a replayer, the visit.visit field is set; but
    origin.next_visit_id was never incremented, so on the next loader
    run, the visit id would be 1 even if there is already a visit
    with that id.

commit 54b5abfb26267ad56a67ad9fa2dd9d5d075e30f0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 20 13:52:17 2021 +0200

    cassandra: Make content_missing query in batches

    Instead of calling content_find() for each object, which needs to make
    two queries for each.

    Given the latency of Cassandra queries, this should be a significant
    speed-up (possibly up to 100 times faster, as this is the value of
    PARTITION_KEY_RESTRICTION_MAX_SIZE).

    This also changes the schema, because CQL does not allow doing
    `IN` queries on compound partition keys.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1369/ for more details.