When called by a replayer, the visit.visit field is set; but
origin.next_visit_id was never incremented, so on the next loader
run, the visit id would be 1 even if there is already a visit
with that id.
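To illustrate the problem described above, here is a minimal in-memory sketch, not the actual Cassandra backend code. The `Origin` dataclass, the simplified `origin_visit_add` signature, and the use of `max()` to move the counter forward are all assumptions made for illustration; only the field name `next_visit_id` and the replayer/loader distinction come from the summary.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Origin:
    """Hypothetical stand-in for an origin row and its visit counter."""

    url: str
    next_visit_id: int = 1
    visits: Dict[int, str] = field(default_factory=dict)


def origin_visit_add(
    origin: Origin, visit_type: str, visit: Optional[int] = None
) -> int:
    """Record a visit; `visit` is already set when a replayer calls this."""
    if visit is None:
        # Loader path: allocate the next free id and advance the counter.
        visit = origin.next_visit_id
        origin.next_visit_id += 1
    else:
        # Replayer path: the id was chosen elsewhere, so the counter must
        # be moved past it. Skipping this step is the bug described above.
        origin.next_visit_id = max(origin.next_visit_id, visit + 1)
    origin.visits[visit] = visit_type
    return visit
```

With this sketch, replaying visit 1 and then running a loader yields visit id 2 for the loader's visit, instead of colliding with the existing visit 1.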
Details
- Reviewers: vsellier
- Group Reviewers: Reviewers
- Maniphest Tasks: T3492: cassandra: origin_visit_add should increase next_visit_id even when upserting
- Commits: rDSTOcf880db30bb5: cassandra: Bump next_visit_id when origin_visit_add is called by a replayer
Diff Detail
- Repository: rDSTO Storage manager
- Lint: No Linters Available
- Unit: No Unit Test Coverage
- Build Status: Buildable 23120
  - Build 36061: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
  - Build 36060: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D6120 (id=22142)
Could not rebase; Attempt merge onto 9f00eb9dba...
Updating 9f00eb9d..724a67e0
Fast-forward
 swh/storage/cassandra/cql.py       | 45 +++++++++++++++++++++++++++++++++++
 swh/storage/cassandra/model.py     |  4 ++--
 swh/storage/cassandra/schema.py    |  2 +-
 swh/storage/cassandra/storage.py   | 21 ++++++++++++++++++++-
 swh/storage/in_memory.py           | 11 +++++++++
 swh/storage/tests/storage_tests.py | 48 ++++++++++++++++++++++++++++++++++++++
 6 files changed, 127 insertions(+), 4 deletions(-)
Changes applied before test
commit 724a67e06fd6e6c9ed93c28dae79db43239e7fc9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 20 18:12:26 2021 +0200
cassandra: Bump next_visit_id when origin_visit_add is called by a replayer
When called by a replayer, the visit.visit field is set; but
origin.next_visit_id was never incremented, so on the next loader
run, the visit id would be 1 even if there is already a visit
with that id.
commit a3cc0dc7b104bc8b7f05988a7e0e26fae462ac7f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 20 13:52:17 2021 +0200
cassandra: Make content_missing query in batches
Instead of calling content_find() for each object, which needs to make
two queries for each.
Given the latency of Cassandra queries, this should be a significant
speed-up (possibly up to 100 times faster, as this is the value of
PARTITION_KEY_RESTRICTION_MAX_SIZE).
This also changes the schema, because CQL does not allow doing `IN`
queries on compound partition keys.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1365/ for more details.
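The batching idea in the "content_missing" commit message above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: `batches`, the simplified `content_missing` signature, and the `query_present` callback (standing in for a single `IN` query) are hypothetical. Only the constant name and its value of 100 come from the commit message.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Set

# Value quoted in the commit message; redefined here only for the sketch.
PARTITION_KEY_RESTRICTION_MAX_SIZE = 100


def batches(items: Iterable[bytes], size: int) -> Iterator[List[bytes]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def content_missing(
    hashes: Iterable[bytes],
    query_present: Callable[[List[bytes]], Set[bytes]],
) -> Iterator[bytes]:
    """Yield hashes that the backend does not know about.

    `query_present` is a hypothetical callback that answers one batched
    lookup and returns the subset of the batch that exists.
    """
    for batch in batches(hashes, PARTITION_KEY_RESTRICTION_MAX_SIZE):
        present = query_present(batch)  # one round trip per batch
        for h in batch:
            if h not in present:
                yield h
```

Checking 250 hashes this way takes three round trips (100, 100, and 50 hashes) rather than one or more per hash, which is where the expected speed-up comes from.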
Build is green
Patch application report for D6120 (id=22163)
Could not rebase; Attempt merge onto 7113198fd6...
Updating 7113198f..cf880db3
Fast-forward
 swh/storage/cassandra/cql.py       | 45 +++++++++++++++++++++++++++++++++++
 swh/storage/cassandra/model.py     |  4 ++--
 swh/storage/cassandra/schema.py    |  2 +-
 swh/storage/cassandra/storage.py   | 21 ++++++++++++++++++++-
 swh/storage/in_memory.py           | 11 +++++++++
 swh/storage/tests/storage_tests.py | 48 ++++++++++++++++++++++++++++++++++++++
 6 files changed, 127 insertions(+), 4 deletions(-)
Changes applied before test
commit cf880db30bb549ccbdbb2cdd05b61d124ed90be7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 20 18:12:26 2021 +0200
cassandra: Bump next_visit_id when origin_visit_add is called by a replayer
When called by a replayer, the visit.visit field is set; but
origin.next_visit_id was never incremented, so on the next loader
run, the visit id would be 1 even if there is already a visit
with that id.
commit 54b5abfb26267ad56a67ad9fa2dd9d5d075e30f0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Fri Aug 20 13:52:17 2021 +0200
cassandra: Make content_missing query in batches
Instead of calling content_find() for each object, which needs to make
two queries for each.
Given the latency of Cassandra queries, this should be a significant
speed-up (possibly up to 100 times faster, as this is the value of
PARTITION_KEY_RESTRICTION_MAX_SIZE).
This also changes the schema, because CQL does not allow doing `IN`
queries on compound partition keys.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1369/ for more details.