Details
- Reviewers
- None
- Group Reviewers
Reviewers - Maniphest Tasks
- T2672: Inconsistent keys between visits and origin visits in the journal
Diff Detail
- Repository
- rDMOD Data model
- Branch
- idx-journal
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 16035 Build 24671: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 24670: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D4194 (id=14763)
Could not rebase; Attempt merge onto bdfde82845...
Updating bdfde82..719f57a Fast-forward swh/model/model.py | 53 +++++++++++++++++++++++++++++++++++++++---- swh/model/tests/test_model.py | 30 ++++++++++++++++++++++++ 2 files changed, 79 insertions(+), 4 deletions(-)
Changes applied before test
commit 719f57afe24995d0705e9d1698c91bd7bcd3426f Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Oct 8 11:22:10 2020 +0200 model: use visit ids in the unique key, instead of their date. dates are not unique (ie. multiple visits can share a date, and they do in practice); and visit statuses already use visit ids in their unique key. commit a251df2e5b31a5d59d7e69e51a441bb22b1a7b0b Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Sep 29 14:08:08 2020 +0200 Add a 'unique_key' method on model objects that returns a value suitable for unicity constraints. Motivation: * this is somewhat more of a model concern than a journal/kafka concern IMO * this is one step toward adding support for non-model objects in KafkaJournalWriter Implementation of the unique_key methods comes from `swh.journal.serializers.object_key`.
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/146/ for more details.
dates are not unique (ie. multiple visits can share a date, and they
do in practice); and visit statuses already use visit ids in their
unique key.
Have you an idea of how many of these origin,date collision we have?
Looks to me that should be really a rare event, shouldn't it? (since date are stored with a microsecond resolution, right?)
microsecond in postgres, millisecond in cassandra.
that's another good reason not to use them as identifiers.
IMO, more generally, timestamps are like floating-point numbers, they should never be used as operands in an equality.
Ah ok, maybe add this statement somewhere in the commit message?
that's another good reason not to use them as identifiers.
IMO, more generally, timestamps are like floating-point numbers, they should never be used as operands in an equality.
I agree on the principle. Just wanted to know how much of an existing problem this is. I guess not so much in pg, but then pg->cassandra becomes a lossy transfer.
Thx
Build is green
Patch application report for D4194 (id=14769)
Could not rebase; Attempt merge onto bdfde82845...
Updating bdfde82..d0d6693 Fast-forward swh/model/model.py | 53 +++++++++++++++++++++++++++++++++++++++---- swh/model/tests/test_model.py | 30 ++++++++++++++++++++++++ 2 files changed, 79 insertions(+), 4 deletions(-)
Changes applied before test
commit d0d669361700c8e83fc7a94b268186be3e0129bb Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Oct 8 11:22:10 2020 +0200 model: use visit ids in the unique key, instead of their date. dates are not unique (ie. multiple visits can share a date, and they do in practice), and timestamps are also not great for testing equality (eg. migrating between systems is lossy). Additionally, visit statuses already use visit ids in their unique key. commit a251df2e5b31a5d59d7e69e51a441bb22b1a7b0b Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Sep 29 14:08:08 2020 +0200 Add a 'unique_key' method on model objects that returns a value suitable for unicity constraints. Motivation: * this is somewhat more of a model concern than a journal/kafka concern IMO * this is one step toward adding support for non-model objects in KafkaJournalWriter Implementation of the unique_key methods comes from `swh.journal.serializers.object_key`.
See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/147/ for more details.