Page MenuHomeSoftware Heritage

Allow negative_utc to be None in normalize_timestamp()
ClosedPublic

Authored by douardda on Jun 11 2020, 5:35 PM.

Details

Summary

thus in TimestampWithTimezone.from_dict(). This is needed to help consuming
existing (invalid) messages from kafka.

Not sure about the couple of tests added in there... this is far from covering properly this function and may give the false impression it's properly tested while it's not.

Diff Detail

Repository
rDMOD Data model
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D3263 (id=11573)

Rebasing onto 3d9f69444a...

Current branch diff-target is up to date.
Changes applied before test
commit 0e0bd5b289f363f2c69715faf803f5669867835b
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Jun 11 17:00:50 2020 +0200

    Allow negative_utc to be None in normalize_timestamp()
    
    thus in TimestampWithTimezone.from_dict(). This is needed to help consuming
    existing (invalid) messages from kafka.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/70/ for more details.

This is needed to help consuming existing (invalid) messages from kafka.

If this is the only purpose, it should be in swh.storage.fixer

As I've pointed out on IRC, I think (offset, negative_utc) == (0, None) should raise an error, as this is undefined behavior.

In D3263#79805, @olasd wrote:

As I've pointed out on IRC, I think (offset, negative_utc) == (0, None) should raise an error, as this is undefined behavior.

The problem is this the scope of this function is unclear to me. It's a normalization function, not a validation one. Actually, I'm not even sure if this function should do any kind of validation at all. This is the job of the model entity behind.

Not sure.

This is needed to help consuming existing (invalid) messages from kafka.

If this is the only purpose, it should be in swh.storage.fixer

well, doing in there (which was my first move) is more complicated (bunch of if statement for each possible object type) and adds complexity to this fixer module which is already pretty messy (IMHO) and have no dedicated test, rather than fixing it only once as a 2 lines fix... I find this makes more sense like this.

swh/model/identifiers.py
297–298

if negative_utc is None and offset == 0:?

swh/model/identifiers.py
297–298

The whole point of this diff is handling the longstanding behavior of not caring about negative_utc when the offset is not 0, which kafka is currently full of. So clearly, not.

This revision is now accepted and ready to land.Jun 12 2020, 12:45 PM

add a warning in the commit message

This revision was landed with ongoing or failed builds.Jun 15 2020, 9:45 AM
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D3263 (id=11624)

Rebasing onto a427e184fb...

First, rewinding head to replay your work on top of it...
Fast-forwarded diff-target to base-revision-71-D3263.
Changes applied before test

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/71/ for more details.