swh/model/identifiers.py
@@ … @@ def raw_extrinsic_metadata_identifier(metadata: Dict[str, Any]) -> str:
     $StrWithoutSpaces and $Version are ASCII strings, and may not contain spaces.
     $Str is an UTF-8 string.
     $CoreSwhid are core SWHIDs, as defined in :ref:`persistent-identifiers`.
     $ExtendedSwhid is a core SWHID, with extra types allowed ('ori' for
     origins and 'emd' for raw extrinsic metadata)
-    $Timestamp is a decimal representation of the integer number of seconds since
-    the UNIX epoch (1970-01-01 00:00:00 UTC), with no leading '0'
-    (unless the timestamp value is zero) and no timezone.
+    $Timestamp is a decimal representation of the rounded-down integer number of
+    seconds since the UNIX epoch (1970-01-01 00:00:00 UTC),
+    with no leading '0' (unless the timestamp value is zero) and no timezone.
     It may be negative by prefixing it with a '-', which must not be followed
     by a '0'.
     Newlines in $Bytes, $Str, and $Iri are escaped as with other git fields,
     ie. by adding a space after them.
     Returns:
         str: the intrinsic identifier for `metadata`
     """
-    timestamp = metadata["discovery_date"].timestamp()
+    # equivalent to using math.floor(dt.timestamp()) to round down,
+    # as int(dt.timestamp()) rounds toward zero,
+    # which would map two seconds on the 0 timestamp.
+    #
+    # This should never be an issue in practice as Software Heritage didn't
+    # start collecting metadata before 2015.
+    timestamp = (
+        metadata["discovery_date"]
+        .astimezone(datetime.timezone.utc)
+        .replace(microsecond=0)
+        .timestamp()
+    )
+    assert timestamp.is_integer()

olasd: I'm not sure under which conditions this assert would trigger? Could you turn it into a ValueError instead?

vlorentz: > I'm not sure under which conditions this assert would trigger?

```
>>> dt = datetime.datetime.now().replace(tzinfo=datetime.timezone(datetime.timedelta(microseconds=1000)))
>>> dt.replace(microsecond=0).timestamp()
1612795046.999
```

but I don't see any reason for it to happen.

Why?

olasd: So this means the way we're using the `.replace()` operation is buggy. We should first normalize the timezone to UTC (`.astimezone(datetime.timezone.utc)`), then replace the microseconds value with zero. ValueError is the right exception to raise when a value is wrong, rather than an assert that can be optimized away.

vlorentz: Ok. But now that I'm replacing with `.astimezone`, an assert is the right one to use.
     headers = [
         (b"target", str(metadata["target"]).encode()),
         (b"discovery_date", str(int(timestamp)).encode("ascii")),
         (
             b"authority",
             f"{metadata['authority']['type']} {metadata['authority']['url']}".encode(),
         ),
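The review thread hinges on the order of `.astimezone()` and `.replace(microsecond=0)`. The standalone sketch below (not part of the patch) reproduces that behavior, assuming Python 3.7 or later, which accepts sub-minute UTC offsets; `floor_timestamp`, `odd_tz`, and the sample dates are hypothetical names and values chosen only for illustration.

```python
import datetime
import math


def floor_timestamp(dt: datetime.datetime) -> int:
    """Round an aware datetime down to whole seconds since the UNIX epoch.

    Follows the approach in the patch: normalize to UTC first, then drop the
    microseconds, so the result equals math.floor(dt.timestamp()).
    """
    ts = dt.astimezone(datetime.timezone.utc).replace(microsecond=0).timestamp()
    assert ts.is_integer()
    return int(ts)


# A time zone whose offset has a 1 ms component, as in the review thread.
odd_tz = datetime.timezone(datetime.timedelta(microseconds=1000))
dt = datetime.datetime(2021, 2, 8, 14, 37, 26, 500000, tzinfo=odd_tz)

# Dropping microseconds *before* normalizing leaves the 1 ms offset behind,
# so the result is not a whole number of seconds.
naive_way = dt.replace(microsecond=0).timestamp()
print(naive_way.is_integer())  # False

# Normalizing first yields an integer that matches math.floor().
print(floor_timestamp(dt) == math.floor(dt.timestamp()))  # True

# Before the epoch, floor() and int() differ: int() rounds toward zero.
before_epoch = datetime.datetime(
    1969, 12, 31, 23, 59, 59, 500000, tzinfo=datetime.timezone.utc
)
print(floor_timestamp(before_epoch))  # -1
print(int(before_epoch.timestamp()))  # 0
```

Once the value is in UTC, the microseconds field is the only fractional part left, so zeroing it rounds down even for instants before 1970, which is exactly where `int()` would instead round toward zero.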
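The docstring's $Timestamp rule and its git-style newline escaping can be illustrated with a short sketch. The helpers `escape_newlines` and `format_header` are hypothetical: they are not the manifest serialization code in swh/model/identifiers.py, which this excerpt does not show, and they only assume the rules quoted in the docstring.

```python
def escape_newlines(value: bytes) -> bytes:
    """Follow every newline with a space, as git does for multi-line header values."""
    return value.replace(b"\n", b"\n ")


def format_header(key: bytes, value: bytes) -> bytes:
    """Build one "key value" manifest line, terminated by a newline (hypothetical helper)."""
    return key + b" " + escape_newlines(value) + b"\n"


# str(int) already matches the $Timestamp rules: no leading '0' (except for
# zero itself) and a '-' prefix, never followed by '0', before the epoch.
for seconds in (1612795046, 0, -1):
    print(format_header(b"discovery_date", str(seconds).encode("ascii")))

# A value containing a newline gets a space inserted after it.
print(format_header(b"path", b"a\nb"))  # b'path a\n b\n'
```

Because an integer never renders as `-0` or with leading zeros, `str(int(timestamp))` in the patch needs no extra formatting to satisfy the $Timestamp rule.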