swh/model/identifiers.py
@@ ... @@ def raw_extrinsic_metadata_identifier(metadata: Dict[str, Any]) -> str:
     A raw_extrinsic_metadata identifier is a salted sha1 (using the git
     hashing algorithm with the ``raw_extrinsic_metadata`` object type) of
     a manifest following the format:
     ```
     target_type: $ValueOfMetadataTargetType
     target: $UrlOrSwhid
-    discovery_date: $ISO8601
+    discovery_date: $Timestamp
     authority: $StrWithoutSpaces $IRI
     fetcher: $Str $Version
     format: $StrWithoutSpaces
     origin: $IRI <- optional
     visit: $IntInDecimal <- optional
     snapshot: $Swhid <- optional
     release: $Swhid <- optional
     revision: $Swhid <- optional
     path: $Bytes <- optional
     directory: $Swhid <- optional
     $MetadataBytes
     ```
     $IRI must be RFC 3987 IRIs (so they may contain newlines, that are escaped as
     described below)
     $StrWithoutSpaces and $Version are ASCII strings, and may not contain spaces.
     $Str is an UTF-8 string.
     $Swhid are core SWHIDs, as defined in :ref:`persistent-identifiers`.
+    $Timestamp is a decimal representation of the integer number of seconds since
+    the UNIX epoch (1970-01-01 00:00:00 UTC), with no leading '0'
+    (unless the timestamp value is zero) and no timezone.
+    It may be negative by prefixing it with a '-', which must not be followed
+    by a '0'.
     Newlines in $Bytes, $Str, and $Iri are escaped as with other git fields,
     ie. by adding a space after them.
     Returns:
         str: the intrinsic identifier for `metadata`
     """
+    discovery_date = metadata["discovery_date"]
+    if discovery_date.microsecond != 0:
+        raise ValueError(f"discovery_date={discovery_date} has microsecond != 0")
+    timestamp = discovery_date.timestamp()
+    assert timestamp.is_integer()
     headers = [
         (b"target_type", metadata["type"].encode("ascii")),
         (b"target", str(metadata["target"]).encode()),
-        (b"discovery_date", metadata["discovery_date"].isoformat().encode("ascii")),
+        (b"discovery_date", str(int(timestamp)).encode("ascii")),
         (
             b"authority",
             f"{metadata['authority']['type']} {metadata['authority']['url']}".encode(),
         ),
         (
             b"fetcher",
             f"{metadata['fetcher']['name']} {metadata['fetcher']['version']}".encode(),
         ),
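The functional change in this diff is how `discovery_date` is serialized. The sketch below isolates that logic and also checks the result against the `$Timestamp` rules from the new docstring (no leading '0' unless the value is zero, optional '-' not followed by '0'). `encode_discovery_date` and `TIMESTAMP_RE` are hypothetical names introduced for illustration. It assumes `discovery_date` is a timezone-aware `datetime.datetime`; a naive one would be interpreted in local time by `datetime.timestamp()`.

```python
import datetime
import re

# $Timestamp grammar from the docstring: either "0", or an optional '-'
# followed by a digit string that does not start with '0'.
TIMESTAMP_RE = re.compile(r"0|-?[1-9][0-9]*")


def encode_discovery_date(discovery_date: datetime.datetime) -> bytes:
    """Encode an aware datetime as the $Timestamp header value (sketch)."""
    # Sub-second precision cannot be represented, so reject it outright,
    # mirroring the check added in the diff.
    if discovery_date.microsecond != 0:
        raise ValueError(f"discovery_date={discovery_date} has microsecond != 0")
    timestamp = discovery_date.timestamp()
    assert timestamp.is_integer()
    encoded = str(int(timestamp))
    # str(int(...)) never emits leading zeros or "-0", so this always holds.
    if not TIMESTAMP_RE.fullmatch(encoded):
        raise AssertionError(f"unexpected timestamp encoding: {encoded!r}")
    return encoded.encode("ascii")
```

Because the value counts seconds since the epoch, two aware datetimes denoting the same instant in different timezones now encode to the same bytes. The previous `isoformat()` encoding embedded the UTC offset (for example `+00:00` versus `+01:00`), so the same instant could yield different manifests and therefore different identifiers.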
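The docstring describes the identifier as a salted SHA-1 over a git-style manifest whose header values escape newlines by appending a space. The sketch below shows that construction end to end under two assumptions not visible in this excerpt: the salt is the usual git object header `<type> <length>\0`, and a blank line separates the headers from `$MetadataBytes`, as in git commit objects. `escape_newlines`, `format_manifest` and `salted_git_sha1` are hypothetical helpers, not the swh.model API.

```python
import hashlib
from typing import List, Tuple


def escape_newlines(value: bytes) -> bytes:
    # Git-style continuation lines: every newline inside a value is
    # followed by a space, as the docstring describes.
    return value.replace(b"\n", b"\n ")


def format_manifest(headers: List[Tuple[bytes, bytes]], body: bytes) -> bytes:
    # One "key value" line per header, then (assumption) a blank line,
    # then the raw metadata bytes unchanged.
    lines = [key + b" " + escape_newlines(value) + b"\n" for key, value in headers]
    return b"".join(lines) + b"\n" + body


def salted_git_sha1(object_type: bytes, manifest: bytes) -> str:
    # Git object hashing: sha1(b"<type> <length>\0" + manifest), where
    # <length> is the decimal byte length of the manifest.
    header = object_type + b" " + str(len(manifest)).encode("ascii") + b"\0"
    return hashlib.sha1(header + manifest).hexdigest()
```

For instance, `salted_git_sha1(b"raw_extrinsic_metadata", format_manifest(headers, metadata_bytes))` would hash a `headers` list like the one built in the diff. With `b"blob"` as the object type, the same helper reproduces `git hash-object` for blobs, which is a convenient way to sanity-check the salting.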