Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects
Open, High, Public


It turns out that swhid+authority_id+discovery_date+fetcher_id alone is not a good uniqueness key for RawExtrinsicMetadata objects: we already need to include other fields in it (context keys, see T2668), and we might need to include more in the future.

Additionally, both the spec and the RPC interface currently leave it intentionally unspecified what happens if we write two different objects with the same key.
This is acceptable, but less than ideal.

Hashing the entire object solves both issues: every field contributes to the identifier, so two different objects no longer share a key.

The only drawback is that the uniqueness key is no longer human-readable, and finding out which SWHID it refers to requires an API request. But we are already doing that for most objects, and I don't think it matters much anyway.

Event Timeline

vlorentz renamed this task from Use intrinsic identifiers for RawExtrinsicMetadata objects to Use intrinsic identifiers/hashes for RawExtrinsicMetadata objects. (Oct 14 2020, 2:01 PM)
vlorentz triaged this task as High priority.
vlorentz created this task.
vlorentz edited projects, added Data Model; removed Package Loader.

Proposed manifest format:

type: $ValueOfMetadataTargetType
target: $UrlOrSwhid
discovery_date: $ISO8601
authority: $NameIncludingSpaces <$Url>
fetcher: $NameIncludingSpaces $Version
format: $Str
origin: $Str  <- optional
visit: $IntInDecimal  <- optional
snapshot: $Swhid  <- optional
release: $Swhid  <- optional
revision: $Swhid  <- optional
path: $BytesWithEscapedNewLines  <- optional
directory: $Swhid  <- optional


Note that the values of MetadataTargetType are SWH names (e.g. "revision"), not git names (e.g. "commit").
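
To make the proposal concrete, here is a minimal sketch of how such a manifest could be serialized and hashed. Several things in it are assumptions, not part of the proposal: the function names (escape_newlines, build_manifest, manifest_id), the use of plain SHA-1 over the raw manifest bytes, the backslash-based escaping of newlines in path, UTF-8 for text fields, and all the example values.

```python
import hashlib

# Field order follows the proposed manifest; OPTIONAL holds the context keys.
REQUIRED = ("type", "target", "discovery_date", "authority", "fetcher", "format")
OPTIONAL = ("origin", "visit", "snapshot", "release", "revision", "path", "directory")


def escape_newlines(value: bytes) -> bytes:
    # Hypothetical escaping scheme: escape backslashes first, then newlines,
    # so that the escaping can be reversed unambiguously.
    return value.replace(b"\\", b"\\\\").replace(b"\n", b"\\n")


def build_manifest(fields: dict) -> bytes:
    lines = []
    for key in REQUIRED + OPTIONAL:
        if key not in fields:
            if key in REQUIRED:
                raise ValueError(f"missing required field: {key}")
            continue  # optional fields are simply left out
        value = fields[key]
        if isinstance(value, bytes):
            raw = escape_newlines(value)  # only "path" is expected to be bytes
        else:
            raw = str(value).encode("utf-8")  # assumption: text fields are UTF-8
            if b"\n" in raw:
                raise ValueError(f"newline not allowed in field: {key}")
        lines.append(key.encode("ascii") + b": " + raw + b"\n")
    return b"".join(lines)


def manifest_id(fields: dict) -> str:
    # Assumption: SHA-1 over the manifest bytes; the task does not fix a hash.
    return hashlib.sha1(build_manifest(fields)).hexdigest()


# Placeholder values, for illustration only.
base = {
    "type": "revision",
    "target": "swh:1:rev:" + "0" * 40,
    "discovery_date": "2020-10-14T14:01:00+00:00",
    "authority": "Example Forge <https://forge.example.org/>",
    "fetcher": "example-fetcher 1.0.0",
    "format": "json",
}
with_context = dict(base, origin="https://example.org/repo.git")

# Same swhid/authority/date/fetcher, different context: the ids differ.
assert manifest_id(base) != manifest_id(with_context)
```

With this approach, two objects that share swhid, authority, discovery date and fetcher but differ in any context key get distinct identifiers, and writing the same object twice yields the same identifier.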