Software Heritage

graph dataset: update to use persistent identifiers everywhere
Closed, Migrated

Description

The graph dataset currently uses bare SHA1s as node identifiers and relies on file names to convey the type of each node.
This is inconsistent and leads to ambiguities, particularly in edge lists whose targets can be of several node types (e.g., snapshot_to_obj and release_to_obj): a bare SHA1 in such a file does not say which type of object it refers to.

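We should redo the exports (or hot-patch the existing ones) to use SWH PIDs (of the form swh:1:<type>:<sha1>, e.g., swh:1:rev:<sha1>) as node identifiers everywhere, so that every identifier carries its own node type.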
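A minimal sketch of the hot-patch option, in Python. It assumes each edge list can be read as (source SHA1, target SHA1) hex pairs and that a SHA1 to node-type map (the hypothetical `type_of`) can be built from the per-type node files; it also assumes a given SHA1 maps to a single node type. `to_pid`, `parse_pid`, and `patch_edges` are illustrative names, not part of any Software Heritage tool. Only the PID layout `swh:1:<type>:<sha1>` with the short types cnt, dir, rev, rel, and snp is taken from the PID scheme itself.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Short object-type tags used in SWH PIDs (swh:1:<type>:<40 hex digits>).
PID_TYPES: Dict[str, str] = {
    "content": "cnt",
    "directory": "dir",
    "revision": "rev",
    "release": "rel",
    "snapshot": "snp",
}
HEX = set("0123456789abcdef")


def to_pid(node_type: str, sha1_hex: str) -> str:
    """Format a (node type, hex SHA1) pair as an SWH PID."""
    sha1_hex = sha1_hex.lower()
    if len(sha1_hex) != 40 or not set(sha1_hex) <= HEX:
        raise ValueError(f"not a hex SHA1: {sha1_hex!r}")
    return f"swh:1:{PID_TYPES[node_type]}:{sha1_hex}"


def parse_pid(pid: str) -> Tuple[str, str]:
    """Split an SWH PID into (short type tag, hex SHA1)."""
    parts = pid.split(":")
    if len(parts) != 4:
        raise ValueError(f"not an SWH PID: {pid!r}")
    scheme, version, obj_type, sha1_hex = parts
    if scheme != "swh" or version != "1" or obj_type not in PID_TYPES.values():
        raise ValueError(f"not an SWH PID: {pid!r}")
    return obj_type, sha1_hex


def patch_edges(
    edges: Iterable[Tuple[str, str]],
    src_type: str,
    type_of: Dict[str, str],
) -> Iterator[Tuple[str, str]]:
    """Rewrite bare-SHA1 edges into PID edges.

    The source type is known from the file name (e.g. "snapshot" for
    snapshot_to_obj); the target type is looked up in the hypothetical
    SHA1 -> node type map, because mixed-target files cannot tell it.
    """
    for src, dst in edges:
        dst_type = type_of.get(dst)
        if dst_type is None:
            raise KeyError(f"unknown node type for target {dst}")
        yield to_pid(src_type, src), to_pid(dst_type, dst)


if __name__ == "__main__":
    # Toy data standing in for a snapshot_to_obj edge list.
    type_of = {"b" * 40: "revision", "c" * 40: "release"}
    edges = [("a" * 40, "b" * 40), ("a" * 40, "c" * 40)]
    for src, dst in patch_edges(edges, "snapshot", type_of):
        print(src, "->", dst)
```

The type lookup is only needed for mixed-target edge lists such as snapshot_to_obj and release_to_obj; for single-type files the target type already follows from the file name, and once rows are written as PIDs that file-name convention is no longer needed.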