Snapshots are the new, improved occurrences; they're the topmost object in the
Software Heritage Merkle Tree, and represent a full picture of an origin at a
given time.
Snapshots contain a list of named pointers to objects in the Software Heritage
archive, as well as an intrinsic identifier (unspecified yet, punt of T566). For
now, null pointers are supported: the use case is we know a branch exists but we
couldn't import it, for instance for partial loads of the Debian loader.
They're implemented with a somewhat classic fully normalised model; as a nod to
T835, the foreign keys use the opaque object_id keys rather than explicit
identifiers. This is currently a problem as we have some occurrences that point
to non-existing objects, from the original GitHub import.
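As a rough illustration of the shape described above, a snapshot can be sketched
as a mapping from branch names to targets, where a null target marks a branch we
know exists but couldn't import. The names Snapshot, BranchTarget and
missing_branches are hypothetical and only serve this example; they are not
taken from the schema or the storage API.

```python
from typing import Dict, List, NamedTuple, Optional


class BranchTarget(NamedTuple):
    """Hypothetical pointer to an object in the archive."""
    target_type: str  # e.g. "revision" or "release" (assumed values)
    target: bytes     # identifier of the pointed-to object


class Snapshot(NamedTuple):
    """Hypothetical in-memory view of a snapshot."""
    id: Optional[bytes]  # intrinsic identifier, not specified yet
    branches: Dict[bytes, Optional[BranchTarget]]  # None = null pointer


def missing_branches(snapshot: Snapshot) -> List[bytes]:
    """Return the names of branches known to exist but not imported."""
    return sorted(
        name for name, target in snapshot.branches.items() if target is None
    )


example = Snapshot(
    id=None,
    branches={
        b"refs/heads/master": BranchTarget("revision", b"\x01" * 20),
        b"refs/heads/broken": None,  # partial load: branch seen, not imported
    },
)
```

Here missing_branches(example) lists only the branch whose pointer is null.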
Backwards compatibility both ways with occurrences is ensured: when adding a
snapshot linked to an origin visit, the corresponding occurrences are created in
occurrence_history; when querying the snapshot for an origin visit where we
haven't generated the snapshot yet, a virtual snapshot with id None is returned.
This lets us migrate to the new tables gently.
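The two compatibility directions can be sketched as follows. This is a minimal
illustration using plain dicts: the function names, the dict keys, and the
assumption that the virtual snapshot is built from the visit's occurrences are
all hypothetical, not the actual storage code.

```python
from typing import Dict, List, Optional


def occurrences_from_snapshot(origin: int, visit: int,
                              snapshot: Dict) -> List[Dict]:
    """Derive legacy occurrence rows from a snapshot (hypothetical layout)."""
    return [
        {
            "origin": origin,
            "visit": visit,
            "branch": name,
            "target_type": target["target_type"],
            "target": target["target"],
        }
        for name, target in snapshot["branches"].items()
        if target is not None  # null pointers have no occurrence equivalent
    ]


def snapshot_for_visit(stored: Optional[Dict],
                       occurrences: List[Dict]) -> Dict:
    """Return the stored snapshot, or a virtual one with id None."""
    if stored is not None:
        return stored
    return {
        "id": None,  # marks the snapshot as virtual (not generated yet)
        "branches": {
            occ["branch"]: {
                "target_type": occ["target_type"],
                "target": occ["target"],
            }
            for occ in occurrences
        },
    }
```

Callers can then treat both cases uniformly and check for id None when they need
to know whether the snapshot has actually been generated.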
TODO:
- assess the dangling pointers in occurrence_history and whether they're fixable
- finalise the schema regarding null pointers
- add more tests wrt null pointers, multiple branches, ...
Close T567.