Add snapshot models
Closed, Public

Authored by olasd on Nov 10 2017, 6:06 PM.

Details

Summary

Snapshots are the new, improved occurrences: they're the topmost object in the
Software Heritage Merkle tree, and represent a full picture of an origin at a
given time.

Snapshots contain a list of named pointers to objects in the Software Heritage
archive, as well as an intrinsic identifier. The full specification is
supported: pointers to all types of objects, dangling pointers, as well as alias
branches.
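To make the three kinds of branches concrete, here is a minimal sketch of a snapshot as a plain Python dict, with a helper that follows alias branches. The dict layout, the field names (`branches`, `target_type`, `target`) and the `resolve_branch` helper are assumptions made for illustration; they are not taken from this diff.

```python
from typing import Dict, Optional

# Hypothetical layout: branch name -> target description,
# or None when the branch is a dangling pointer.
Branch = Optional[Dict[str, object]]


def resolve_branch(branches: Dict[bytes, Branch], name: bytes,
                   max_depth: int = 10) -> Branch:
    """Follow alias branches until a non-alias target is reached.

    Returns None for a dangling pointer (or an unknown branch name).
    Raises ValueError on an alias loop or an overly long alias chain.
    """
    seen = set()
    while True:
        if name in seen or len(seen) >= max_depth:
            raise ValueError("alias loop or alias chain too long")
        seen.add(name)
        branch = branches.get(name)
        if branch is None:
            return None  # dangling pointer or unknown name
        if branch["target_type"] != "alias":
            return branch  # points at an actual object
        name = branch["target"]  # alias: the target is another branch name


# Illustrative data only: the identifiers below are placeholders.
snapshot = {
    "id": None,
    "branches": {
        b"HEAD": {"target_type": "alias", "target": b"refs/heads/master"},
        b"refs/heads/master": {
            "target_type": "revision",
            "target": bytes.fromhex("aa" * 20),
        },
        b"refs/heads/gone": None,  # dangling pointer
    },
}
```

With this layout, resolving `b"HEAD"` yields the `revision` entry behind `refs/heads/master`, while resolving `b"refs/heads/gone"` yields `None`.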

They're implemented with a fairly classic, fully normalised model. Foreign keys
use a sha1_git, which is a better fit for pointing at objects that don't exist
in the archive, at the cost of some extra storage space.

Backwards compatibility both ways with occurrences is ensured: when adding a
snapshot linked to an origin visit, the corresponding occurrences are created in
occurrence_history; when querying the snapshot for an origin visit where we
haven't generated the snapshot yet, a virtual snapshot with id None is returned.
This lets us migrate to the new tables gently.
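The read-side fallback described above could look roughly like the following sketch. The function name `snapshot_for_visit`, its parameters, and the occurrence field names (`branch`, `target_type`, `target`) are hypothetical and chosen for illustration; only the "virtual snapshot with id None" behaviour comes from the summary.

```python
from typing import Dict, Iterable, Optional


def snapshot_for_visit(stored_snapshot: Optional[dict],
                       occurrences: Iterable[Dict[str, object]]) -> dict:
    """Return the stored snapshot, or a virtual one built from occurrences.

    `stored_snapshot` is None when no snapshot has been generated for the
    visit yet; in that case the result has id None.
    """
    if stored_snapshot is not None:
        return stored_snapshot
    # Assumption: each occurrence carries a branch name and a typed target.
    branches = {
        occ["branch"]: {
            "target_type": occ["target_type"],
            "target": occ["target"],
        }
        for occ in occurrences
    }
    return {"id": None, "branches": branches}
```

A caller can then tell real snapshots from virtual ones by checking whether `id` is `None`, without caring which table the data came from.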

Close T567.

Test Plan

Integration tests are included.

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Shouldn't the snapshot_add/get* methods work with multiple snapshots? Like release_get takes an array of releases as its input.

In D268#5499, @seirl wrote:

Shouldn't the snapshot_add/get* methods work with multiple snapshots? Like release_get takes an array of releases as its input.

We never add several snapshots at once, so I don't think this is worth the added complexity.

Sounds fine so far \m/

sql/swh-func.sql
1012

That may be intentional, but I read in the wiki draft (referenced by task T565) that a snapshot could also target another snapshot, so I think we are missing this case (possibly the same case in the swh_snapshot_add function?).

sql/swh-indexes.sql
166

Nice :)

swh/storage/storage.py
832

What is an empty snapshot?

863

(curious me) What is that syntax?

Rebase on top of latest changes

olasd marked 3 inline comments as done.

Use sha1_git as an object identifier instead of the opaque object_id

olasd retitled this revision from [WIP] Add snapshot models to Add snapshot models. Dec 15 2017, 11:11 AM
olasd edited the summary of this revision.
swh/storage/storage.py
765

The markup here might be broken, since nested lists need empty lines between them: https://wiki.softwareheritage.org/index.php?title=Sphinx_gotchas#Lists (I haven't checked it, though).

This revision was automatically updated to reflect the committed changes.