
Unable to perfectly round-trip releases pointed at by occurrences
Closed, Migrated

Description

The current model for occurrences and releases makes it impossible to list the releases that appear for a given origin at a given time.

When we store an occurrence that points to a release, we "peel" the release until we find a revision, and store that.

This means that when we want to list the releases that were found on a given origin, we need to "backtrack" from the revisions found in the origin to every release that points to one of them. We therefore get a superset of the releases actually found at the origin: nothing prevents another origin from having other annotated releases pointing to the same revisions.
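To make the superset problem concrete, here is a minimal Python sketch. The `RELEASES` dictionary, the object identifiers, and the helpers `peel` and `backtrack_releases` are hypothetical illustrations, not the actual swh.storage schema or API.

```python
# Hypothetical toy model: every name here is illustrative,
# not the real swh.storage schema.

# Releases (annotated tags) known to the archive, across all origins.
# A release points at a target, which may be a revision or another release.
RELEASES = {
    "rel-v1.0-origin-a": {"target": "rev-1", "target_type": "revision"},
    "rel-v1.0-origin-b": {"target": "rev-1", "target_type": "revision"},
    "rel-nested": {"target": "rel-v1.0-origin-a", "target_type": "release"},
}


def peel(object_id, object_type):
    """Follow release targets until a non-release object is reached."""
    while object_type == "release":
        release = RELEASES[object_id]
        object_id, object_type = release["target"], release["target_type"]
    return object_id


def backtrack_releases(revision_ids):
    """Return every known release that peels down to one of the revisions."""
    return {
        release_id
        for release_id in RELEASES
        if peel(release_id, "release") in revision_ids
    }


# Origin A only exposed a ref pointing at "rel-v1.0-origin-a",
# but the occurrence stores the peeled revision instead.
stored_occurrence = peel("rel-v1.0-origin-a", "release")

# Backtracking from the stored revision also returns releases that were
# only ever seen on other origins.
candidates = backtrack_releases({stored_occurrence})
```

In this toy data, `candidates` contains all three releases even though origin A only exposed one of them, which is exactly the information lost by peeling.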

One possible solution would be to let occurrences point to releases instead of being peeled down to revisions.
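A rough sketch of what that could look like, assuming a hypothetical `Occurrence` record that carries a `target_type` next to its `target` (the field names and sample data are assumptions, and visit dates are left out for brevity):

```python
# Hypothetical toy model; field names are assumptions, not the real schema.
from typing import NamedTuple


class Occurrence(NamedTuple):
    origin: str
    branch: str
    target: str
    target_type: str  # "revision" or "release"


OCCURRENCES = [
    Occurrence("origin-a", "refs/tags/v1.0", "rel-v1.0-origin-a", "release"),
    Occurrence("origin-a", "refs/heads/master", "rev-2", "revision"),
    Occurrence("origin-b", "refs/tags/v1.0", "rel-v1.0-origin-b", "release"),
]


def releases_for_origin(origin):
    """List the releases that an origin's occurrences point at directly."""
    return {
        occ.target
        for occ in OCCURRENCES
        if occ.origin == origin and occ.target_type == "release"
    }
```

With the target stored unpeeled, `releases_for_origin("origin-a")` only returns the release that origin A actually exposed. The underlying revision can still be obtained by peeling at query time.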

Event Timeline

olasd raised the priority of this task from to Normal.
olasd updated the task description.
olasd added a subscriber: olasd.

What about having "occurrences" for all kinds of objects in a VCS: releases, tags, revisions, etc.?
We would definitely need to look at other VCSs to get a general model. For example, in Darcs patches are first-class citizens: do we have a way of accommodating this in our data model?

Let's remember that it's OK to focus on git as a priority, but it's NOT OK to believe its data model covers everything.

What about having "occurrences" for all kinds of objects in a VCS: releases, tags, revisions, etc.?

That is in fact already the case. All "top-level" objects visible in a repository have entries in the occurrence table. For Git, this means "refs", which are used for both tags and branches.
Individual commits do not have entries in the occurrence table, but that is just an optimization which does not lose information. Starting from the refs we can reach any other (retrievable) commit in the Git repository, so by storing an occurrence entry for each ref we know exactly which commits were there.
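The reachability argument can be illustrated with a small sketch. The commit graph, the ref names, and the `reachable_commits` helper are made up for the example and do not reflect how swh.storage is implemented.

```python
# Hypothetical commit graph: commit id -> list of parent commit ids.
PARENTS = {
    "c4": ["c3"],
    "c3": ["c1", "c2"],  # merge commit
    "c2": ["c1"],
    "c1": [],
    "orphan": [],  # not reachable from any ref
}

# Hypothetical refs: only these would get entries in the occurrence table.
REFS = {"refs/heads/master": "c4", "refs/tags/v0.1": "c2"}


def reachable_commits(refs, parents):
    """Collect all commits reachable from the ref targets via parent links."""
    seen = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen
```

Calling `reachable_commits(REFS, PARENTS)` recovers every commit that was retrievable through a ref, while a commit that no ref leads to (here `orphan`) is left out, matching the "(retrievable)" caveat above.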

Let's remember that it's OK to focus on git as a priority, but it's NOT OK to believe its data model covers everything.

Absolutely. Our focus on Git right now is based on the hypothesis that it is the most general model we've stumbled upon so far, with reasonable confidence that it will also fit other models. But of course this hypothesis hasn't been tested much yet, so it is still under review :-) Regarding Darcs, it is indeed possible that we will have to store more entries in the occurrence table to cope with it, or maybe find some different solution altogether.

olasd claimed this task.

This has now been deployed in swh.storage v0.0.30: occurrences and releases can now point to arbitrary objects.

Cool!

Roberto Di Cosmo (via mobile/cell)
On 26 Jan 2016 07:47, "olasd (Nicolas Dandrimont)" <forge@softwareheritage.org> wrote:

olasd closed this task as "Resolved".
olasd claimed this task.
olasd added a comment.

This has now been deployed in swh.storage v0.0.30: occurrences and releases can now point to arbitrary objects.

TASK DETAIL

https://forge.softwareheritage.org/T78

To: olasd
Cc: ardumont, zack, rdicosmo, olasd

olasd changed the visibility from "All Users" to "Public (No Login Required)". May 13 2016, 5:06 PM