
Allow -1 as Content length.
Closed, Public

Authored by vlorentz on Aug 19 2019, 2:34 PM.

Details

Summary

It denotes files whose length is unknown.

Diff Detail

Repository
rDMOD Data model
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

olasd added a subscriber: olasd.

[after IRL discussion of the change]

This change (and the changes around it) makes me realise that it's not entirely clear what this Content model is supposed to specify:

  • valid inputs for Storage.content_add
  • valid outputs of our backend storage APIs

The answer seems to be "a superset of both".

It also makes the weirdness of the dichotomy between content and skipped_content more glaring. In hindsight, having two somewhat incompatible tables populated by the same call to content_add is a bit weird.

It's also really the only place where we handle holes in the graph, while we really should handle them for all object types.

I'll accept the diff because length=-1 is a valid input to content_add. You can add a cross-check to make sure that length is -1 only when the content is marked as absent, until we untangle the management of the content and skipped_content tables.

This revision is now accepted and ready to land. Aug 19 2019, 4:50 PM

Only allow length=-1 if status=absent.
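
A minimal sketch of what such a cross-check could look like, not the code that actually landed. The class name `ContentSketch`, the use of a dataclass, and the status values other than "absent" are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Assumed set of statuses; only "absent" is named in this review.
ASSUMED_STATUSES = {"visible", "hidden", "absent"}


@dataclass(frozen=True)
class ContentSketch:
    """Hypothetical stand-in for the Content model, reduced to two fields."""

    length: int
    status: str = "visible"

    def __post_init__(self) -> None:
        if self.status not in ASSUMED_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        # -1 is the only negative value allowed; it denotes an unknown length.
        if self.length < -1:
            raise ValueError("length must be -1 (unknown) or non-negative")
        # The cross-check: an unknown length only makes sense for absent content.
        if self.length == -1 and self.status != "absent":
            raise ValueError("length=-1 is only allowed when status='absent'")
```

With this sketch, `ContentSketch(length=-1, status="absent")` is accepted, while `ContentSketch(length=-1)` raises `ValueError`.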

This revision was landed with ongoing or failed builds. Aug 19 2019, 4:56 PM
This revision was automatically updated to reflect the committed changes.

[ I'm jumping in here, but obviously I'm missing the IRL context. ]

In D1862#43313, @olasd wrote:

It's also really the only place where we handle holes in the graph, while we really should handle them for all object types.

This is indeed an important discrepancy that we should fix.
To expand, and make sure we are on the same page, this is the only place where we can store some (incomplete) information about objects that are missing in full.
For other missing objects we just don't store any information at all about them, so all links to them from elsewhere are just dangling.

  • A list of all missing objects would already be useful in itself, and today we don't have one.
  • A place to store partial information about any kind of missing object would be a plus.

Does this summary sound correct?

(If so, we should stash this info in a dedicated task and discuss the matter further there.)

> so all links to them from elsewhere are just dangling.

It's worse than that. Holes in VCSs other than git can't even be referenced, because we don't know their sha1_git.
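
To illustrate why the identifier is unavailable: for a file content, git's blob hash is derived from the length and the bytes themselves, so it cannot be computed for an object we never retrieved. The sketch below covers contents only (other object types are hashed from their own serialization), and the function name is hypothetical.

```python
import hashlib


def sha1_git_of_blob(data: bytes) -> str:
    """Compute git's blob identifier: SHA-1 over a header plus the raw bytes."""
    # The header embeds the exact byte length, followed by a NUL separator.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


# The empty blob has a well-known identifier in git.
print(sha1_git_of_blob(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```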

> Does this summary sound correct?

Yes

> (If so, we should stash this info in a dedicated task and discuss the matter further there.)

I always assumed there was one, but I can't find it. So here it is: T1957