
Store original git manifests
Closed, ResolvedPublic

Description

Store the original git manifest of "weird" objects (those whose id cannot be reproduced from the parsed fields), alongside the rest of the row.

To do:

  1. Add a new raw_manifest field to Directory and Revision objects (and Release, for future-proofing), with type Optional[bytes]. It should include the object type and size (i.e. the complete git header).
  2. Add a check method to model objects. It should check that the id matches (self.compute_hash() == self.id), and also that if raw_manifest is not None, it differs from the manifest we would compute (i.e. there shouldn't be a useless value in raw_manifest).
  3. Add a column in postgres that defaults to NULL, and write to it.
  4. Monitor the number of objects with a non-NULL raw_manifest, and warn if it rises too fast (it probably means there is a bug in a loader) -> swh-counters
  5. Figure out a way to report issues directly from the git loader? (e.g. make the git loader raise an issue in Sentry if too many objects in the same repo have a raw_manifest)
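
Steps 1 and 2 could look roughly like the sketch below. It is not the actual swh-model code: `SketchObject`, `git_manifest`, and the `body` field are hypothetical stand-ins (a real Directory or Revision would serialize its own fields instead of carrying a `body`), and having `compute_hash()` hash `raw_manifest` when it is set is an assumption about how the id would be derived.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def git_manifest(object_type: str, body: bytes) -> bytes:
    """Build a complete git manifest: the '<type> <size>' header, a NUL byte, then the body."""
    header = f"{object_type} {len(body)}".encode("ascii") + b"\x00"
    return header + body


@dataclass(frozen=True)
class SketchObject:
    """Hypothetical stand-in for a model object such as Directory or Revision."""

    object_type: str  # e.g. "tree" or "commit"
    body: bytes  # stand-in for the canonical serialization of the object's fields
    id: bytes  # expected SHA-1 digest of the manifest
    raw_manifest: Optional[bytes] = None  # only set for "weird" objects

    def computed_manifest(self) -> bytes:
        """Manifest we would compute ourselves from the parsed fields."""
        return git_manifest(self.object_type, self.body)

    def compute_hash(self) -> bytes:
        # Assumption: when raw_manifest is present, it is what gets hashed.
        manifest = (
            self.raw_manifest
            if self.raw_manifest is not None
            else self.computed_manifest()
        )
        return hashlib.sha1(manifest).digest()

    def check(self) -> None:
        """Raise ValueError if the id is wrong or raw_manifest is redundant."""
        if self.compute_hash() != self.id:
            raise ValueError("id does not match the computed hash")
        if (
            self.raw_manifest is not None
            and self.raw_manifest == self.computed_manifest()
        ):
            raise ValueError("raw_manifest is set but equals the computed manifest")
```

With this shape, a regular object leaves `raw_manifest` as None and passes `check()` as long as its id matches, while an object whose `raw_manifest` merely duplicates the computed manifest is rejected.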

Make the vault use raw_manifest when available, sometime after step 3.
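
On the vault side, the selection logic might be as small as the following sketch. `manifest_for_export` and its `recompute` callback are hypothetical names used for illustration, not part of the vault's API.

```python
from typing import Callable, Optional


def manifest_for_export(
    raw_manifest: Optional[bytes], recompute: Callable[[], bytes]
) -> bytes:
    """Prefer the stored original manifest; otherwise rebuild it from the parsed fields."""
    if raw_manifest is not None:
        return raw_manifest
    return recompute()
```

Passing the recomputation as a callback avoids serializing the object at all when a stored manifest already exists.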

Event Timeline

vlorentz triaged this task as Normal priority.
vlorentz created this task.