This is a long-standing and well-known issue, but I don't think a task has been opened about it yet.
When ingesting an origin, some nodes of the DAG may be missing, for various reasons:
- corrupted data (eg. a commit in the git history does not match its hash)
- directory must be found "somewhere else" (eg. SVN externals, T611)
- revisions must be found "somewhere else" (eg. Bazaar stacked branches)
- ingestion of a (potentially large) repo might stop/crash after having ingested only some of its objects, and the repository might have disappeared by the time we try again
Currently, what happens is:
- if the missing object is a git object, then we know its sha1_git, and it's just a dangling reference (though this will be an issue when we want to implement generation numbers, T1617)
- even in this (fortunate) case, other objects transitively referenced might remain completely unknown
- otherwise, objects referencing the missing object cannot even be represented in the SWH data model (and recursively, all objects referencing it)
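
To make the git case above concrete, here is a minimal sketch of how dangling references and the objects they affect could be enumerated. It assumes a toy in-memory graph mapping each object id to the ids it references; the names `dangling_references` and `affected_objects` and the example ids are hypothetical and are not part of the SWH data model or API.

```python
from typing import Dict, List, Set

# Hypothetical toy graph: object id -> ids of the objects it references.
Graph = Dict[str, List[str]]


def dangling_references(graph: Graph) -> Set[str]:
    """Return ids that are referenced by some object but absent from the graph."""
    return {ref for refs in graph.values() for ref in refs if ref not in graph}


def affected_objects(graph: Graph) -> Set[str]:
    """Return objects that reach a missing object, directly or transitively."""
    # Build reverse edges so we can walk from a missing object to its referrers.
    parents: Dict[str, Set[str]] = {}
    for node, refs in graph.items():
        for ref in refs:
            parents.setdefault(ref, set()).add(node)

    affected: Set[str] = set()
    stack = list(dangling_references(graph))
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                stack.append(parent)
    return affected


# Assumed example: "dir1" is referenced by "rev1" but was never ingested.
example: Graph = {
    "snapshot": ["rev2"],
    "rev2": ["rev1", "dir2"],
    "rev1": ["dir1"],
    "dir2": ["content_a"],
    "content_a": [],
}

missing = dangling_references(example)
affected = affected_objects(example)
```

In this sketch the missing id is still known, which corresponds to the git case; whatever `dir1` itself referenced stays invisible, and every ancestor of `dir1` has an incomplete subgraph.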