
algos.revisions_walker: Handle missing revisions in the archive
Closed, Public

Authored by anlambert on Apr 18 2019, 4:20 PM.

Details

Summary

In rare cases, a revision is referenced but missing from the archive content
(some deposit origins have this issue, for instance).

So make the revisions walker implementation skip the processing of these missing revisions.
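
For illustration only, here is a minimal sketch of the skipping behavior. It is not the code from this diff: the `walk_revisions` function, the `get_revision` callback and the dict-based `ARCHIVE` are hypothetical stand-ins, and the breadth-first order is an arbitrary choice for the example.

```python
from collections import deque


def walk_revisions(get_revision, start_id):
    """Yield revisions reachable from start_id, skipping missing ones.

    get_revision is a hypothetical lookup returning a revision dict,
    or None when the revision is not in the archive.
    """
    seen = set()
    queue = deque([start_id])
    while queue:
        rev_id = queue.popleft()
        if rev_id in seen:
            continue
        seen.add(rev_id)
        rev = get_revision(rev_id)
        if rev is None:
            # Referenced but absent from the archive: the history is
            # truncated here, so skip it instead of crashing.
            continue
        yield rev
        queue.extend(rev["parents"])


# Toy archive: revision "x" is referenced by "b" but was never archived.
ARCHIVE = {
    "c": {"id": "c", "parents": ["b"]},
    "b": {"id": "b", "parents": ["a", "x"]},
    "a": {"id": "a", "parents": []},
}

visited = [rev["id"] for rev in walk_revisions(ARCHIVE.get, "c")]
```

With this toy archive the walk visits `c`, `b` and `a`, and silently ignores the missing parent `x`.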

Related T1675

Diff Detail

Repository
rDSTO Storage manager
Branch
revisions-walker-fix
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 5506
Build 7477: tox-on-jenkins (Jenkins)
Build 7476: arc lint + arc unit

Event Timeline

olasd requested changes to this revision. Apr 18 2019, 4:31 PM
olasd added a subscriber: olasd.

Please clarify the code and the commit message to say that this is handling truncated/shallow histories where a revision's parent is referenced but doesn't exist in the archive.

Could we find a way to return a flag saying "history truncated", "parent revision not found" or something more explicit than just returning "invalid" results silently?

swh/storage/algos/revisions_walker.py
148

maybe add a comment here saying that the history got truncated?

178

Same here?

This revision now requires changes to proceed. Apr 18 2019, 4:31 PM

Just to be clear, the "history truncated" flag stuff can happen later; the new behavior after this minimal change is still better than just crashing.

Could we find a way to return a flag saying "history truncated", "parent revision not found" or something more explicit than just returning "invalid" results silently?

This would indeed be useful for catching possible loading issues. It needs some specification though, so let's create a task on the subject.
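
As a starting point for that specification, one possible shape for the flag is sketched below. Everything here is an assumption made for discussion: the `TruncationAwareWalker` class, its `missing` set and its `history_truncated` property are hypothetical names, not part of the current revisions walker API.

```python
from collections import deque


class TruncationAwareWalker:
    """Hypothetical walker that records which referenced revisions are missing."""

    def __init__(self, get_revision, start_id):
        # get_revision returns a revision dict, or None if not archived.
        self._get_revision = get_revision
        self._start_id = start_id
        self.missing = set()

    @property
    def history_truncated(self):
        """True once at least one referenced revision was not found."""
        return bool(self.missing)

    def __iter__(self):
        self.missing.clear()
        seen = set()
        queue = deque([self._start_id])
        while queue:
            rev_id = queue.popleft()
            if rev_id in seen:
                continue
            seen.add(rev_id)
            rev = self._get_revision(rev_id)
            if rev is None:
                # Remember the gap so callers can report it explicitly.
                self.missing.add(rev_id)
                continue
            yield rev
            queue.extend(rev["parents"])


# Toy archive: parent "x" is referenced but not archived.
archive = {
    "c": {"id": "c", "parents": ["b"]},
    "b": {"id": "b", "parents": ["a", "x"]},
    "a": {"id": "a", "parents": []},
}

walker = TruncationAwareWalker(archive.get, "c")
ids = [rev["id"] for rev in walker]
```

After the iteration, `walker.history_truncated` is true and `walker.missing` contains `x`, so a caller can tell a complete history from a truncated one instead of getting silently shortened results.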

swh/storage/algos/revisions_walker.py
148

Sure, will improve that

This revision is now accepted and ready to land. Apr 18 2019, 5:39 PM
This revision was automatically updated to reflect the committed changes.