Page MenuHomeSoftware Heritage

Make the swh.loader.package exception handling more granular
ClosedPublic

Authored by ardumont on Fri, Mar 20, 6:11 PM.

Details

Summary

This change generalizes the behavior of loading most revisions and generating a
partial snapshot when we failed to load some of them.

Depends on D2860

Test Plan

tox

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

olasd created this revision.Fri, Mar 20, 6:11 PM

I'm fine with the idea of being more resilient.
Since we want to maximize the ingestion for package loaders.

ardumont commandeered this revision.Mon, Mar 23, 11:07 AM
ardumont added a reviewer: olasd.
ardumont updated this revision to Diff 10198.Mon, Mar 23, 11:08 AM

Build on the first iteration:

  • make test pass
  • add docstring
  • call sentry_sdk to push exception to sentry
ardumont edited the summary of this revision. (Show Details)Mon, Mar 23, 11:11 AM
ardumont updated this revision to Diff 10200.Mon, Mar 23, 11:16 AM

Remove spurious print statements

swh/loader/package/loader.py
16

I patched the staging node currently running the functional loader.
But i think we can add this anyway.

olasd requested changes to this revision.Mon, Mar 23, 11:25 AM
olasd added inline comments.
swh/loader/package/loader.py
378

I guess you can remove the loaded return value now; seems like the function always raises an exception when it fails to load now. This will simplify the caller as well.

This revision now requires changes to proceed.Mon, Mar 23, 11:25 AM
ardumont updated this revision to Diff 10201.Mon, Mar 23, 11:33 AM

Remove unneeded loaded flag from _load_revision

olasd accepted this revision.Mon, Mar 23, 11:34 AM
This revision is now accepted and ready to land.Mon, Mar 23, 11:34 AM
lewo added a subscriber: lewo.Mon, Mar 23, 11:48 AM
lewo added inline comments.
swh/loader/package/loader.py
358

I actually don't really like these except Exception statements. I think it would be better to first catch all expected errors, and then fallback on except Exception if it is a unexpected exception (and tag it as unexpected in sentry). For instance, we know some archives will be broken, this would lead to a "expected error".

This would allow us to write better tests, since we could test the type of the raised exception.

ardumont edited the test plan for this revision. (Show Details)Mon, Mar 23, 11:58 AM
ardumont added inline comments.Tue, Mar 24, 10:09 AM
swh/loader/package/loader.py
344

I knew this bothered me for some reason and then it slipped my mind [1]

[1] https://sentry.softwareheritage.org/share/issue/c0d1fbfcd865486bb41eea35552f50dd/

status visit can only be ongoing, partial or full...

ardumont added inline comments.Tue, Mar 24, 10:17 AM
swh/loader/package/loader.py
344

Fixed in D2870