Page MenuHomeSoftware Heritage

journal.replay: Batch insert contents/skipped_contents in storage backend
ClosedPublic

Authored by ardumont on Mar 6 2020, 10:48 AM.

Diff Detail

Repository
rDJNL Journal infrastructure
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision.Mar 6 2020, 10:48 AM
ardumont added inline comments.Mar 6 2020, 10:51 AM
swh/journal/replay.py
250

Do we add a fallback behavior to ensure other objects from the failing transaction are written nonetheless?

Sounds sensible to do so (in the mirror context).

Maybe we should evolve the hash collision exception to refeerence the content in error.
That'd make it easier to fix client side the behavior if we had the culprit immediately in sight.

ardumont added inline comments.Mar 6 2020, 10:56 AM
swh/journal/replay.py
250

Maybe we should evolve the hash collision exception to refeerence the content in error.

Right now we have HashCollision 'type' of hash collision as message (e.g. sha1)... [1]
That's not helpful enough.

[1] https://sentry.softwareheritage.org/share/issue/5a6f8c09fea1468997bb88ca3e19fc2d/

douardda added inline comments.Mar 6 2020, 11:54 AM
swh/journal/replay.py
235–236

are these vars necessary?

Why not simply use (with proper exception handling stuff):

storage.skipped_content_add(
              (obj for obj in objects if obj.get('status') == 'absent'))
storage.content_add_metadata(
              (obj for obj in objects if obj.get('status') != 'absent'))
douardda accepted this revision.Mar 6 2020, 11:54 AM

ok (besides my remark).

This revision is now accepted and ready to land.Mar 6 2020, 11:54 AM
ardumont added inline comments.Mar 6 2020, 11:57 AM
swh/journal/replay.py
235–236

Depends if we want to act upon my question below or not i guess ;)

douardda added inline comments.Mar 6 2020, 1:40 PM
swh/journal/replay.py
235–236

Well one thing at a time. Let's deal with this as is, then discuss how to improve cases of ingestion error...

ardumont updated this revision to Diff 9882.Mar 6 2020, 1:50 PM

As a first step, no need for further check so no need for temporary variables

This revision was automatically updated to reflect the committed changes.