
Fix crash when RawExtrinsicMetadata target new origins
Closed, Public

Authored by vlorentz on Sep 5 2022, 3:51 PM.

Details

Summary

RawExtrinsicMetadata objects contain a swh:1:ori: identifier of their origin,
which the indexer needs to resolve by querying its storage replica.

Because RawExtrinsicMetadata are created by loaders, they are often
created shortly after the origin is created by the corresponding lister,
so the origin may not be known to the storage replica used by the
indexer, causing this function to crash.

Waiting 10s seemed to be enough when run on my computer with
production data and moma's replica, so I set the wait to 60s just to be safe.
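
For illustration only, the waiting behaviour described above can be sketched as a small polling helper. This is not the code from the diff: `wait_for`, its parameters, and the `lookup` callable are hypothetical names, and the real change in swh/indexer/metadata.py may be structured differently. The sketch assumes the lookup returns None while the origin is still unknown to the replica.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(
    lookup: Callable[[], Optional[T]],
    timeout: float = 60.0,  # upper bound on the wait, as in the summary
    interval: float = 1.0,  # hypothetical delay between two attempts
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[T]:
    """Call lookup() until it returns a non-None value or the timeout elapses.

    Returns the value found, or None if nothing showed up in time.
    """
    deadline = clock() + timeout
    while True:
        result = lookup()
        if result is not None:
            return result
        if clock() >= deadline:
            # Give up: the caller decides whether to skip or to fail.
            return None
        sleep(interval)
```

A caller would pass a closure that queries the storage replica for the origin and returns None when it is missing; `sleep` and `clock` are injectable only so the helper can be tested without real delays.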

Diff Detail

Repository
rDCIDX Metadata indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8392 (id=30285)

Could not rebase; Attempt merge onto 44879ab563...

Updating 44879ab..befdbd7
Fast-forward
 swh/indexer/metadata.py            | 34 ++++++++++++++++++++++++----------
 swh/indexer/tests/test_metadata.py | 29 +++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 10 deletions(-)
Changes applied before test
commit befdbd7efd46dd052b8728215deeeb3f775c34d0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Sep 5 15:48:40 2022 +0200

    Fix crash when RawExtrinsicMetadata target new origins
    
    RawExtrinsicMetadata contain a swh:1:ori: identifier of the origin,
    which the indexer needs to resolve, by querying its storage replica.
    
    Because RawExtrinsicMetadata are created by loaders, they are often
    created shortly after the origin is created by the corresponding lister,
    so the origin may not be known to the storage replica used by the
    indexer, causing this function to crash.
    
    Waiting 10s seems to be good enough when run on my computer with
    production data and moma's replica; so I set it to 60s just to be safe.

commit 68940cfccfed258620cc116bedd6598fd9b28df4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Sep 5 14:21:50 2022 +0200

    Fix crash when RawExtrinsicMetadata objects have the same target
    
    ... and they are processed in the same batch.
    
    The last one received takes precedence, as it is likely to be more
    up-to-date

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/481/ for more details.
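
As a rough sketch of the "last one received takes precedence" rule from commit 68940cf in the report above: when several objects in a batch share a target, only the most recent is kept. The function name and the (target, value) pair representation are assumptions made for this example, not the indexer's actual data model.

```python
from typing import Dict, Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def dedup_last_wins(
    items: Iterable[Tuple[Hashable, T]],
) -> List[Tuple[Hashable, T]]:
    """Keep one value per target; later items overwrite earlier ones."""
    latest: Dict[Hashable, T] = {}
    for target, value in items:
        # Overwriting keeps the key's first position but the newest value.
        latest[target] = value
    return list(latest.items())
```

Because the result holds at most one entry per target, a batch write keyed on the target no longer sees the same key twice.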

ardumont added a subscriber: ardumont.
ardumont added inline comments.
swh/indexer/metadata.py, line 152

(not right now) At some point it may be worth using retry scaffolding similar to what we have in the listers and loaders.

This revision is now accepted and ready to land. Sep 8 2022, 4:47 PM