Fix crash when indexing two REMD objects from the same deposit
Closed, Public

Authored by vlorentz on Nov 30 2022, 10:13 AM.

Details

Summary

The deduplication code assumed that remd.target matches the id of each
result. This is no longer true: since
221d48e242520344c6435c07d444f961637497fe, when remd.target is a
directory, the origin from the REMD object's context is used as the
result id instead.

Resolves T4710.
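
For illustration only, here is a minimal sketch of the mismatch described above. It is not the code from this diff: the Remd class, result_id, and both deduplication helpers are hypothetical stand-ins, and the scenario (two REMD objects with different directory targets but the same origin) is an assumption about how the collision can arise.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Remd:
    """Hypothetical stand-in for a raw extrinsic metadata (REMD) object."""

    target: str            # SWHID-like string of the described object
    origin: Optional[str]  # origin URL taken from the REMD context, if any
    metadata: str


def result_id(remd: Remd) -> str:
    # Assumption: for directory targets with an origin in the context,
    # the origin is used as the result id; otherwise the target is.
    if remd.target.startswith("swh:1:dir:") and remd.origin is not None:
        return remd.origin
    return remd.target


def deduplicate_by_target(remds: Iterable[Remd]) -> List[Remd]:
    # Keys on remd.target: two directories from the same origin both
    # survive, so two results can end up sharing one result id.
    by_target: Dict[str, Remd] = {}
    for remd in remds:
        by_target[remd.target] = remd
    return list(by_target.values())


def deduplicate_by_result_id(remds: Iterable[Remd]) -> List[Remd]:
    # Keys on the id the result will actually carry; the last REMD
    # seen for a given id replaces earlier ones.
    by_id: Dict[str, Remd] = {}
    for remd in remds:
        by_id[result_id(remd)] = remd
    return list(by_id.values())
```

With two objects whose targets differ but whose origin is the same, deduplicate_by_target keeps both (and their result ids collide), while deduplicate_by_result_id keeps only the last one.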

Diff Detail

Repository
rDCIDX Metadata indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8905 (id=32093)

Could not rebase; Attempt merge onto f44e14b11f...

Updating f44e14b..f74b47b
Fast-forward
 swh/indexer/metadata.py                           |  6 ++--
 swh/indexer/metadata_dictionary/utils.py          |  6 +++-
 swh/indexer/tests/metadata_dictionary/test_npm.py | 11 +++++++
 swh/indexer/tests/test_metadata.py                | 35 ++++++++++++++++++++++-
 4 files changed, 53 insertions(+), 5 deletions(-)
Changes applied before test
commit f74b47bcf45ddf06c441d398511dbcad23af583f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 30 10:11:51 2022 +0100

    Fix crash when indexing two REMD objects from the same deposit
    
    The deduplication code assumed `remd.target` matches the id of results,
    but this is no longer true, as we started using REMD objects whose
    `origin` context was used as result id, when `remd.target` is a
    directory (221d48e242520344c6435c07d444f961637497fe).

commit b2d8afff6fa502173b31431b7129ad89d24b7048
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Nov 30 09:44:21 2022 +0100

    metadata_dictionary: Fix 'Invalid IPv6 URL' crash

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/534/ for more details.

olasd added a subscriber: olasd.

Thanks!

This revision is now accepted and ready to land. (Nov 30 2022, 10:38 AM)