metadata-search: Return swh-search even when missing from idx_storage.origin_intrinsic_metadata
Closed, Public

Authored by vlorentz on Tue, Nov 15, 9:31 AM.

Details

Summary

This is needed because swh-search may now return results based on extrinsic
metadata, in addition to intrinsic metadata.

I do not want to query idx_storage.origin_extrinsic_metadata here, because it
is not clear how to merge with the existing data structure.

Additionally, I do not think anyone relies on the metadata returned by this
endpoint because it is undocumented and rather inflexible. Instead, I would
like to deprecate returning metadata from this endpoint altogether, as there
is a more appropriate endpoint to get metadata once you have the origin URL.

Resolves T4599.
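
A minimal sketch of the behaviour described above, not the actual change: every name here (merge_search_results, intrinsic_metadata, the shape of the returned dicts) is hypothetical and chosen only for illustration. It assumes the search backend yields origin URLs and that intrinsic metadata has already been fetched into a dict keyed by URL. Origins with no intrinsic metadata row are kept, with empty metadata, rather than dropped.

```python
from typing import Any, Dict, Iterable, List


def merge_search_results(
    origin_urls: Iterable[str],
    intrinsic_metadata: Dict[str, Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Return one entry per URL found by the search backend.

    Hypothetical helper: URLs that have no intrinsic metadata are still
    returned, with an empty metadata dict, instead of being filtered out.
    """
    results = []
    for url in origin_urls:
        results.append(
            {
                "url": url,
                # Fall back to an empty dict when the indexer has no row.
                "metadata": intrinsic_metadata.get(url, {}),
            }
        )
    return results


if __name__ == "__main__":
    found = ["https://example.org/a", "https://example.org/b"]
    known = {"https://example.org/a": {"name": "project-a"}}
    for entry in merge_search_results(found, known):
        print(entry)
```

With this shape, an origin matched only through extrinsic metadata still appears in the output, which is the point of the change; what the real endpoint puts in place of the missing metadata is not shown on this page.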

Diff Detail

Repository
rDWAPPS Web applications
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8842 (id=31863)

Rebasing onto ad8558c69d...

Current branch diff-target is up to date.
Changes applied before test
commit 76c64ea4dc9e158bdd5c2730340e04b213999609
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 15 09:31:11 2022 +0100

    metadata-search: Return swh-search even when missing from idx_storage.origin_intrinsic_metadata
    

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/2132/ for more details.

ardumont added a subscriber: ardumont.
ardumont added inline comments.
swh/web/utils/archive.py:371

I gather metadata is a dict whose keys are all the URLs present in origin_urls (whether or not the indexer storage call on line 362 returned results for them), right?

(I'm trying to determine whether metadata[url] can somehow raise KeyError.)

This revision is now accepted and ready to land. Tue, Nov 15, 11:00 AM
swh/web/utils/archive.py:371

I think you missed the ternary, which checks that the key is in the dict before accessing it.

swh/web/utils/archive.py:371

right, thx!
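
For readers following the thread, here is a small sketch of why a guarded conditional expression cannot raise KeyError while a bare lookup can. It is not the code at line 371; the function names and the empty-dict fallback are assumptions made for illustration.

```python
from typing import Any, Dict


def unguarded_lookup(url: str, metadata: Dict[str, Any]) -> Any:
    # Raises KeyError when url is not a key of metadata.
    return metadata[url]


def guarded_lookup(url: str, metadata: Dict[str, Any]) -> Any:
    # Conditional expression: the membership test is evaluated first, so
    # metadata[url] only runs when the key is known to exist.
    return metadata[url] if url in metadata else {}


if __name__ == "__main__":
    metadata = {"https://example.org/a": {"name": "project-a"}}
    print(guarded_lookup("https://example.org/a", metadata))
    print(guarded_lookup("https://example.org/missing", metadata))
    try:
        unguarded_lookup("https://example.org/missing", metadata)
    except KeyError as exc:
        print("KeyError:", exc)
```

So metadata does not need to contain every URL in origin_urls: a URL absent from the dict takes the else branch and the subscript is never evaluated.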