metadata-search: Skip query to swh-indexer when its results would be discarded
Closed, Public

Authored by vlorentz on Nov 15 2022, 9:53 AM.

Details

Summary

The 'fields' query parameter is used by clients to indicate what fields the
API should return.

If 'metadata' is not in that list, then the 'metadata' object will be
discarded by apiresponse, so the call to
idx_storage.origin_intrinsic_metadata_get is useless.

I expect no client actually uses this field, so this could save
resources.

Additionally, I want to deprecate the field, so this may make it easier
to figure out whether any client actually requests it by looking at server logs.

Depends on D8842.
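
For illustration only, the behaviour described above can be sketched as follows. This is not the code from the diff: FakeIndexerStorage, parse_fields and search_origin_metadata are hypothetical names, only origin_intrinsic_metadata_get comes from the summary, and the sketch assumes that 'fields' is a comma-separated list and that omitting it means every field is returned.

```python
from typing import Any, Dict, Iterable, List, Optional, Set


class FakeIndexerStorage:
    """Hypothetical stand-in for idx_storage; counts calls so the skip is observable."""

    def __init__(self, metadata_by_url: Dict[str, Dict[str, Any]]) -> None:
        self._metadata_by_url = metadata_by_url
        self.calls = 0

    def origin_intrinsic_metadata_get(
        self, urls: Iterable[str]
    ) -> List[Optional[Dict[str, Any]]]:
        self.calls += 1
        return [self._metadata_by_url.get(url) for url in urls]


def parse_fields(fields_param: Optional[str]) -> Optional[Set[str]]:
    """Split a comma-separated 'fields' value; None means no filtering was requested."""
    if not fields_param:
        return None
    return {name.strip() for name in fields_param.split(",") if name.strip()}


def search_origin_metadata(
    idx_storage: FakeIndexerStorage,
    urls: List[str],
    fields_param: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Build one result per origin URL, fetching metadata only if it will be kept."""
    fields = parse_fields(fields_param)
    results: List[Dict[str, Any]] = [{"url": url} for url in urls]

    # Assumption: without 'fields', the response keeps every key, so the
    # indexer storage must still be queried in that case.
    if fields is None or "metadata" in fields:
        metadata_list = idx_storage.origin_intrinsic_metadata_get(urls)
        for result, metadata in zip(results, metadata_list):
            result["metadata"] = metadata

    return results
```

In this sketch, a request with fields=url never reaches the indexer storage, while fields=url,metadata or a request without 'fields' still does.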

Diff Detail

Repository
rDWAPPS Web applications
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8843 (id=31864)

Could not rebase; Attempt merge onto ad8558c69d...

Updating ad8558c6..f59acd61
Fast-forward
 swh/web/api/tests/views/test_origin.py | 70 +++++++++++++++++++++++++++++++++-
 swh/web/api/views/origin.py            |  9 ++++-
 swh/web/utils/archive.py               | 41 ++++++++++++++++----
 3 files changed, 110 insertions(+), 10 deletions(-)
Changes applied before test
commit f59acd6185603e66c3418e4ffeac2106b2159300
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 15 09:49:59 2022 +0100

    metadata-search: Skip query to swh-indexer when its results would be discarded
    
    The 'fields' query parameter is used by clients to indicate what fields the
    API should return.
    
    If 'metadata' is not in that field, then the 'metadata' object will be
    discarded by apiresponse, so the call to
    `idx_storage.origin_intrinsic_metadata_get` is useless.
    
    I expect no client actually uses this field, so this could save
    ressources.
    
    Additionally, I want to deprecate the field, so this may make it easier
    to figure if any client actually requests it by looking at server logs.

commit 76c64ea4dc9e158bdd5c2730340e04b213999609
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 15 09:31:11 2022 +0100

    metadata-search: Return swh-search even when missing from idx_storage.origin_intrinsic_metadata
    
    This is needed, because swh-search may now return results based on extrinsic metadata,
    in addition to intrinsic metadata.
    
    I do not want to query idx_storage.origin_extrinsic_metadata here, because it
    is not clear how to merge with the existing data structure.
    
    Additionally, I do not think anyone relies on the metadata returned by this
    endpoint because it is undocumented and rather inflexible. Instead, I would
    like to deprecate returning metadata from this endpoint altogether, as there
    is a more appropriate endpoint to get metadata once you have the origin URL.

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/2133/ for more details.

This revision is now accepted and ready to land. Nov 15 2022, 11:04 AM