By default, an extra query is sent to swh-indexer to (maybe) populate
the 'metadata' field. The client does not use that field, so the query
needlessly increases the latency of search results.
Depends on D8843
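
A minimal sketch of what "only request 'url'" could look like from a client's point of view, assuming the origin-search endpoint lives at /api/1/origin/search/<pattern>/ and honours the 'fields' query parameter described in the commit messages. API_ROOT and the helper name origin_search_url are illustrative assumptions, not code from this diff:

```python
from urllib.parse import quote, urlencode

# Assumed base URL of the public API; adjust for another deployment.
API_ROOT = "https://archive.softwareheritage.org/api/1"


def origin_search_url(pattern, limit=100, fields=("url",)):
    """Build an origin-search URL that asks only for the listed fields.

    Hypothetical helper: the endpoint path and parameter names are assumptions.
    """
    query = urlencode({"limit": limit, "fields": ",".join(fields)})
    # Percent-encode the pattern so it is safe to embed in the URL path.
    return f"{API_ROOT}/origin/search/{quote(pattern, safe='')}/?{query}"


print(origin_search_url("python"))
```

With 'metadata' left out of 'fields', the server has no reason to query swh-indexer for this request.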
Differential D8844
origin-search: Only request 'url' field
Authored by vlorentz on Nov 15 2022, 10:04 AM.
Event Timeline

Build has FAILED

Patch application report for D8844 (id=31866)

Could not rebase; Attempt merge onto ad8558c69d...

Updating ad8558c6..a1c58db2
Fast-forward
 swh/web/api/tests/views/test_origin.py        | 70 ++++++++++++++++++++++++++-
 swh/web/api/views/origin.py                   |  9 +++-
 swh/web/browse/assets/browse/origin-search.js |  3 ++
 swh/web/utils/archive.py                      | 41 +++++++++++++---
 4 files changed, 113 insertions(+), 10 deletions(-)

Changes applied before test

commit a1c58db2d5983abe4abb2ee4ea427aa04c687d6f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 15 10:03:38 2022 +0100

    origin-search: Only request 'url' field

    By default, an extra query is sent to swh-indexer to (maybe) populate
    the 'metadata' field, which is not used by the client, so it unnecessarily
    increases latency to get results

commit f59acd6185603e66c3418e4ffeac2106b2159300
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 15 09:49:59 2022 +0100

    metadata-search: Skip query to swh-indexer when its results would be discarded

    The 'fields' query parameter is used by clients to indicate what fields
    the API should return. If 'metadata' is not in that field, then the
    'metadata' object will be discarded by apiresponse, so the call to
    `idx_storage.origin_intrinsic_metadata_get` is useless.

    I expect no client actually uses this field, so this could save resources.

    Additionally, I want to deprecate the field, so this may make it easier
    to figure if any client actually requests it by looking at server logs.

commit 76c64ea4dc9e158bdd5c2730340e04b213999609
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 15 09:31:11 2022 +0100

    metadata-search: Return swh-search even when missing from idx_storage.origin_intrinsic_metadata

    This is needed, because swh-search may now return results based on
    extrinsic metadata, in addition to intrinsic metadata.

    I do not want to query idx_storage.origin_extrinsic_metadata here,
    because it is not clear how to merge with the existing data structure.

    Additionally, I do not think anyone relies on the metadata returned by
    this endpoint because it is undocumented and rather inflexible.
    Instead, I would like to deprecate returning metadata from this endpoint
    altogether, as there is a more appropriate endpoint to get metadata once
    you have the origin URL.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/2134/

Build has FAILED

Patch application report for D8844 (id=31867)

Could not rebase; Attempt merge onto ad8558c69d...

Updating ad8558c6..8a8057eb
Fast-forward
 cypress/e2e/origin-search.cy.js               |  8 +--
 swh/web/api/tests/views/test_origin.py        | 70 ++++++++++++++++++++++++++-
 swh/web/api/views/origin.py                   |  9 +++-
 swh/web/browse/assets/browse/origin-search.js |  3 ++
 swh/web/utils/archive.py                      | 41 +++++++++++++---
 5 files changed, 118 insertions(+), 13 deletions(-)

Changes applied before test (commit messages as in the first report)

commit 8a8057eb8ee5cdd314d739ab2dffff7da848f96c
    origin-search: Only request 'url' field
commit f59acd6185603e66c3418e4ffeac2106b2159300
    metadata-search: Skip query to swh-indexer when its results would be discarded
commit 76c64ea4dc9e158bdd5c2730340e04b213999609
    metadata-search: Return swh-search even when missing from idx_storage.origin_intrinsic_metadata

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/2135/

Build is green

Patch application report for D8844 (id=31868)

Could not rebase; Attempt merge onto ad8558c69d...

Updating ad8558c6..6531a365
Fast-forward
 cypress/e2e/origin-search.cy.js               |  8 +--
 swh/web/api/tests/views/test_origin.py        | 70 ++++++++++++++++++++++++++-
 swh/web/api/views/origin.py                   |  9 +++-
 swh/web/browse/assets/browse/origin-search.js |  3 ++
 swh/web/utils/archive.py                      | 41 +++++++++++++---
 5 files changed, 118 insertions(+), 13 deletions(-)

Changes applied before test (commit messages as in the first report)

commit 6531a3653102f017d80af868dadf8d6ddaad630c
    origin-search: Only request 'url' field
commit f59acd6185603e66c3418e4ffeac2106b2159300
    metadata-search: Skip query to swh-indexer when its results would be discarded
commit 76c64ea4dc9e158bdd5c2730340e04b213999609
    metadata-search: Return swh-search even when missing from idx_storage.origin_intrinsic_metadata

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/2136/ for more details.

not really a nice catch as it wasn't a very useful optimization before D8843, which I only noticed when the useless query caused issues ;)
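
The server-side behaviour described in the commit messages above (skip the indexer lookup when 'metadata' is not requested, and still return origins that have no indexed metadata) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in swh/web/utils/archive.py: the function search_origins and the fetch_metadata callable are hypothetical stand-ins for the swh-search results and for `idx_storage.origin_intrinsic_metadata_get`.

```python
def search_origins(search_results, fetch_metadata, fields=None):
    """Return one dict per origin, fetching metadata only if it will be kept.

    search_results: iterable of origin URLs (stand-in for swh-search output).
    fetch_metadata: hypothetical callable mapping a list of URLs to a
        {url: metadata} dict (stand-in for the swh-indexer lookup).
    fields: set of field names requested by the client, or None for all.
    """
    urls = list(search_results)
    want_metadata = fields is None or "metadata" in fields
    # Skip the extra query entirely when its result would be discarded.
    metadata = fetch_metadata(urls) if want_metadata else {}
    results = []
    for url in urls:
        entry = {"url": url}
        if want_metadata:
            # Origins missing from the metadata store are still returned.
            entry["metadata"] = metadata.get(url)
        results.append(entry)
    return results


calls = []


def fake_fetch(urls):
    """Record each lookup so the skipped query is observable."""
    calls.append(list(urls))
    return {}


print(search_origins(["https://example.org/repo"], fake_fetch, fields={"url"}))
print(len(calls))
```

In this sketch a request with fields={"url"} never invokes the lookup, which is the latency saving the diff is after.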