Page MenuHomeSoftware Heritage

origin_visit_get_latest: Materialize subquery on 'origin' table.
ClosedPublic

Authored by vlorentz on Mar 11 2022, 1:36 PM.

Details

Summary

postgresql's query planner does not understand the origin is unique, so it performs
a partial index scan on origin_visit_pkey, which is inefficient on origins with
many visits.

This commit itself is not enough to make it use the proper index,
but provides this necessary change that will be used by a future commit.

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D7338 (id=26531)

Could not rebase; Attempt merge onto 4e78014b07...

Updating 4e78014b..600e87fb
Fast-forward
 swh/storage/postgresql/db.py      |  4 +++-
 swh/storage/postgresql/storage.py | 12 ++++++------
 2 files changed, 9 insertions(+), 7 deletions(-)
Changes applied before test
commit 600e87fb2351736c9e05ffb3e3d9a1641a00b846
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Mar 11 13:36:07 2022 +0100

    origin_visit_get_latest: Materialize subquery on 'origin' table.
    
    postgresql's query planner does not understand the origin is unique, so it performs
    a partial index scan on origin_visit_pkey, which is inefficient on origins with
    many visits.
    
    This commit itself is not enough to make it use the proper index,
    but provides this necessary change that will be used by a future commit.

commit b0cdab58b71b25e07d40dc022a5f0b0c77298c3f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Mar 11 11:36:01 2022 +0100

    postgresql: Increase timeouts that often fail
    
    According to Sentry, in the last 30 days:
    
    * directory_entry_get_by_path: 958 events, https://sentry.softwareheritage.org/share/issue/c4c2124953a145b2bd325f6f6b7df5a6/
    * revision_get: 841 events, https://sentry.softwareheritage.org/share/issue/55fbe01c6f4d4c9bbf684c7608a62ad9/
    * release_get: 14 events, https://sentry.softwareheritage.org/share/issue/37c53354541b4c4eaa1faf4e20a68418/
    * origin_visit_find_by_date: 114 events, https://sentry.softwareheritage.org/share/issue/a674c12049a941968a717661a0226559/
    * origin_get: 79 events, https://sentry.softwareheritage.org/share/issue/bf21d6bc7b24442eb18643d80d936d27/ ; 67 events, https://sentry.softwareheritage.org/share/issue/010a4b1e085a4e2089ba4897c6de6038/

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1567/ for more details.

This revision is now accepted and ready to land.Mar 11 2022, 2:00 PM