
postgresql: Ensure a minimum limit for the snapshot branches query
Closed, Public

Authored by anlambert on Mar 3 2021, 4:27 PM.

Details

Summary

With small limits (< 10), the snapshot branches query can degenerate into
using the deduplication index on snapshot_branch (name, target, target_type),
and the PostgreSQL planner happily scans several hundred million rows.

So enforce a minimum limit value of 10 before executing the query, to keep
performance acceptable when a small branches_count value is provided
to the snapshot_get_branches method of the Storage interface.
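
As an illustration only, the clamping described above could look like the following sketch. It is not the code of this diff: the helper names (effective_limit, get_branches_page), the fetch callable, and the "one extra row" pagination scheme are assumptions made for the example. The only values taken from the summary are the minimum of 10 and the branches_count parameter.

```python
from typing import Callable, List, Optional, Tuple

# Minimum LIMIT sent to the database, as stated in the summary.
MIN_QUERY_LIMIT = 10


def effective_limit(requested: int, minimum: int = MIN_QUERY_LIMIT) -> int:
    """Return the LIMIT to use in the SQL query: never below `minimum`."""
    return max(requested, minimum)


def get_branches_page(
    fetch: Callable[[bytes, int], List[bytes]],
    branches_count: int,
    branches_from: bytes = b"",
) -> Tuple[List[bytes], Optional[bytes]]:
    """Hypothetical pagination helper.

    `fetch(branches_from, limit)` stands in for the real database query and
    must return at most `limit` branch names >= `branches_from`, sorted.
    """
    # Ask for one extra row so the start of the next page can be detected,
    # then clamp the value so the query never runs with a tiny LIMIT.
    limit = effective_limit(branches_count + 1)
    rows = fetch(branches_from, limit)

    # The caller still gets exactly what it asked for: surplus rows fetched
    # because of the clamping are dropped here.
    page = rows[:branches_count]
    next_branch = rows[branches_count] if len(rows) > branches_count else None
    return page, next_branch
```

In this sketch the clamping is invisible to callers: a request for 3 branches runs the query with a limit of 10, returns 3 branches, and reports the 4th as the start of the next page.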

Related to P966

Test Plan

The existing tests already exercise the snapshot_get_branches method with
small branches_count values.

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5191 (id=18564)

Rebasing onto f46244b57e...

Current branch diff-target is up to date.
Changes applied before test
commit d9dc5722a390fedefdddc832ab36777526c63351
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Wed Mar 3 16:20:39 2021 +0100

    postgresql: Ensure a minimum limit for the snapshot branches query
    
    With small limits (< 10), the snapshot branches query can degenerate into
    using the deduplication index on snapshot_branch (name, target, target_type),
    and the postgresql planner happily scans several hundred million rows.
    
    So ensure a minimum limit value of 10 before executing the query for
    optimal performances when a small branches_count value is provided
    to the snapshot_get_branches method of the Storage interface.
    
    Related to P966

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1171/ for more details.

ardumont added a subscriber: ardumont.

lgtm

one question inline though

swh/storage/postgresql/storage.py
784

for pagination consistency's sake?

This revision is now accepted and ready to land. Mar 3 2021, 4:33 PM
swh/storage/postgresql/storage.py
784

What do you mean?

Build is green

Patch application report for D5191 (id=18568)

Rebasing onto ce8335db96...

Current branch diff-target is up to date.
Changes applied before test
commit 88ff2c2fa0ec61310071ea40b83079fb333aac99
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Wed Mar 3 16:20:39 2021 +0100

    postgresql: Ensure a minimum limit for the snapshot branches query
    

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1173/ for more details.

swh/storage/postgresql/storage.py
784

nvm