Page MenuHomeSoftware Heritage
Paste P966

Snapshot branches queries performances
ActivePublic

Authored by anlambert on Mar 3 2021, 2:07 PM.
Executing the snapshot_get_branches query on db.internal.softwareheritage.org gives unexpected
performance results:
- limiting the query to 10 results is fast and responsive
- limiting the query to 2 results does not return and timeouts in swh-storage
13:59 $ psql service=swh
psql (13.2 (Debian 13.2-1.pgdg100+1), server 12.5 (Debian 12.5-1.pgdg100+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
softwareheritage=> explain with filtered_snapshot_branches as (
select '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea as snapshot_id, name, target, target_type
from snapshot_branches
inner join snapshot_branch on snapshot_branches.branch_id = snapshot_branch.object_id
where snapshot_id = (select object_id from snapshot where snapshot.id = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
and (NULL :: snapshot_target[] is null or target_type = any(NULL :: snapshot_target[]))
order by name
)
select snapshot_id, name, target, target_type
from filtered_snapshot_branches
where name >= '\x'::bytea
and (NULL is null or convert_from(name, 'utf-8') ilike NULL)
and (NULL is null or convert_from(name, 'utf-8') not ilike NULL)
order by name limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=74186.60..74187.87 rows=10 width=76)
-> Gather Merge (cost=74186.60..79886.87 rows=48856 width=76)
Workers Planned: 2
Params Evaluated: $0
InitPlan 1 (returns $0)
-> Index Scan using snapshot_id_idx on snapshot (cost=0.57..2.59 rows=1 width=8)
Index Cond: ((id)::bytea = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
-> Sort (cost=73183.99..73245.06 rows=24428 width=76)
Sort Key: snapshot_branch.name
-> Nested Loop (cost=1.28..71403.65 rows=24428 width=76)
-> Parallel Index Only Scan using snapshot_branches_pkey on snapshot_branches (cost=0.70..8930.74 rows=24428 width=8)
Index Cond: (snapshot_id = $0)
-> Index Scan using snapshot_branch_pkey on snapshot_branch (cost=0.57..2.56 rows=1 width=52)
Index Cond: (object_id = snapshot_branches.branch_id)
Filter: (name >= '\x'::bytea)
(15 rows)
softwareheritage=> explain with filtered_snapshot_branches as (
select '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea as snapshot_id, name, target, target_type
from snapshot_branches
inner join snapshot_branch on snapshot_branches.branch_id = snapshot_branch.object_id
where snapshot_id = (select object_id from snapshot where snapshot.id = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
and (NULL :: snapshot_target[] is null or target_type = any(NULL :: snapshot_target[]))
order by name
)
select snapshot_id, name, target, target_type
from filtered_snapshot_branches
where name >= '\x'::bytea
and (NULL is null or convert_from(name, 'utf-8') ilike NULL)
and (NULL is null or convert_from(name, 'utf-8') not ilike NULL)
order by name limit 2;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=3.99..15957.84 rows=2 width=76)
-> Nested Loop (cost=3.99..389720167.57 rows=58626 width=76)
InitPlan 1 (returns $0)
-> Index Scan using snapshot_id_idx on snapshot (cost=0.57..2.59 rows=1 width=8)
Index Cond: ((id)::bytea = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
-> Index Scan using snapshot_branch_name_target_target_type_idx on snapshot_branch (cost=0.70..14780333.06 rows=371573621 width=52)
Index Cond: (name >= '\x'::bytea)
-> Index Only Scan using snapshot_branches_pkey on snapshot_branches (cost=0.70..1.01 rows=1 width=8)
Index Cond: ((snapshot_id = $0) AND (branch_id = snapshot_branch.object_id))
(9 rows)
On replica db (somerset.internal.softwareheritage.org), performances are great for both queries.
14:03 $ psql service=swh-replica
psql (13.2 (Debian 13.2-1.pgdg100+1), server 13.1 (Debian 13.1-1.pgdg100+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
softwareheritage=> explain with filtered_snapshot_branches as (
select '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea as snapshot_id, name, target, target_type
from snapshot_branches
inner join snapshot_branch on snapshot_branches.branch_id = snapshot_branch.object_id
where snapshot_id = (select object_id from snapshot where snapshot.id = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
and (NULL :: snapshot_target[] is null or target_type = any(NULL :: snapshot_target[]))
order by name
)
select snapshot_id, name, target, target_type
from filtered_snapshot_branches
where name >= '\x'::bytea
and (NULL is null or convert_from(name, 'utf-8') ilike NULL)
and (NULL is null or convert_from(name, 'utf-8') not ilike NULL)
order by name limit 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=190193.69..190194.96 rows=10 width=76)
-> Gather Merge (cost=190193.69..195734.11 rows=47486 width=76)
Workers Planned: 2
Params Evaluated: $0
InitPlan 1 (returns $0)
-> Index Scan using snapshot_id_idx on snapshot (cost=0.57..4.59 rows=1 width=8)
Index Cond: ((id)::bytea = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
-> Sort (cost=189189.08..189248.44 rows=23743 width=76)
Sort Key: snapshot_branch.name
-> Nested Loop (cost=1.28..187463.53 rows=23743 width=76)
-> Parallel Index Only Scan using snapshot_branches_pkey on snapshot_branches (cost=0.70..80052.98 rows=23745 width=8)
Index Cond: (snapshot_id = $0)
-> Index Scan using snapshot_branch_pkey on snapshot_branch (cost=0.57..4.52 rows=1 width=52)
Index Cond: (object_id = snapshot_branches.branch_id)
Filter: (name >= '\x'::bytea)
JIT:
Functions: 13
Options: Inlining false, Optimization false, Expressions true, Deforming true
(18 rows)
softwareheritage=> explain with filtered_snapshot_branches as (
select '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea as snapshot_id, name, target, target_type
from snapshot_branches
inner join snapshot_branch on snapshot_branches.branch_id = snapshot_branch.object_id
where snapshot_id = (select object_id from snapshot where snapshot.id = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
and (NULL :: snapshot_target[] is null or target_type = any(NULL :: snapshot_target[]))
order by name
)
select snapshot_id, name, target, target_type
from filtered_snapshot_branches
where name >= '\x'::bytea
and (NULL is null or convert_from(name, 'utf-8') ilike NULL)
and (NULL is null or convert_from(name, 'utf-8') not ilike NULL)
order by name limit 2;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=190193.69..190193.94 rows=2 width=76)
-> Gather Merge (cost=190193.69..195734.11 rows=47486 width=76)
Workers Planned: 2
Params Evaluated: $0
InitPlan 1 (returns $0)
-> Index Scan using snapshot_id_idx on snapshot (cost=0.57..4.59 rows=1 width=8)
Index Cond: ((id)::bytea = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
-> Sort (cost=189189.08..189248.44 rows=23743 width=76)
Sort Key: snapshot_branch.name
-> Nested Loop (cost=1.28..187463.53 rows=23743 width=76)
-> Parallel Index Only Scan using snapshot_branches_pkey on snapshot_branches (cost=0.70..80052.98 rows=23745 width=8)
Index Cond: (snapshot_id = $0)
-> Index Scan using snapshot_branch_pkey on snapshot_branch (cost=0.57..4.52 rows=1 width=52)
Index Cond: (object_id = snapshot_branches.branch_id)
Filter: (name >= '\x'::bytea)
JIT:
Functions: 13
Options: Inlining false, Optimization false, Expressions true, Deforming true
(18 rows)

Event Timeline

It seems there are some differences in term of indexes between the main and replica databases.
On the replica, only the primary keys are present on the snapshot_branches and the snapshot_branch tables. Perhaps the query optimizer is confused by something and is doing a wrong choice somewhere.

The limit seems to be 10. The response are fast for any limit >=10 but slow when <10....

limit 2:
https://explain.depesz.com/s/UW9Z

softwareheritage=> explain analyze with filtered_snapshot_branches as (
  select '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea as snapshot_id, name, target, target_type
  from snapshot_branches
  inner join snapshot_branch on snapshot_branches.branch_id = snapshot_branch.object_id
  where snapshot_id = (select object_id from snapshot where snapshot.id = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
    and (NULL :: snapshot_target[] is null or target_type = any(NULL :: snapshot_target[]))
)
select snapshot_id, name, target, target_type
from filtered_snapshot_branches
where name >= '\x'::bytea
  and (NULL is null or convert_from(name, 'utf-8') ilike NULL)
  and (NULL is null or convert_from(name, 'utf-8') not ilike NULL)
order by name limit 2;
                                                                                                    QUERY PLAN                                                                                                     
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1004.01..6764.11 rows=2 width=76) (actual time=172523.081..173555.673 rows=2 loops=1)
   InitPlan 1 (returns $0)
     ->  Index Scan using snapshot_id_idx on snapshot  (cost=0.57..2.59 rows=1 width=8) (actual time=0.028..0.036 rows=1 loops=1)
           Index Cond: ((id)::bytea = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
   ->  Gather Merge  (cost=1001.43..168852423.27 rows=58628 width=76) (actual time=172523.079..173555.661 rows=2 loops=1)
         Workers Planned: 2
         Params Evaluated: $0
         Workers Launched: 2
         ->  Nested Loop  (cost=1.40..168844656.12 rows=24428 width=76) (actual time=126442.320..167761.276 rows=2 loops=3)
               ->  Parallel Index Scan using snapshot_branch_name_target_target_type_idx on snapshot_branch  (cost=0.70..12612971.47 rows=154824599 width=52) (actual time=0.077..80926.811 rows=23123612 loops=3)
                     Index Cond: (name >= '\x'::bytea)
               ->  Index Only Scan using snapshot_branches_pkey on snapshot_branches  (cost=0.70..1.01 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=69370837)
                     Index Cond: ((snapshot_id = $0) AND (branch_id = snapshot_branch.object_id))
                     Heap Fetches: 5
 Planning Time: 0.993 ms
 Execution Time: 173555.864 ms
(16 rows)

Time: 173667.434 ms (02:53.667)

limit 10:
https://explain.depesz.com/s/NbSU

softwareheritage=> explain analyze with filtered_snapshot_branches as (
  select '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea as snapshot_id, name, target, target_type
  from snapshot_branches
  inner join snapshot_branch on snapshot_branches.branch_id = snapshot_branch.object_id
  where snapshot_id = (select object_id from snapshot where snapshot.id = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
    and (NULL :: snapshot_target[] is null or target_type = any(NULL :: snapshot_target[]))
)
select snapshot_id, name, target, target_type
from filtered_snapshot_branches
where name >= '\x'::bytea
  and (NULL is null or convert_from(name, 'utf-8') ilike NULL)
  and (NULL is null or convert_from(name, 'utf-8') not ilike NULL)
order by name limit 10;
Time: 895976.065 ms (14:55.976)
                                                                                                   QUERY PLAN                                                                                                     
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1004.01..29804.51 rows=10 width=76) (actual time=148561.091..895901.340 rows=10 loops=1)
   InitPlan 1 (returns $0)
     ->  Index Scan using snapshot_id_idx on snapshot  (cost=0.57..2.59 rows=1 width=8) (actual time=0.032..0.038 rows=1 loops=1)
           Index Cond: ((id)::bytea = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
   ->  Gather Merge  (cost=1001.43..168852537.02 rows=58628 width=76) (actual time=148561.089..895901.327 rows=10 loops=1)
         Workers Planned: 2
         Params Evaluated: $0
         Workers Launched: 2
         ->  Nested Loop  (cost=1.40..168844769.87 rows=24428 width=76) (actual time=112742.855..633535.913 rows=5 loops=3)
               ->  Parallel Index Scan using snapshot_branch_name_target_target_type_idx on snapshot_branch  (cost=0.70..12612971.47 rows=154824599 width=52) (actual time=0.042..360975.705 rows=78257338 loops=3)
                     Index Cond: (name >= '\x'::bytea)
               ->  Index Only Scan using snapshot_branches_pkey on snapshot_branches  (cost=0.70..1.01 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=234772013)
                     Index Cond: ((snapshot_id = $0) AND (branch_id = snapshot_branch.object_id))
                     Heap Fetches: 15
 Planning Time: 7.092 ms
 Execution Time: 895901.535 ms
(16 rows)

The query has cleary not the same impact with a limit of 10 or 2

What I do not understand is why the timeout does not occur in production webapp as the query with limit 2 is executed while browsing an origin to resolve the HEAD branch alias.

To be noted, that query is wrapped in a stored PGSQL procedure (swh_snapshot_get_by_id) so this may be related.

yep it's weird, but after looking at the code of the function, I realized it seems to be a known problem :
https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/sql/40-funcs.sql$665-667

so yes indeed, perhaps increasing the minimum limit as you suggested can do the job

Ack. I guess I will handle that directly in the swh.storage.postgresql.storage.snapshot_get_branches and ensures a minimum of 10 for the limit value.

I have created D5191 to ensure a minimum limit value of 10.