Snapshot branches queries performances
ActivePublic
Actions

Authored by anlambert on Mar 3 2021, 2:07 PM.

Tags

None

Subscribers

	Executing the snapshot_get_branches query on db.internal.softwareheritage.org gives unexpected
	performance results:

	- limiting the query to 10 results is fast and responsive

	- limiting the query to 2 results does not return and timeouts in swh-storage

	13:59 $ psql service=swh
	psql (13.2 (Debian 13.2-1.pgdg100+1), server 12.5 (Debian 12.5-1.pgdg100+1))
	SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
	Type "help" for help.

	softwareheritage=> explain with filtered_snapshot_branches as (
	select '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea as snapshot_id, name, target, target_type
	from snapshot_branches
	inner join snapshot_branch on snapshot_branches.branch_id = snapshot_branch.object_id
	where snapshot_id = (select object_id from snapshot where snapshot.id = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
	and (NULL :: snapshot_target[] is null or target_type = any(NULL :: snapshot_target[]))
	order by name
	)
	select snapshot_id, name, target, target_type
	from filtered_snapshot_branches
	where name >= '\x'::bytea
	and (NULL is null or convert_from(name, 'utf-8') ilike NULL)
	and (NULL is null or convert_from(name, 'utf-8') not ilike NULL)
	order by name limit 10;
	QUERY PLAN
	---------------------------------------------------------------------------------------------------------------------------------------------
	Limit (cost=74186.60..74187.87 rows=10 width=76)
	-> Gather Merge (cost=74186.60..79886.87 rows=48856 width=76)
	Workers Planned: 2
	Params Evaluated: $0
	InitPlan 1 (returns $0)
	-> Index Scan using snapshot_id_idx on snapshot (cost=0.57..2.59 rows=1 width=8)
	Index Cond: ((id)::bytea = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
	-> Sort (cost=73183.99..73245.06 rows=24428 width=76)
	Sort Key: snapshot_branch.name
	-> Nested Loop (cost=1.28..71403.65 rows=24428 width=76)
	-> Parallel Index Only Scan using snapshot_branches_pkey on snapshot_branches (cost=0.70..8930.74 rows=24428 width=8)
	Index Cond: (snapshot_id = $0)
	-> Index Scan using snapshot_branch_pkey on snapshot_branch (cost=0.57..2.56 rows=1 width=52)
	Index Cond: (object_id = snapshot_branches.branch_id)
	Filter: (name >= '\x'::bytea)
	(15 rows)

	softwareheritage=> explain with filtered_snapshot_branches as (
	select '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea as snapshot_id, name, target, target_type
	from snapshot_branches
	inner join snapshot_branch on snapshot_branches.branch_id = snapshot_branch.object_id
	where snapshot_id = (select object_id from snapshot where snapshot.id = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
	and (NULL :: snapshot_target[] is null or target_type = any(NULL :: snapshot_target[]))
	order by name
	)
	select snapshot_id, name, target, target_type
	from filtered_snapshot_branches
	where name >= '\x'::bytea
	and (NULL is null or convert_from(name, 'utf-8') ilike NULL)
	and (NULL is null or convert_from(name, 'utf-8') not ilike NULL)
	order by name limit 2;
	QUERY PLAN
	-----------------------------------------------------------------------------------------------------------------------------------------------
	Limit (cost=3.99..15957.84 rows=2 width=76)
	-> Nested Loop (cost=3.99..389720167.57 rows=58626 width=76)
	InitPlan 1 (returns $0)
	-> Index Scan using snapshot_id_idx on snapshot (cost=0.57..2.59 rows=1 width=8)
	Index Cond: ((id)::bytea = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
	-> Index Scan using snapshot_branch_name_target_target_type_idx on snapshot_branch (cost=0.70..14780333.06 rows=371573621 width=52)
	Index Cond: (name >= '\x'::bytea)
	-> Index Only Scan using snapshot_branches_pkey on snapshot_branches (cost=0.70..1.01 rows=1 width=8)
	Index Cond: ((snapshot_id = $0) AND (branch_id = snapshot_branch.object_id))
	(9 rows)

	On replica db (somerset.internal.softwareheritage.org), performances are great for both queries.

	14:03 $ psql service=swh-replica
	psql (13.2 (Debian 13.2-1.pgdg100+1), server 13.1 (Debian 13.1-1.pgdg100+1))
	SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
	Type "help" for help.

	softwareheritage=> explain with filtered_snapshot_branches as (
	select '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea as snapshot_id, name, target, target_type
	from snapshot_branches
	inner join snapshot_branch on snapshot_branches.branch_id = snapshot_branch.object_id
	where snapshot_id = (select object_id from snapshot where snapshot.id = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
	and (NULL :: snapshot_target[] is null or target_type = any(NULL :: snapshot_target[]))
	order by name
	)
	select snapshot_id, name, target, target_type
	from filtered_snapshot_branches
	where name >= '\x'::bytea
	and (NULL is null or convert_from(name, 'utf-8') ilike NULL)
	and (NULL is null or convert_from(name, 'utf-8') not ilike NULL)
	order by name limit 10;
	QUERY PLAN
	----------------------------------------------------------------------------------------------------------------------------------------------
	Limit (cost=190193.69..190194.96 rows=10 width=76)
	-> Gather Merge (cost=190193.69..195734.11 rows=47486 width=76)
	Workers Planned: 2
	Params Evaluated: $0
	InitPlan 1 (returns $0)
	-> Index Scan using snapshot_id_idx on snapshot (cost=0.57..4.59 rows=1 width=8)
	Index Cond: ((id)::bytea = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
	-> Sort (cost=189189.08..189248.44 rows=23743 width=76)
	Sort Key: snapshot_branch.name
	-> Nested Loop (cost=1.28..187463.53 rows=23743 width=76)
	-> Parallel Index Only Scan using snapshot_branches_pkey on snapshot_branches (cost=0.70..80052.98 rows=23745 width=8)
	Index Cond: (snapshot_id = $0)
	-> Index Scan using snapshot_branch_pkey on snapshot_branch (cost=0.57..4.52 rows=1 width=52)
	Index Cond: (object_id = snapshot_branches.branch_id)
	Filter: (name >= '\x'::bytea)
	JIT:
	Functions: 13
	Options: Inlining false, Optimization false, Expressions true, Deforming true
	(18 rows)

	softwareheritage=> explain with filtered_snapshot_branches as (
	select '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea as snapshot_id, name, target, target_type
	from snapshot_branches
	inner join snapshot_branch on snapshot_branches.branch_id = snapshot_branch.object_id
	where snapshot_id = (select object_id from snapshot where snapshot.id = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
	and (NULL :: snapshot_target[] is null or target_type = any(NULL :: snapshot_target[]))
	order by name
	)
	select snapshot_id, name, target, target_type
	from filtered_snapshot_branches
	where name >= '\x'::bytea
	and (NULL is null or convert_from(name, 'utf-8') ilike NULL)
	and (NULL is null or convert_from(name, 'utf-8') not ilike NULL)
	order by name limit 2;
	QUERY PLAN
	----------------------------------------------------------------------------------------------------------------------------------------------
	Limit (cost=190193.69..190193.94 rows=2 width=76)
	-> Gather Merge (cost=190193.69..195734.11 rows=47486 width=76)
	Workers Planned: 2
	Params Evaluated: $0
	InitPlan 1 (returns $0)
	-> Index Scan using snapshot_id_idx on snapshot (cost=0.57..4.59 rows=1 width=8)
	Index Cond: ((id)::bytea = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
	-> Sort (cost=189189.08..189248.44 rows=23743 width=76)
	Sort Key: snapshot_branch.name
	-> Nested Loop (cost=1.28..187463.53 rows=23743 width=76)
	-> Parallel Index Only Scan using snapshot_branches_pkey on snapshot_branches (cost=0.70..80052.98 rows=23745 width=8)
	Index Cond: (snapshot_id = $0)
	-> Index Scan using snapshot_branch_pkey on snapshot_branch (cost=0.57..4.52 rows=1 width=52)
	Index Cond: (object_id = snapshot_branches.branch_id)
	Filter: (name >= '\x'::bytea)
	JIT:
	Functions: 13
	Options: Inlining false, Optimization false, Expressions true, Deforming true
	(18 rows)

Event Timeline

anlambert created this paste.Mar 3 2021, 2:07 PM

It seems there are some differences in term of indexes between the main and replica databases.
On the replica, only the primary keys are present on the snapshot_branches and the snapshot_branch tables. Perhaps the query optimizer is confused by something and is doing a wrong choice somewhere.

The limit seems to be 10. The response are fast for any limit >=10 but slow when <10....

limit 2:
https://explain.depesz.com/s/UW9Z

softwareheritage=> explain analyze with filtered_snapshot_branches as (
  select '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea as snapshot_id, name, target, target_type
  from snapshot_branches
  inner join snapshot_branch on snapshot_branches.branch_id = snapshot_branch.object_id
  where snapshot_id = (select object_id from snapshot where snapshot.id = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
    and (NULL :: snapshot_target[] is null or target_type = any(NULL :: snapshot_target[]))
)
select snapshot_id, name, target, target_type
from filtered_snapshot_branches
where name >= '\x'::bytea
  and (NULL is null or convert_from(name, 'utf-8') ilike NULL)
  and (NULL is null or convert_from(name, 'utf-8') not ilike NULL)
order by name limit 2;
                                                                                                    QUERY PLAN                                                                                                     
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1004.01..6764.11 rows=2 width=76) (actual time=172523.081..173555.673 rows=2 loops=1)
   InitPlan 1 (returns $0)
     ->  Index Scan using snapshot_id_idx on snapshot  (cost=0.57..2.59 rows=1 width=8) (actual time=0.028..0.036 rows=1 loops=1)
           Index Cond: ((id)::bytea = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
   ->  Gather Merge  (cost=1001.43..168852423.27 rows=58628 width=76) (actual time=172523.079..173555.661 rows=2 loops=1)
         Workers Planned: 2
         Params Evaluated: $0
         Workers Launched: 2
         ->  Nested Loop  (cost=1.40..168844656.12 rows=24428 width=76) (actual time=126442.320..167761.276 rows=2 loops=3)
               ->  Parallel Index Scan using snapshot_branch_name_target_target_type_idx on snapshot_branch  (cost=0.70..12612971.47 rows=154824599 width=52) (actual time=0.077..80926.811 rows=23123612 loops=3)
                     Index Cond: (name >= '\x'::bytea)
               ->  Index Only Scan using snapshot_branches_pkey on snapshot_branches  (cost=0.70..1.01 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=69370837)
                     Index Cond: ((snapshot_id = $0) AND (branch_id = snapshot_branch.object_id))
                     Heap Fetches: 5
 Planning Time: 0.993 ms
 Execution Time: 173555.864 ms
(16 rows)

Time: 173667.434 ms (02:53.667)

limit 10:
https://explain.depesz.com/s/NbSU

softwareheritage=> explain analyze with filtered_snapshot_branches as (
  select '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea as snapshot_id, name, target, target_type
  from snapshot_branches
  inner join snapshot_branch on snapshot_branches.branch_id = snapshot_branch.object_id
  where snapshot_id = (select object_id from snapshot where snapshot.id = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
    and (NULL :: snapshot_target[] is null or target_type = any(NULL :: snapshot_target[]))
)
select snapshot_id, name, target, target_type
from filtered_snapshot_branches
where name >= '\x'::bytea
  and (NULL is null or convert_from(name, 'utf-8') ilike NULL)
  and (NULL is null or convert_from(name, 'utf-8') not ilike NULL)
order by name limit 10;
Time: 895976.065 ms (14:55.976)
                                                                                                   QUERY PLAN                                                                                                     
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1004.01..29804.51 rows=10 width=76) (actual time=148561.091..895901.340 rows=10 loops=1)
   InitPlan 1 (returns $0)
     ->  Index Scan using snapshot_id_idx on snapshot  (cost=0.57..2.59 rows=1 width=8) (actual time=0.032..0.038 rows=1 loops=1)
           Index Cond: ((id)::bytea = '\xdfea9cb3249b932235b1cd60ed49c5e316a03147'::bytea)
   ->  Gather Merge  (cost=1001.43..168852537.02 rows=58628 width=76) (actual time=148561.089..895901.327 rows=10 loops=1)
         Workers Planned: 2
         Params Evaluated: $0
         Workers Launched: 2
         ->  Nested Loop  (cost=1.40..168844769.87 rows=24428 width=76) (actual time=112742.855..633535.913 rows=5 loops=3)
               ->  Parallel Index Scan using snapshot_branch_name_target_target_type_idx on snapshot_branch  (cost=0.70..12612971.47 rows=154824599 width=52) (actual time=0.042..360975.705 rows=78257338 loops=3)
                     Index Cond: (name >= '\x'::bytea)
               ->  Index Only Scan using snapshot_branches_pkey on snapshot_branches  (cost=0.70..1.01 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=234772013)
                     Index Cond: ((snapshot_id = $0) AND (branch_id = snapshot_branch.object_id))
                     Heap Fetches: 15
 Planning Time: 7.092 ms
 Execution Time: 895901.535 ms
(16 rows)

The query has cleary not the same impact with a limit of 10 or 2

What I do not understand is why the timeout does not occur in production webapp as the query with limit 2 is executed while browsing an origin to resolve the HEAD branch alias.

To be noted, that query is wrapped in a stored PGSQL procedure (swh_snapshot_get_by_id) so this may be related.

yep it's weird, but after looking at the code of the function, I realized it seems to be a known problem :
https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/sql/40-funcs.sql$665-667

so yes indeed, perhaps increasing the minimum limit as you suggested can do the job

Ack. I guess I will handle that directly in the swh.storage.postgresql.storage.snapshot_get_branches and ensures a minimum of 10 for the limit value.

anlambert mentioned this in D5191: postgresql: Ensure a minimum limit for the snapshot branches query.Mar 3 2021, 4:27 PM

I have created D5191 to ensure a minimum limit value of 10.

anlambert mentioned this in rDSTO88ff2c2fa0ec: postgresql: Ensure a minimum limit for the snapshot branches query.Mar 3 2021, 6:37 PM

Snapshot branches queries performancesActivePublicActions

Event Timeline

Snapshot branches queries performances
ActivePublic
Actions