Analysis of the read performance for swh_content_find_directory, swh_revision_find_occurrence, with the help of Dimitri.
Description
Event Timeline
swh_content_find_directory query :
explain (analyze, buffers, verbose) with recursive path as ( -- Recursively build a path from the requested content to a root -- directory. Each iteration returns a pair (dir_id, filename) where -- filename is relative to dir_id. Stops when no parent directory can -- be found. (select dir.id as dir_id, dir_entry_f.name as name, 0 as depth from directory_entry_file as dir_entry_f join content on content.sha1_git = dir_entry_f.target join directory as dir on dir.file_entries @> array[dir_entry_f.id] where content.sha1 = '\xe516d03ee598741abfd2ab156562665348629735' limit 1) union all (select dir.id as dir_id, (dir_entry_d.name || '/' || path.name)::unix_path as name, path.depth + 1 from path join directory_entry_dir as dir_entry_d on dir_entry_d.target = path.dir_id join directory as dir on dir.dir_entries @> array[dir_entry_d.id] limit 1) ) select dir_id, name from path order by depth desc limit 1;
enable_seqscan = on : http://explain.depesz.com/s/9Mv
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=91.31..91.31 rows=1 width=68) (actual time=931302.242..931302.245 rows=1 loops=1) Output: path.dir_id, path.name, path.depth Buffers: shared hit=5087403 read=15089353, temp read=2399387 written=2399389 CTE path -> Recursive Union (cost=5.69..91.03 rows=11 width=64) (actual time=275544.524..931302.143 rows=3 loops=1) Buffers: shared hit=5087400 read=15089353, temp read=2399387 written=2399389 -> Subquery Scan on "*SELECT* 1" (cost=5.69..16.45 rows=1 width=38) (actual time=275544.516..275544.518 rows=1 loops=1) Output: "*SELECT* 1".dir_id, "*SELECT* 1".name, 0 Buffers: shared hit=3302077 read=6916568 -> Limit (cost=5.69..16.44 rows=1 width=38) (actual time=275544.513..275544.514 rows=1 loops=1) Output: dir.id, dir_entry_f.name, (0) Buffers: shared hit=3302077 read=6916568 -> Nested Loop (cost=5.69..12295582.17 rows=1142992 width=38) (actual time=275544.510..275544.510 rows=1 loops=1) Output: dir.id, dir_entry_f.name, 0 Join Filter: (dir.file_entries @> ARRAY[dir_entry_f.id]) Rows Removed by Join Filter: 205523161 Buffers: shared hit=3302077 read=6916568 -> Nested Loop (cost=5.69..230.17 rows=1 width=25) (actual time=7.270..7.270 rows=1 loops=1) Output: dir_entry_f.name, dir_entry_f.id Buffers: shared hit=8 read=4 -> Index Scan using content_pkey on public.content (cost=0.57..8.59 rows=1 width=21) (actual time=0.043..0.043 rows=1 loops=1) Output: content.sha1, content.sha1_git, content.sha256, content.length, content.ctime, content.status Index Cond: ((content.sha1)::bytea = '\xe516d03ee598741abfd2ab156562665348629735'::bytea) Buffers: shared hit=6 -> Bitmap Heap Scan on public.directory_entry_file dir_entry_f (cost=5.11..221.04 rows=54 width=46) (actual time=7.216..7.216 rows=1 loops=1) Output: dir_entry_f.name, dir_entry_f.target, dir_entry_f.id Recheck Cond: ((dir_entry_f.target)::bytea = (content.sha1_git)::bytea) Heap Blocks: exact=1 Buffers: shared hit=2 read=4 -> Bitmap Index Scan on directory_entry_file_target_name_perms_idx (cost=0.00..5.10 rows=54 width=0) (actual time=7.153..7.153 rows=1 loops=1) Index Cond: ((dir_entry_f.target)::bytea = (content.sha1_git)::bytea) Buffers: shared hit=2 read=3 -> Seq Scan on public.directory dir (cost=0.00..9437872.00 rows=228598400 width=163) (actual time=0.031..79390.386 rows=205523162 loops=1) Output: dir.id, dir.dir_entries, dir.file_entries, dir.rev_entries Buffers: shared read=6435573 -> Limit (cost=0.33..7.44 rows=1 width=64) (actual time=218585.812..218585.812 rows=1 loops=3) Output: dir_1.id, (((((dir_entry_d.name)::bytea || '\x2f'::bytea) || (path_1.name)::bytea))::unix_path), ((path_1.depth + 1)) Buffers: shared hit=1785323 read=8172785, temp read=2399387 written=2399389 -> Nested Loop (cost=0.33..170706177.85 rows=24002832 width=64) (actual time=218585.805..218585.805 rows=1 loops=3) Output: dir_1.id, ((((dir_entry_d.name)::bytea || '\x2f'::bytea) || (path_1.name)::bytea))::unix_path, (path_1.depth + 1) Join Filter: (dir_1.dir_entries @> ARRAY[dir_entry_d.id]) Rows Removed by Join Filter: 137019104 Buffers: shared hit=1785323 read=8172785, temp read=2399387 written=2399389 -> Hash Join (cost=0.33..4123320.61 rows=21 width=51) (actual time=37369.078..37369.078 rows=1 loops=3) Output: path_1.name, path_1.depth, dir_entry_d.name, dir_entry_d.id Hash Cond: ((dir_entry_d.target)::bytea = (path_1.dir_id)::bytea) Buffers: shared hit=46 read=1613775 -> Seq Scan on public.directory_entry_dir dir_entry_d (cost=0.00..3438898.69 rows=182512369 width=36) (actual time=0.025..23930.048 rows=60452382 loops=3) Output: dir_entry_d.name, dir_entry_d.target, dir_entry_d.id Buffers: shared hit=46 read=1613775 -> Hash (cost=0.20..0.20 rows=10 width=68) (actual time=0.007..0.007 rows=1 loops=3) Output: path_1.name, path_1.depth, path_1.dir_id Buckets: 1024 Batches: 1 Memory Usage: 1kB -> WorkTable Scan on path path_1 (cost=0.00..0.20 rows=10 width=68) (actual time=0.003..0.003 rows=1 loops=3) Output: path_1.name, path_1.depth, path_1.dir_id -> Materialize (cost=0.00..14599196.00 rows=228598400 width=114) (actual time=0.033..145769.217 rows=205528657 loops=2) Output: dir_1.id, dir_1.dir_entries Buffers: shared hit=1065 read=6434606, temp read=2399387 written=2399389 -> Seq Scan on public.directory dir_1 (cost=0.00..9437872.00 rows=228598400 width=114) (actual time=0.021..102911.372 rows=205528806 loops=1) Output: dir_1.id, dir_1.dir_entries Buffers: shared hit=1065 read=6434606 -> Sort (cost=0.28..0.30 rows=11 width=68) (actual time=931302.239..931302.239 rows=1 loops=1) Output: path.dir_id, path.name, path.depth Sort Key: path.depth Sort Method: top-N heapsort Memory: 25kB Buffers: shared hit=5087403 read=15089353, temp read=2399387 written=2399389 -> CTE Scan on path (cost=0.00..0.22 rows=11 width=68) (actual time=275544.531..931302.162 rows=3 loops=1) Output: path.dir_id, path.name, path.depth Buffers: shared hit=5087400 read=15089353, temp read=2399387 written=2399389 Planning time: 1.489 ms Execution time: 937179.160 ms (71 rows)
enable_seqscan = off : http://explain.depesz.com/s/RhW
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=97493.07..97493.07 rows=1 width=68) (actual time=20.643..20.643 rows=1 loops=1) Output: path.dir_id, path.name, path.depth Buffers: shared hit=23 read=21 CTE path -> Recursive Union (cost=8863.94..97492.79 rows=11 width=64) (actual time=6.564..20.597 rows=3 loops=1) Buffers: shared hit=23 read=21 -> Subquery Scan on "*SELECT* 1" (cost=8863.94..8863.97 rows=1 width=38) (actual time=6.559..6.561 rows=1 loops=1) Output: "*SELECT* 1".dir_id, "*SELECT* 1".name, 0 Buffers: shared hit=14 read=4 -> Limit (cost=8863.94..8863.96 rows=1 width=38) (actual time=6.557..6.558 rows=1 loops=1) Output: dir.id, dir_entry_f.name, (0) Buffers: shared hit=14 read=4 -> Nested Loop (cost=8863.94..34809.74 rows=1142992 width=38) (actual time=6.556..6.556 rows=1 loops=1) Output: dir.id, dir_entry_f.name, 0 Buffers: shared hit=14 read=4 -> Nested Loop (cost=5.69..230.17 rows=1 width=25) (actual time=0.105..0.105 rows=1 loops=1) Output: dir_entry_f.name, dir_entry_f.id Buffers: shared hit=12 -> Index Scan using content_pkey on public.content (cost=0.57..8.59 rows=1 width=21) (actual time=0.049..0.049 rows=1 loops=1) Output: content.sha1, content.sha1_git, content.sha256, content.length, content.ctime, content.status Index Cond: ((content.sha1)::bytea = '\xe516d03ee598741abfd2ab156562665348629735'::bytea) Buffers: shared hit=6 -> Bitmap Heap Scan on public.directory_entry_file dir_entry_f (cost=5.11..221.04 rows=54 width=46) (actual time=0.044..0.044 rows=1 loops=1) Output: dir_entry_f.name, dir_entry_f.target, dir_entry_f.id Recheck Cond: ((dir_entry_f.target)::bytea = (content.sha1_git)::bytea) Heap Blocks: exact=1 Buffers: shared hit=6 -> Bitmap Index Scan on directory_entry_file_target_name_perms_idx (cost=0.00..5.10 rows=54 width=0) (actual time=0.030..0.030 rows=1 loops=1) Index Cond: ((dir_entry_f.target)::bytea = (content.sha1_git)::bytea) Buffers: shared hit=5 -> Bitmap Heap Scan on public.directory dir (cost=8858.25..23149.65 rows=1142992 width=163) (actual time=6.445..6.445 rows=1 loops=1) Output: dir.id, dir.dir_entries, dir.file_entries, dir.rev_entries Recheck Cond: (dir.file_entries @> ARRAY[dir_entry_f.id]) Heap Blocks: exact=1 Buffers: shared hit=2 read=4 -> Bitmap Index Scan on directory_file_entries_idx (cost=0.00..8572.50 rows=1142992 width=0) (actual time=6.416..6.416 rows=11 loops=1) Index Cond: (dir.file_entries @> ARRAY[dir_entry_f.id]) Buffers: shared hit=1 read=4 -> Limit (cost=8862.82..8862.86 rows=1 width=64) (actual time=4.670..4.671 rows=1 loops=3) Output: dir_1.id, (((((dir_entry_d.name)::bytea || '\x2f'::bytea) || (path_1.name)::bytea))::unix_path), ((path_1.depth + 1)) Buffers: shared hit=9 read=17 -> Nested Loop (cost=8862.82..906318.40 rows=24002832 width=64) (actual time=4.668..4.668 rows=1 loops=3) Output: dir_1.id, ((((dir_entry_d.name)::bytea || '\x2f'::bytea) || (path_1.name)::bytea))::unix_path, (path_1.depth + 1) Buffers: shared hit=9 read=17 -> Nested Loop (cost=4.59..126.44 rows=21 width=51) (actual time=4.018..4.018 rows=1 loops=3) Output: path_1.name, path_1.depth, dir_entry_d.name, dir_entry_d.id Buffers: shared hit=2 read=12 -> WorkTable Scan on path path_1 (cost=0.00..0.20 rows=10 width=68) (actual time=0.002..0.002 rows=1 loops=3) Output: path_1.dir_id, path_1.name, path_1.depth -> Bitmap Heap Scan on public.directory_entry_dir dir_entry_d (cost=4.59..12.60 rows=2 width=36) (actual time=4.007..4.007 rows=1 loops=3) Output: dir_entry_d.name, dir_entry_d.target, dir_entry_d.id Recheck Cond: ((dir_entry_d.target)::bytea = (path_1.dir_id)::bytea) Heap Blocks: exact=2 Buffers: shared hit=2 read=12 -> Bitmap Index Scan on directory_entry_dir_target_name_perms_idx (cost=0.00..4.58 rows=2 width=0) (actual time=3.978..3.978 rows=1 loops=3) Index Cond: ((dir_entry_d.target)::bytea = (path_1.dir_id)::bytea) Buffers: shared hit=2 read=10 -> Bitmap Heap Scan on public.directory dir_1 (cost=8858.24..23149.64 rows=1142992 width=114) (actual time=0.952..0.952 rows=1 loops=2) Output: dir_1.id, dir_1.dir_entries, dir_1.file_entries, dir_1.rev_entries Recheck Cond: (dir_1.dir_entries @> ARRAY[dir_entry_d.id]) Heap Blocks: exact=2 Buffers: shared hit=7 read=5 -> Bitmap Index Scan on directory_dir_entries_idx (cost=0.00..8572.49 rows=1142992 width=0) (actual time=0.931..0.931 rows=2 loops=2) Index Cond: (dir_1.dir_entries @> ARRAY[dir_entry_d.id]) Buffers: shared hit=5 read=5 -> Sort (cost=0.28..0.30 rows=11 width=68) (actual time=20.641..20.641 rows=1 loops=1) Output: path.dir_id, path.name, path.depth Sort Key: path.depth Sort Method: top-N heapsort Memory: 25kB Buffers: shared hit=23 read=21 -> CTE Scan on path (cost=0.00..0.22 rows=11 width=68) (actual time=6.570..20.611 rows=3 loops=1) Output: path.dir_id, path.name, path.depth Buffers: shared hit=23 read=21 Planning time: 1.463 ms Execution time: 20.994 ms (75 rows)
swh_revision_find_occurrence
shortcut query:
explain (analyze,buffers,verbose) select origin, branch, revision from occurrence_history as occ_hist where occ_hist.revision = '\x1140811ebed3b33b496a3aed32ed16b068ebffab' order by upper(occ_hist.validity) -- TODO filter by authority? limit 1;
output : http://explain.depesz.com/s/45Bn
QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=28.56..28.57 rows=1 width=74) (actual time=769.001..769.001 rows=0 loops=1) Output: origin, branch, revision, (upper(validity)) Buffers: shared hit=67879 -> Sort (cost=28.56..28.58 rows=6 width=74) (actual time=768.998..768.998 rows=0 loops=1) Output: origin, branch, revision, (upper(validity)) Sort Key: (upper(occ_hist.validity)) Sort Method: quicksort Memory: 25kB Buffers: shared hit=67879 -> Index Scan using occurrence_history_origin_branch_revision_authority_validi_excl on public.occurrence_history occ_hist (cost=0.41..28.53 rows=6 width=74) (actual time=768.977..768.977 rows=0 loops=1) Output: origin, branch, revision, upper(validity) Index Cond: ((occ_hist.revision)::bytea = '\x1140811ebed3b33b496a3aed32ed16b068ebffab'::bytea) Buffers: shared hit=67879 Planning time: 0.317 ms Execution time: 769.112 ms`
naive query :
explain (analyze, buffers, verbose) select origin, branch, revision from swh_revision_list_children('\x1140811ebed3b33b496a3aed32ed16b068ebffab') as rev_list(sha1_git) left join occurrence_history occ_hist on rev_list.sha1_git = occ_hist.revision where occ_hist.origin is not null order by upper(occ_hist.validity) -- TODO filter by authority? limit 1;
output: http://explain.depesz.com/s/NKy
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=7449.19..7449.19 rows=1 width=74) (actual time=198.216..198.217 rows=1 loops=1) Output: occ_hist.origin, occ_hist.branch, occ_hist.revision, (upper(occ_hist.validity)) Buffers: shared hit=54065 -> Sort (cost=7449.19..7452.37 rows=1273 width=74) (actual time=198.214..198.214 rows=1 loops=1) Output: occ_hist.origin, occ_hist.branch, occ_hist.revision, (upper(occ_hist.validity)) Sort Key: (upper(occ_hist.validity)) Sort Method: top-N heapsort Memory: 25kB Buffers: shared hit=54065 -> Nested Loop (cost=1276.21..7442.82 rows=1273 width=74) (actual time=0.780..198.119 rows=59 loops=1) Output: occ_hist.origin, occ_hist.branch, occ_hist.revision, upper(occ_hist.validity) Buffers: shared hit=54065 -> CTE Scan on rev_list (cost=1275.78..1279.60 rows=191 width=32) (actual time=0.081..134.879 rows=6713 loops=1) Output: rev_list.id Buffers: shared hit=33825 CTE rev_list -> Recursive Union (cost=0.56..1275.78 rows=191 width=21) (actual time=0.075..128.774 rows=6713 loops=1) Buffers: shared hit=33825 -> Index Only Scan using revision_pkey on public.revision (cost=0.56..8.58 rows=1 width=21) (actual time=0.065..0.066 rows=1 loops=1) Output: revision.id Index Cond: (revision.id = ('\x1140811ebed3b33b496a3aed32ed16b068ebffab'::bytea)::bytea) Heap Fetches: 1 Buffers: shared hit=5 -> Nested Loop (cost=4.58..126.34 rows=19 width=21) (actual time=0.017..0.038 rows=2 loops=3073) Output: h.id Buffers: shared hit=33820 -> WorkTable Scan on rev_list rev_list_1 (cost=0.00..0.20 rows=10 width=32) (actual time=0.000..0.001 rows=2 loops=3073) Output: rev_list_1.id -> Bitmap Heap Scan on public.revision_history h (cost=4.58..12.59 rows=2 width=42) (actual time=0.015..0.015 rows=1 loops=6713) Output: h.id, h.parent_id, h.parent_rank Recheck Cond: ((h.parent_id)::bytea = (rev_list_1.id)::bytea) Heap Blocks: exact=6889 Buffers: shared hit=33820 -> Bitmap Index Scan on revision_history_parent_id_idx (cost=0.00..4.58 rows=2 width=0) (actual time=0.012..0.012 rows=1 loops=6713) Index Cond: ((h.parent_id)::bytea = (rev_list_1.id)::bytea) Buffers: shared hit=26931 -> Index Scan using occurrence_history_revision_idx on public.occurrence_history occ_hist (cost=0.43..32.17 rows=7 width=74) (actual time=0.009..0.009 rows=0 loops=6713) Output: occ_hist.origin, occ_hist.branch, occ_hist.revision, occ_hist.authority, occ_hist.validity Index Cond: ((occ_hist.revision)::bytea = (rev_list.id)::bytea) Filter: (occ_hist.origin IS NOT NULL) Buffers: shared hit=20240 Planning time: 1.061 ms Execution time: 198.672 ms (42 rows)
I'm currently toying around with another approach for those recursive queries:
- Creation of an object_id sequence
- Addition of an object_id to all kinds of objects (content, skipped_content, directory, revision, release, occurrence_history), that dips from this object_id sequence
- Creation of a hierarchy_cache table with two columns: parent, child.
The queries to fill in the hierarchy_cache table are pretty simple, and are append-only. For instance for the directory hierarchy:
insert into hierarchy_cache ( with unnested as ( select directory.object_id, unnest(directory.dir_entries) as dir from directory ) select unnested.object_id as parent, directory.object_id as child from unnested left join directory_entry_dir e on unnested.dir = e.id left join directory on directory.id = e.target );
The queries that walk through the history cache recursively are trivial
The object_id column looks like something that we want.
The hierarchy_cache relation takes a (very) long time to populate.
We're now investigating a direct revision -> content shortcut relationship.