Page MenuHomeSoftware Heritage
Paste P1310

query plan origin_visit_get_latest
ActivePublic

Authored by vlorentz on Mar 11 2022, 11:53 AM.
softwareheritage=> explain select ov.* from origin_visit ov INNER JOIN origin o ON o.id = ov.origin WHERE o.url = 'svn://scm.gforge.inria.fr/svnroot/fpg-ipc/' ORDER BY ov.visit DESC LIMIT 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Limit (cost=633.57..633.57 rows=1 width=28)
-> Sort (cost=633.57..633.59 rows=7 width=28)
Sort Key: ov.visit DESC
-> Nested Loop (cost=0.58..633.53 rows=7 width=28)
-> Index Scan using origin_url_idx1 on origin o (cost=0.00..2.02 rows=1 width=8)
Index Cond: (url = 'svn://scm.gforge.inria.fr/svnroot/fpg-ipc/'::text)
-> Index Scan using origin_visit_pkey on origin_visit ov (cost=0.58..625.15 rows=637 width=28)
Index Cond: (origin = o.id)
For comparison, when removing the join:
softwareheritage=> explain select ov.* from origin_visit ov WHERE ov.origin=162351914 ORDER BY ov.visit DESC LIMIT 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Limit (cost=0.58..1.56 rows=1 width=28)
-> Index Scan Backward using origin_visit_pkey on origin_visit ov (cost=0.58..625.15 rows=637 width=28)
Index Cond: (origin = 162351914)

Event Timeline

vlorentz edited the content of this paste. (Show Details)

FWIW even when forcing the use of the origin.url unique index, there's a nested loop then a sort:

12:07 swhstorage@softwareheritage => begin;
BEGIN
Temps : 5,506 ms
12:08 swhstorage@softwareheritage *=> explain select * from origin_visit ov INNER JOIN origin o ON o.id = ov.origin WHERE o.url =   'svn://scm.gforge.inria.fr/svnroot/fpg-ipc/' ORDER BY ov.visit LIMIT 1;
                                                   QUERY PLAN                                                   
────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Limit  (cost=633.57..633.57 rows=1 width=89)
   ->  Sort  (cost=633.57..633.59 rows=7 width=89)
         Sort Key: ov.visit
         ->  Nested Loop  (cost=0.58..633.53 rows=7 width=89)
               ->  Index Scan using origin_url_idx1 on origin o  (cost=0.00..2.02 rows=1 width=53)
                     Index Cond: (url = 'svn://scm.gforge.inria.fr/svnroot/fpg-ipc/'::text)
               ->  Index Scan using origin_visit_pkey on origin_visit ov  (cost=0.58..625.15 rows=637 width=28)
                     Index Cond: (origin = o.id)
(8 lignes)

Temps : 5,096 ms
12:08 swhstorage@softwareheritage *=> drop index origin_url_idx1;
DROP INDEX
Temps : 5,534 ms
12:08 swhstorage@softwareheritage *=> explain select * from origin_visit ov INNER JOIN origin o ON o.id = ov.origin WHERE o.url =   'svn://scm.gforge.inria.fr/svnroot/fpg-ipc/' ORDER BY ov.visit LIMIT 1;
                                                   QUERY PLAN                                                   
────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Limit  (cost=634.26..634.27 rows=1 width=89)
   ->  Sort  (cost=634.26..634.28 rows=7 width=89)
         Sort Key: ov.visit
         ->  Nested Loop  (cost=1.27..634.23 rows=7 width=89)
               ->  Index Scan using origin_url_idx2 on origin o  (cost=0.70..2.71 rows=1 width=53)
                     Index Cond: (url = 'svn://scm.gforge.inria.fr/svnroot/fpg-ipc/'::text)
               ->  Index Scan using origin_visit_pkey on origin_visit ov  (cost=0.58..625.15 rows=637 width=28)
                     Index Cond: (origin = o.id)
(8 lignes)

Temps : 6,005 ms
12:08 swhstorage@softwareheritage *=> rollback;

To avoid the sort, it seems that one needs to force a materialize of the origin table results by using origin.id in the WHERE condition:

2:06 guest@softwareheritage => explain select * from origin_visit ov inner join origin on origin.id = ov.origin WHERE origin.id=(select id from origin where url='svn://scm.gforge.inria.fr/svnroot/fpg-ipc/') ORDER BY ov.visit desc LIMIT 1;
                                                    QUERY PLAN                                                     
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Limit  (cost=3.17..4.16 rows=1 width=89)
   InitPlan 1 (returns $0)
     ->  Index Scan using origin_url_idx1 on origin origin_1  (cost=0.00..2.02 rows=1 width=8)
           Index Cond: (url = 'svn://scm.gforge.inria.fr/svnroot/fpg-ipc/'::text)
   ->  Nested Loop  (cost=1.15..635.70 rows=637 width=89)
         ->  Index Scan Backward using origin_visit_pkey on origin_visit ov  (cost=0.58..625.15 rows=637 width=28)
               Index Cond: (origin = $0)
         ->  Materialize  (cost=0.57..2.59 rows=1 width=53)
               ->  Index Scan using origin_pkey on origin  (cost=0.57..2.59 rows=1 width=53)
                     Index Cond: (id = $0)
(10 lignes)

Temps : 6,145 ms

(even sorting by the pk (origin_visit.origin, origin_visit.visit), postgres still sorts the output of the nested loop)