Page MenuHomeSoftware Heritage
Paste P585

clean up obsolete cran visits and origins
ActivePublic

Authored by ardumont on Jan 16 2020, 2:59 PM.
delete from origin_visit where type='cran';
delete from origin where url like 'https://cran.r-project.org/%';

Event Timeline

ardumont changed the title of this paste from clean up obsolete cran visits to clean up obsolete cran visits and origins.

explain seems happy about it:

softwareheritage=> explain delete from origin_visit where type='cran';
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Delete on origin_visit  (cost=0.57..26042.38 rows=31151 width=6)
   ->  Index Scan using origin_visit_type_status_date_idx on origin_visit  (cost=0.57..26042.38 rows=31151 width=6)
         Index Cond: (type = 'cran'::text)
(3 rows)

softwareheritage=> explain delete from origin where url like 'https://cran.r-project.org/%';
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Delete on origin  (cost=898.81..10109.02 rows=9137 width=6)
   ->  Bitmap Heap Scan on origin  (cost=898.81..10109.02 rows=9137 width=6)
         Recheck Cond: (url ~~ 'https://cran.r-project.org/%'::text)
         ->  Bitmap Index Scan on origin_url_idx  (cost=0.00..896.53 rows=9137 width=0)
               Index Cond: (url ~~ 'https://cran.r-project.org/%'::text)
(5 rows)

And staging's swh db also agreed to it ;)

Dumps [1] stored on belvedere on file format:

postgres@belvedere:~$ ls -la cran-origin*
-rw-r--r-- 1 postgres postgres 1523973 Jan 16 14:10 cran-origin-visits-to-cleanup.txt
-rw-r--r-- 1 postgres postgres 1088937 Jan 16 14:12 cran-origins-to-cleanup.txt
postgres@belvedere:~$ pwd
/var/lib/postgresql
postgres@belvedere:~$ head -2 cran-origins-to-cleanup.txt
91395427        https://cran.r-project.org/src/contrib/A3_1.0.0.tar.gz
91395428        https://cran.r-project.org/src/contrib/aaSEA_1.1.0.tar.gz
postgres@belvedere:~$ head -2 cran-origin-visits-to-cleanup.txt
91395427        1       2020-01-09 12:45:45.57874+00    full    \N      \\x13d64679c3b84eba21163ee7c2f254f76e396347     cran
91395428        1       2020-01-09 12:45:51.964293+00   full    \N      \\x00a21cfa0ab2957502944f95027fc3e35868cd20     cran

An archive holding those 2 files has been done and uploaded in the forge [2]

[1]

> \copy (select o.* from origin o inner join origin_visit ov on o.id=ov.origin where ov.type='cran') to '/var/lib/postgresql/cran-origins-to-cleanup.txt';
> \copy (select * from origin_visit where type='cran') to '/var/lib/postgresql/cran-origins-to-cleanup.txt';

[2] F3735285