
swh-loader-tar generates dangling releases
Closed, Migrated (edits locked)

Description

The releases generated by swh-loader-tar aren't referenced by any occurrences: we generate occurrences pointing at revision objects, but not at releases.
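In other words, a release is dangling when no occurrence targets it. A minimal sketch of that definition, assuming a hypothetical in-memory model (plain tuples with made-up identifiers, not the actual swh storage schema or API):

```python
# Hypothetical, simplified model: an occurrence is (origin_id, target_id, target_type)
# and releases are identified by plain string ids. All values are placeholders.

def dangling_releases(release_ids, occurrences):
    """Return the set of release ids that no occurrence points at."""
    referenced = {
        target
        for _origin, target, target_type in occurrences
        if target_type == "release"
    }
    return set(release_ids) - referenced

# The tar loader's situation: every occurrence targets a revision,
# so every release it created ends up unreferenced.
occurrences = [
    (1, "rev-a", "revision"),
    (2, "rev-b", "revision"),
]
releases = ["rel-a", "rel-b"]

print(sorted(dangling_releases(releases, occurrences)))  # ['rel-a', 'rel-b']
```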

Actions:

  • Stop generating dangling releases and keep the occurrences pointing at revisions (source code adaptation)
  • Remove the dangling GNU releases (generated by the tarball loader when ingesting GNU tarballs)

Event Timeline

zack moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board. Feb 16 2017, 9:14 AM

After discussion, we will stop generating dangling releases and keep the occurrences pointing at revisions.

Yep. Of course we will also need to remove all dangling releases that have been generated thus far.

> Of course we will also need to remove all dangling releases that have been generated thus far.

Indeed.

ardumont updated the task description.

Actions undertaken to clean up the dangling releases:

  1. Identifying the author of the dangling releases ('swh-robot'):
$ select id from person where name='Software Heritage' and fullname='Software Heritage <robot@softwareheritage.org>' and email='robot@softwareheritage.org';
> 3661419
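The id 3661419 is then hard-coded in the later queries. A lookup that fails loudly when the match is not unique makes that step safer. A minimal sketch on a toy in-memory sqlite3 table (the schema is assumed from the columns queried above; the helper name `robot_person_id` is hypothetical):

```python
import sqlite3

# Toy stand-in for the person table; only the columns used by the query
# above are modelled, and the row is copied from the result shown.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, fullname TEXT, email TEXT)"
)
conn.execute(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    (3661419, "Software Heritage",
     "Software Heritage <robot@softwareheritage.org>", "robot@softwareheritage.org"),
)

def robot_person_id(conn):
    """Return the robot author's id, using bound parameters instead of literals."""
    rows = conn.execute(
        "SELECT id FROM person WHERE name = ? AND fullname = ? AND email = ?",
        ("Software Heritage",
         "Software Heritage <robot@softwareheritage.org>",
         "robot@softwareheritage.org"),
    ).fetchall()
    # Exactly one match is expected; anything else should stop the cleanup.
    if len(rows) != 1:
        raise RuntimeError(f"expected exactly one robot person, got {len(rows)}")
    return rows[0][0]

print(robot_person_id(conn))  # 3661419
```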
  2. All origins coming from GNU are of type 'ftp':
$ select * from origin where type='ftp';
   id    | type |                                   url                                   | lister | project
---------+------+-------------------------------------------------------------------------+--------+---------
 4423668 | ftp  | rsync://ftp.gnu.org/gnu/3dldf                                           |        |
 4423671 | ftp  | rsync://ftp.gnu.org/gnu/3dldf                                           |        |
 4423974 | ftp  | rsync://ftp.gnu.org/gnu/GNUinfo/Audio/francais                          |        |
...

A manual check confirms that these are indeed only GNU origins (there aren't that many).
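The manual check could also be scripted. A minimal sketch, assuming every GNU origin URL starts with `rsync://ftp.gnu.org/gnu/` (true of the rows shown, but an assumption for the full list, which the output above truncates with "..."); in practice the rows would come from the query rather than being pasted in:

```python
# Sample rows copied from the psql output above (visible rows only).
ftp_origins = [
    (4423668, "rsync://ftp.gnu.org/gnu/3dldf"),
    (4423671, "rsync://ftp.gnu.org/gnu/3dldf"),
    (4423974, "rsync://ftp.gnu.org/gnu/GNUinfo/Audio/francais"),
]

# Assumed URL prefix shared by all GNU origins.
GNU_PREFIX = "rsync://ftp.gnu.org/gnu/"

def non_gnu_origins(origins, prefix=GNU_PREFIX):
    """Return the (id, url) pairs whose URL does not start with the GNU prefix."""
    return [(oid, url) for oid, url in origins if not url.startswith(prefix)]

print(non_gnu_origins(ftp_origins))  # []
```

An empty result means every listed ftp origin is a GNU one. The same filter could be pushed into SQL with a `url not like 'rsync://ftp.gnu.org/gnu/%'` condition on the origin query.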

All the dangling releases that were created target the same revision as the occurrences created at the same time:

$ select count(r.id) from occurrence occ inner join origin ori on occ.origin=ori.id inner join release r on (r.target=occ.target and r.target_type='revision') where ori.type='ftp' and r.author=3661419;
 count
-------
  8911
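Because this count reaches releases through occurrences, a release whose revision is targeted by several occurrences (for instance, if the two 3dldf origins listed above point at the same revision) would be counted once per occurrence. A toy sqlite3 reproduction, with made-up data and a simplified schema (both assumptions), shows how `count(r.id)` and `count(distinct r.id)` can differ:

```python
import sqlite3

# Toy data (an assumption): two ftp origins whose occurrences target the
# same revision, plus one robot-authored release targeting that revision.
# "release" is quoted because RELEASE is an SQLite keyword.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE origin (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE occurrence (origin INTEGER, target TEXT);
    CREATE TABLE "release" (id TEXT PRIMARY KEY, target TEXT,
                            target_type TEXT, author INTEGER);
    INSERT INTO origin VALUES (1, 'ftp'), (2, 'ftp');
    INSERT INTO occurrence VALUES (1, 'rev-a'), (2, 'rev-a');
    INSERT INTO "release" VALUES ('rel-a', 'rev-a', 'revision', 3661419);
""")

# FROM/WHERE part of the counting query above, unchanged apart from quoting.
JOIN = """
    FROM occurrence occ
    INNER JOIN origin ori ON occ.origin = ori.id
    INNER JOIN "release" r
        ON (r.target = occ.target AND r.target_type = 'revision')
    WHERE ori.type = 'ftp' AND r.author = 3661419
"""

joined_rows = conn.execute("SELECT count(r.id)" + JOIN).fetchone()[0]
distinct_releases = conn.execute("SELECT count(DISTINCT r.id)" + JOIN).fetchone()[0]
print(joined_rows, distinct_releases)  # 2 1
```

Whether duplicates actually affect the 8911 figure depends on the real data. The delete itself is unaffected, since `id in (...)` matches each release once however many times it appears in the subquery.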
  3. Hence the deletion step:
$ delete from release where id in (select r.id from occurrence occ inner join origin ori on occ.origin=ori.id inner join release r on (r.target=occ.target and r.target_type='revision') where ori.type='ftp' and r.author=3661419);
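One way to make such a delete safer is to run it inside a transaction and roll back if the number of deleted rows differs from the number of distinct releases the selection matched. A minimal sketch on a toy sqlite3 database (schema, rows and the helper name `delete_dangling` are assumptions); on PostgreSQL the same idea would be an explicit `BEGIN`, the `DELETE`, a look at the reported row count, then `COMMIT` or `ROLLBACK`:

```python
import sqlite3

# Toy database mirroring only the columns used by the queries above.
# "release" is quoted because RELEASE is an SQLite keyword.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE origin (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE occurrence (origin INTEGER, target TEXT);
    CREATE TABLE "release" (id TEXT PRIMARY KEY, target TEXT,
                            target_type TEXT, author INTEGER);
    INSERT INTO origin VALUES (1, 'ftp'), (2, 'git');
    INSERT INTO occurrence VALUES (1, 'rev-a'), (2, 'rev-b');
    INSERT INTO "release" VALUES ('rel-a', 'rev-a', 'revision', 3661419),
                                 ('rel-b', 'rev-b', 'revision', 42);
""")

# Same selection as the delete above, with the author id bound as a parameter.
SELECT_IDS = """
    SELECT r.id AS id FROM occurrence occ
    INNER JOIN origin ori ON occ.origin = ori.id
    INNER JOIN "release" r
        ON (r.target = occ.target AND r.target_type = 'revision')
    WHERE ori.type = 'ftp' AND r.author = ?
"""

def delete_dangling(conn, author_id):
    """Delete the selected releases; roll back if the row count is unexpected."""
    with conn:  # commits on success, rolls back if an exception escapes
        expected = conn.execute(
            "SELECT count(DISTINCT id) FROM (" + SELECT_IDS + ")", (author_id,)
        ).fetchone()[0]
        cur = conn.execute(
            'DELETE FROM "release" WHERE id IN (' + SELECT_IDS + ")", (author_id,)
        )
        if cur.rowcount != expected:
            raise RuntimeError(f"deleted {cur.rowcount}, expected {expected}")
    return cur.rowcount

deleted = delete_dangling(conn, 3661419)
print(deleted)  # 1
print(conn.execute('SELECT id FROM "release"').fetchall())  # [('rel-b',)]
```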
  4. Checking for releases with the 'swh-robot' author no longer returns any:
softwareheritage=> select count(*) from release r where r.author=3661419 and r.synthetic;
 count
-------
     0
(1 row)
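The verification query also filters on `r.synthetic`, so it only covers synthetic releases; dropping that filter gives a stricter check. A minimal sketch contrasting the two counts on a toy sqlite3 table (schema, rows and the helper name `robot_release_counts` are assumptions, with one deliberately non-synthetic robot release to show the difference):

```python
import sqlite3

# Toy release table; "release" is quoted because RELEASE is an SQLite keyword.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "release" (id TEXT PRIMARY KEY, author INTEGER, synthetic BOOLEAN);
    INSERT INTO "release" VALUES ('rel-x', 42, 1), ('rel-y', 3661419, 0);
""")

def robot_release_counts(conn, author_id):
    """Return (synthetic-only count, count regardless of synthetic) for an author."""
    synthetic_only = conn.execute(
        'SELECT count(*) FROM "release" r WHERE r.author = ? AND r.synthetic',
        (author_id,),
    ).fetchone()[0]
    any_kind = conn.execute(
        'SELECT count(*) FROM "release" r WHERE r.author = ?',
        (author_id,),
    ).fetchone()[0]
    return synthetic_only, any_kind

print(robot_release_counts(conn, 3661419))  # (0, 1)
```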