Software Heritage

Deploy swh.lister v2.7
Closed, Migrated

Description

This release:

  • makes the launchpad lister list bzr origins
  • fixes the listing of sourceforge cvs and bzr origins

Deploying it therefore requires:

  • cleaning up the existing bzr and cvs origins in the scheduler db, as the current ones all return 404.
  • migrating the launchpad lister's state to the new state schema

Plan, applied in order to staging then to production:

  • clean up the data
  • migrate the launchpad lister state (to avoid losing the incremental progress made so far; not a big deal otherwise, but still)
  • upgrade python3-swh.lister to v2.7
  • restart the swh-worker@lister service

Environment status:

  • staging
  • production

For staging first:

  • re-trigger the sourceforge and launchpad listings

Event Timeline

ardumont triaged this task as Normal priority. Feb 17 2022, 2:01 PM
ardumont created this task.
ardumont changed the task status from Open to Work in Progress. Feb 17 2022, 2:12 PM
ardumont moved this task from Backlog to in-progress on the System administration board.

Clean up:

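-- Remove the stale (404) cvs and bzr origins, then check that none remain
delete from listed_origins where visit_type in ('cvs', 'bzr');
select visit_type, count(*) from listed_origins where visit_type in ('cvs', 'bzr') group by visit_type;

-- Rename the launchpad lister state key date_last_modified to git_date_last_modified.
-- If the state has no date_last_modified key, jsonb_set receives NULL and returns NULL,
-- so the update fails on the not-null constraint (nothing to migrate in that case).
UPDATE listers
SET current_state=jsonb_set(
    current_state - 'date_last_modified', '{git_date_last_modified}',
    (select current_state#>'{date_last_modified}' from listers where name='launchpad' and instance_name='launchpad')
)
where name='launchpad' and instance_name='launchpad';
select * from listers where name='launchpad' and instance_name='launchpad';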
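The delete-then-verify pattern of the clean up can be rehearsed on a throwaway database first. A minimal sketch, using Python's built-in sqlite3 as a stand-in for the PostgreSQL scheduler db; the two-column table, the helper name `purge_visit_types` and the example URLs are hypothetical:

```python
import sqlite3

STALE_VISIT_TYPES = ("cvs", "bzr")


def purge_visit_types(conn, visit_types=STALE_VISIT_TYPES):
    """Delete listed origins of the given visit types and return
    (number of deleted rows, remaining per-type counts)."""
    placeholders = ", ".join("?" for _ in visit_types)
    with conn:  # commits on success, rolls back on error
        deleted = conn.execute(
            f"DELETE FROM listed_origins WHERE visit_type IN ({placeholders})",
            visit_types,
        ).rowcount
    # Same verification query as above: expected to return no rows
    remaining = conn.execute(
        "SELECT visit_type, count(*) FROM listed_origins "
        f"WHERE visit_type IN ({placeholders}) GROUP BY visit_type",
        visit_types,
    ).fetchall()
    return deleted, remaining


# Toy stand-in for the scheduler table: only two columns, made-up URLs
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listed_origins (url TEXT, visit_type TEXT)")
conn.executemany(
    "INSERT INTO listed_origins VALUES (?, ?)",
    [
        ("https://example.org/old-bzr", "bzr"),
        ("https://example.org/old-cvs", "cvs"),
        ("https://example.org/repo.git", "git"),
    ],
)
deleted, remaining = purge_visit_types(conn)
print(deleted, remaining)  # 2 []
```

Running the check in the same session right after the delete mirrors what was done on staging and prod: an empty result means no cvs/bzr origins survived.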

staging:

14:23:08 swh-scheduler@db1:5432=> delete from listed_origins where visit_type in ('cvs', 'bzr');
DELETE 28991
Time: 1691.741 ms (00:01.692)
14:23:10 swh-scheduler@db1:5432=> select visit_type, count(*) from listed_origins where visit_type in ('cvs', 'bzr') group by visit_type;
+------------+-------+
| visit_type | count |
+------------+-------+
+------------+-------+
(0 rows)

Time: 968.147 ms
14:23:11 swh-scheduler@db1:5432=>
14:23:11 swh-scheduler@db1:5432=> UPDATE listers
swh-scheduler-> SET current_state=jsonb_set(
swh-scheduler(>     current_state - 'date_last_modified', '{git_date_last_modified}',
swh-scheduler(>     (select current_state#>'{date_last_modified}' from listers where name='launchpad' and instance_name='launchpad')
swh-scheduler(> )
swh-scheduler-> where name='launchpad' and instance_name='launchpad';
ERROR:  null value in column "current_state" violates not-null constraint
DETAIL:  Failing row contains (4708f4b6-f56d-4c22-88f0-6132e99cd19e, launchpad, launchpad, 2021-01-28 14:58:12.789843+00, null, 2021-01-28 14:58:12.789843+00).
Time: 56.961 ms
14:23:12 swh-scheduler@db1:5432=> select * from listers where name='launchpad' and instance_name='launchpad';
+--------------------------------------+-----------+---------------+-------------------------------+---------------+-------------------------------+
|                  id                  |   name    | instance_name |            created            | current_state |            updated            |
+--------------------------------------+-----------+---------------+-------------------------------+---------------+-------------------------------+
| 4708f4b6-f56d-4c22-88f0-6132e99cd19e | launchpad | launchpad     | 2021-01-28 14:58:12.789843+00 | {}            | 2021-01-28 14:58:12.789843+00 |  <------------------- no incremental listing yet
+--------------------------------------+-----------+---------------+-------------------------------+---------------+-------------------------------+
(1 row)
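The staging failure comes from jsonb_set being strict: with an empty state, the subquery yields NULL, jsonb_set then returns NULL, and the update violates the not-null constraint. A minimal sketch of the migration logic in plain Python, with dicts standing in for jsonb; the function names are hypothetical, and the timestamp is the one visible in the migrated production row:

```python
from typing import Any, Dict, Optional

OLD_KEY = "date_last_modified"
NEW_KEY = "git_date_last_modified"


def migrate_strict(state: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Mirror the UPDATE: drop OLD_KEY and store its value under NEW_KEY.

    Like the strict jsonb_set, return None (SQL NULL) when OLD_KEY is
    missing, which is what hit the not-null constraint on staging.
    """
    if OLD_KEY not in state:
        return None
    new_state = {k: v for k, v in state.items() if k != OLD_KEY}
    new_state[NEW_KEY] = state[OLD_KEY]
    return new_state


def migrate_safe(state: Dict[str, Any]) -> Dict[str, Any]:
    """Same migration, but leave states without OLD_KEY unchanged."""
    migrated = migrate_strict(state)
    return state if migrated is None else migrated


prod_state = {OLD_KEY: "2022-02-16T19:32:09.400561+00:00"}
print(migrate_strict(prod_state))  # {'git_date_last_modified': '2022-02-16T19:32:09.400561+00:00'}
print(migrate_strict({}))          # None
print(migrate_safe({}))            # {}
```

In SQL, the equivalent of `migrate_safe` would presumably be adding `current_state ? 'date_last_modified'` to the WHERE clause, so rows without the old key are simply skipped instead of failing.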

prod:

14:22:42 softwareheritage-scheduler@belvedere:5432=> delete from listed_origins where visit_type in ('cvs', 'bzr');
DELETE 28912
Time: 681.115 ms
14:22:44 softwareheritage-scheduler@belvedere:5432=> select visit_type, count(*) from listed_origins where visit_type in ('cvs', 'bzr') group by visit_type;
+------------+-------+
| visit_type | count |
+------------+-------+
+------------+-------+
(0 rows)

Time: 33.466 ms
14:22:44 softwareheritage-scheduler@belvedere:5432=>
14:22:44 softwareheritage-scheduler@belvedere:5432=> UPDATE listers
softwareheritage-scheduler-> SET current_state=jsonb_set(
softwareheritage-scheduler(>     current_state - 'date_last_modified', '{git_date_last_modified}',
softwareheritage-scheduler(>     (select current_state#>'{date_last_modified}' from listers where name='launchpad' and instance_name='launchpad')
softwareheritage-scheduler(> )
softwareheritage-scheduler-> where name='launchpad' and instance_name='launchpad';
UPDATE 1
Time: 5.857 ms
14:22:44 softwareheritage-scheduler@belvedere:5432=> select * from listers where name='launchpad' and instance_name='launchpad';
+--------------------------------------+-----------+---------------+-------------------------------+----------------------------------------------------------------+----------------------------+
|                  id                  |   name    | instance_name |            created            |                         current_state                          |          updated           |
+--------------------------------------+-----------+---------------+-------------------------------+----------------------------------------------------------------+----------------------------+
| 9de8141b-e441-4ffd-b40d-d438b29c03fc | launchpad | launchpad     | 2021-03-17 20:51:45.949594+00 | {"git_date_last_modified": "2022-02-16T19:32:09.400561+00:00"} | 2022-02-16 19:33:11.247+00 |
+--------------------------------------+-----------+---------------+-------------------------------+----------------------------------------------------------------+----------------------------+
(1 row)

Time: 4.698 ms

The staging lister now lists new bzr and cvs origins:

14:48:32 swh-scheduler@db1:5432=> select now(), visit_type, count(*) from listed_origins where visit_type in ('git', 'cvs', 'bzr') group by visit_type;
+-------------------------------+------------+---------+
|              now              | visit_type |  count  |
+-------------------------------+------------+---------+
| 2022-02-17 13:48:33.905759+00 | bzr        |   10000 |
| 2022-02-17 13:48:33.905759+00 | cvs        |       3 |
| 2022-02-17 13:48:33.905759+00 | git        | 5514251 |
+-------------------------------+------------+---------+
(3 rows)

Time: 1305.779 ms (00:01.306)
14:48:35 swh-scheduler@db1:5432=> select now(), visit_type, url from listed_origins where visit_type = 'cvs' limit 10;
+-------------------------------+------------+---------------------------------------------------------------+
|              now              | visit_type |                              url                              |
+-------------------------------+------------+---------------------------------------------------------------+
| 2022-02-17 13:48:44.313886+00 | cvs        | rsync://a.cvs.sourceforge.net/cvsroot/familysite/familysite   |
| 2022-02-17 13:48:44.313886+00 | cvs        | rsync://a.cvs.sourceforge.net/cvsroot/familysite/web          |
| 2022-02-17 13:48:44.313886+00 | cvs        | rsync://a.cvs.sourceforge.net/cvsroot/calvingames-se/scorched |
+-------------------------------+------------+---------------------------------------------------------------+
(3 rows)

Time: 1172.989 ms (00:01.173)

14:46:00 swh-scheduler@db1:5432=> select now(), visit_type, url from listed_origins where visit_type = 'bzr' limit 10;
+-------------------------------+------------+------------------------------------------------------------------------------+
|              now              | visit_type |                                     url                                      |
+-------------------------------+------------+------------------------------------------------------------------------------+
| 2022-02-17 13:46:05.930609+00 | bzr        | https://code.launchpad.net/~subol-hackers/subol/safelisp-rpython             |
| 2022-02-17 13:46:05.930609+00 | bzr        | https://code.launchpad.net/~spiv/bzr/split-smart                             |
| 2022-02-17 13:46:05.930609+00 | bzr        | https://code.launchpad.net/~jamesh/pytz/tzfile                               |
| 2022-02-17 13:46:05.930609+00 | bzr        | https://code.launchpad.net/~jameinel/bzr/bzrdir-import-workingtree           |
| 2022-02-17 13:46:05.930609+00 | bzr        | https://code.launchpad.net/~washort/torc/trunk                               |
| 2022-02-17 13:46:05.930609+00 | bzr        | https://code.launchpad.net/~jdong/prevu/jdong-dev                            |
| 2022-02-17 13:46:05.930609+00 | bzr        | https://code.launchpad.net/~jamesmr-deactivatedaccount/nirvana/nirvana-tools |
| 2022-02-17 13:46:05.930609+00 | bzr        | https://code.launchpad.net/~easyvid-dev/easyvid/trunk                        |
| 2022-02-17 13:46:05.930609+00 | bzr        | https://code.launchpad.net/~ugr-dev/ugr-deactivated/trunk                    |
| 2022-02-17 13:46:05.930609+00 | bzr        | https://code.launchpad.net/~jamesmr-deactivatedaccount/nirvana/trunk         |
+-------------------------------+------------+------------------------------------------------------------------------------+
(10 rows)

Time: 122.151 ms
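The sample rows suggest two origin URL layouts: sourceforge cvs modules served over rsync, and launchpad bzr branches addressed as `~owner/project/branch`. A minimal sketch for sanity-checking listed origins against those layouts; the layouts are inferred only from the rows above (not from the swh.lister code), and `split_origin` is a hypothetical helper:

```python
from urllib.parse import urlparse


def split_origin(url):
    """Split an origin URL into its components.

    Assumed layouts (inferred from the sample rows only):
      rsync://a.cvs.sourceforge.net/cvsroot/<project>/<module>
      https://code.launchpad.net/~<owner>/<project>/<branch>
    """
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.scheme == "rsync" and len(parts) == 3 and parts[0] == "cvsroot":
        return {"vcs": "cvs", "project": parts[1], "module": parts[2]}
    if (
        parsed.netloc == "code.launchpad.net"
        and len(parts) == 3
        and parts[0].startswith("~")
    ):
        return {"vcs": "bzr", "owner": parts[0][1:], "project": parts[1], "branch": parts[2]}
    raise ValueError(f"unrecognized origin URL: {url}")


print(split_origin("rsync://a.cvs.sourceforge.net/cvsroot/familysite/web"))
print(split_origin("https://code.launchpad.net/~spiv/bzr/split-smart"))
```

URLs that match neither layout raise ValueError, which makes unexpected shapes easy to spot when scanning a batch of listed origins.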

Listing is progressing steadily, so closing this task now.

17:08:11 swh-scheduler@db1:5432=> select now(), visit_type, count(*) from listed_origins where visit_type in ('cvs', 'bzr') group by visit_type;
+-------------------------------+------------+--------+
|              now              | visit_type | count  |
+-------------------------------+------------+--------+
| 2022-02-17 16:11:07.104375+00 | bzr        | 182017 |
| 2022-02-17 16:11:07.104375+00 | cvs        |  11817 |
+-------------------------------+------------+--------+
(2 rows)

Time: 778.451 ms
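"Steadily progressing" can be quantified from the two staging snapshots taken at 13:48 and 16:11 UTC. A small sketch computing the average listing rate per visit type; only the counts and now() timestamps come from the queries above, and the helper name `per_minute_rates` is illustrative:

```python
from datetime import datetime

# (now(), counts) pairs copied from the staging queries; now() is UTC
snapshots = [
    (datetime.fromisoformat("2022-02-17 13:48:33.905759+00:00"), {"bzr": 10000, "cvs": 3}),
    (datetime.fromisoformat("2022-02-17 16:11:07.104375+00:00"), {"bzr": 182017, "cvs": 11817}),
]


def per_minute_rates(old, new):
    """Average number of newly listed origins per minute, per visit type."""
    (t0, counts0), (t1, counts1) = old, new
    minutes = (t1 - t0).total_seconds() / 60
    return {vt: (counts1[vt] - counts0[vt]) / minutes for vt in counts0}


rates = per_minute_rates(*snapshots)
for visit_type, rate in sorted(rates.items()):
    print(f"{visit_type}: {rate:.0f} new origins/min")
```

With these numbers it prints about 1207 bzr and 83 cvs origins per minute, averaged over the roughly 2 h 22 min between the two snapshots.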
ardumont claimed this task.
ardumont moved this task from deployed/landed/monitoring to done on the System administration board.