Deploy remaining next-gen listers on staging
Closed, Resolved (Public)

Description

Event Timeline

ardumont triaged this task as Normal priority. Feb 1 2021, 3:52 PM
ardumont created this task.

Status: OK

Schedule gnu on staging:

swhworker@worker0:~$ swh scheduler --url http://scheduler0.internal.staging.swh.network:5008/ task add list-gnu-full
Created 1 tasks

Task 17271527
  Next run: today (2021-02-01T14:51:42.852948+00:00)
  Interval: 90 days, 0:00:00
  Type: list-gnu-full
  Policy: recurring
  Args:
  Keyword args:

swhworker@worker0:~$ logout

Check everything runs smoothly:

Feb 01 14:51:54 worker0 python3[161717]: [2021-02-01 14:51:54,021: INFO/ForkPoolWorker-4] Task swh.lister.gnu.tasks.GNUListerTask[7a5a3564-bd65-4dec-8107-e73cdb9cfd47] succeeded in 2.6035053330124356s: {'pages': 1, 'origins': 384}
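
To compare the lister's own report with the database count without reading the journal by eye, the result dict at the end of the "succeeded" line can be parsed. This is only a minimal sketch: the regular expression and the `parse_success` helper are hypothetical and assume the log line keeps the exact shape shown above.

```python
import ast
import re

# Log line copied from the worker journal above.
LOG_LINE = (
    "Feb 01 14:51:54 worker0 python3[161717]: [2021-02-01 14:51:54,021: "
    "INFO/ForkPoolWorker-4] Task swh.lister.gnu.tasks.GNUListerTask"
    "[7a5a3564-bd65-4dec-8107-e73cdb9cfd47] succeeded in 2.6035053330124356s: "
    "{'pages': 1, 'origins': 384}"
)

# Hypothetical pattern: task name, task id, duration, then the result dict.
SUCCESS_RE = re.compile(
    r"Task (?P<task>[\w.]+)\[(?P<task_id>[0-9a-f-]+)\] "
    r"succeeded in (?P<seconds>[\d.]+)s: (?P<result>\{.*\})$"
)


def parse_success(line):
    """Return (task name, result dict) for a 'succeeded' line, else None."""
    match = SUCCESS_RE.search(line)
    if match is None:
        return None
    # The result is printed as a Python literal, so literal_eval is enough.
    return match.group("task"), ast.literal_eval(match.group("result"))


task, result = parse_success(LOG_LINE)
print(task, result["origins"])
```

The `origins` value extracted this way is the number to compare against the `listed_origins` count.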

Listed origins:

swh-scheduler=> select count(*) from listed_origins lo inner join listers l on lo.lister_id=l.id and l.name='GNU' and l.instance_name='GNU';
 count
-------
   384
(1 row)
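
The same count check can be tried out locally. The sketch below rebuilds the join in an in-memory SQLite database; the schema is a simplified assumption (only the columns the query touches), and the lister ids and origin URLs are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical schema: only the columns used by the query above.
conn.executescript(
    """
    CREATE TABLE listers (id TEXT PRIMARY KEY, name TEXT, instance_name TEXT);
    CREATE TABLE listed_origins (lister_id TEXT REFERENCES listers(id), url TEXT);
    """
)
conn.executemany(
    "INSERT INTO listers VALUES (?, ?, ?)",
    [("l1", "GNU", "GNU"), ("l2", "cgit", "eclipse")],
)
conn.executemany(
    "INSERT INTO listed_origins VALUES (?, ?)",
    [
        ("l1", "https://example.org/gnu/a"),
        ("l1", "https://example.org/gnu/b"),
        ("l2", "https://example.org/cgit/c"),
    ],
)


def count_origins(name, instance_name):
    """Count listed origins for one lister, with the same join shape as above."""
    (count,) = conn.execute(
        """
        SELECT count(*)
        FROM listed_origins lo
        INNER JOIN listers l
          ON lo.lister_id = l.id AND l.name = ? AND l.instance_name = ?
        """,
        (name, instance_name),
    ).fetchone()
    return count


print(count_origins("GNU", "GNU"))       # 2 with the made-up rows above
print(count_origins("cgit", "eclipse"))  # 1
```

With an inner join, putting the lister filters in the `ON` clause gives the same result as putting them in a `WHERE` clause.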

Run of the cgit lister on the eclipse instance ok:

swhworker@worker0:~$ SWH_CONFIG_FILENAME=lister.yml swh lister run --lister cgit url=https://git.eclipse.org/c/ instance=eclipse base_git_url=https://git.eclipse.org/r/

The run stored its listed_origins alright:

swh-scheduler=> select count(*) from listed_origins lo inner join listers l on lo.lister_id=l.id and l.name='cgit' and l.instance_name='eclipse';
 count
-------
  2240
(1 row)

The count of 2240 is explained by the new listing adding its 1340 newly computed origins in one go,
on top of the 900 that had already been listed before.

Note that the new cgit implementation adds a trailing / at the end of the origin urls.
These urls still resolve, at least in our tryouts on the cgit instances git.kernel.org and git.eclipse.org.

But that means the data currently listed in staging must be reworked prior to listing (otherwise we'll store
2 different origins which are actually the "same").
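
To make the duplicate risk concrete: two urls that differ only by the trailing / are distinct strings, so they end up as distinct origins. A minimal sketch for spotting such pairs follows; `trailing_slash_collisions` is a hypothetical helper and the repository paths are made up.

```python
from collections import defaultdict


def trailing_slash_collisions(urls):
    """Group URLs that are equal once a single trailing '/' is ignored."""
    groups = defaultdict(set)
    for url in urls:
        key = url[:-1] if url.endswith("/") else url
        groups[key].add(url)
    # Keep only the keys that are stored under more than one spelling.
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Made-up repository paths on one of the cgit instances mentioned above.
urls = [
    "https://git.eclipse.org/r/example/one",
    "https://git.eclipse.org/r/example/one/",
    "https://git.eclipse.org/r/example/two/",
]
print(trailing_slash_collisions(urls))
```

Only `example/one` is reported here, since it is the only url present under both spellings.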

With D4987 on the verge of being packaged, the current listed_origins (staging) were adapted with:

swh-scheduler=> begin;
BEGIN
swh-scheduler=> update listed_origins set url=regexp_replace(url, '/$', '') where lister_id='1af42c1b-69b0-41ef-ad98-371843de406e' ;
UPDATE 1340
swh-scheduler=> select * from listed_origins where lister_id='1af42c1b-69b0-41ef-ad98-371843de406e' ;
swh-scheduler=> commit;
COMMIT

(so they match)
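
The begin / update / check / commit pattern used above can be sketched as follows against an in-memory SQLite database. Everything here is an assumption for illustration: the table is reduced to two columns, the lister ids and urls are made up, and `strip_trailing_slash` is a hypothetical helper. Unlike the manual session, it decides automatically whether to commit, and it only touches rows that actually end with /, so its row count covers changed rows only.

```python
import sqlite3

# isolation_level=None: the sqlite3 module leaves BEGIN/COMMIT to us.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE listed_origins (lister_id TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO listed_origins VALUES (?, ?)",
    [
        ("lister-a", "https://example.org/repo1/"),
        ("lister-a", "https://example.org/repo2"),
        ("lister-b", "https://example.org/repo3/"),
    ],
)


def strip_trailing_slash(conn, lister_id, expected_rows):
    """Strip one trailing '/' for a lister; commit only if the row count matches."""
    conn.execute("BEGIN")
    cursor = conn.execute(
        # SQLite has no built-in regexp_replace, so substr() stands in for '/$'.
        "UPDATE listed_origins SET url = substr(url, 1, length(url) - 1) "
        "WHERE lister_id = ? AND url LIKE '%/'",
        (lister_id,),
    )
    if cursor.rowcount != expected_rows:
        # Unexpected number of rows touched: undo and let a human look at it.
        conn.execute("ROLLBACK")
        return False
    conn.execute("COMMIT")
    return True


print(strip_trailing_slash(conn, "lister-a", expected_rows=1))
print(conn.execute("SELECT url FROM listed_origins ORDER BY url").fetchall())
```

Checking the affected row count before committing plays the role of the `UPDATE 1340` line and the follow-up `select` in the session above.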

swh.lister v0.8.0 packaged with the latest packagist lister port.

Run triggered on staging:

swhworker@worker0:~$ SWH_CONFIG_FILENAME=/etc/softwareheritage/lister.yml swh lister run --lister packagist
WARNING:swh.lister.packagist.lister:Unexpected HTTP status code 404 on https://repo.packagist.org/p/2amigos/selectize.js.json: b'{"type":"https:\\/\\/tools.ietf.org\\/html\\/rfc2616#section-10","title":"An error occurred","status":404,"detail":"Not Found"}'
...

So far so good:

swh-scheduler=> select now(), l.name, l.instance_name, count(*) from listed_origins lo inner join listers l on lo.lister_id=l.id group by (l.name, l.instance_name);
              now              |    name     | instance_name |  count
-------------------------------+-------------+---------------+---------
...
 2021-02-03 12:23:19.926102+00 | Packagist   | packagist     |  286705
...
(14 rows)

ardumont changed the task status from Open to Work in Progress. Feb 3 2021, 1:24 PM
ardumont updated the task description.
ardumont moved this task from Backlog to in-progress on the System administration board.
ardumont moved this task from in-progress to deployed/landed on the System administration board.
ardumont claimed this task.
ardumont moved this task from deployed/landed to done on the System administration board.