- T3013#57801: OK: Gnu
- T3013#57809: OK: CGit with latest changes
- T3013#58007: OK: Packagist
Revisions and Commits
- D4987 (rDLS Listers): rDLS8e4dd178f1df cgit: remove the repository urls's trailing /
Related Objects
- Mentioned In
- T376: ingest git.eclipse.org repositories
- Mentioned Here
- D4987: cgit: remove the repository urls's trailing /
Event Timeline
Status: OK
Schedule gnu on staging:
swhworker@worker0:~$ swh scheduler --url http://scheduler0.internal.staging.swh.network:5008/ task add list-gnu-full
Created 1 tasks

Task 17271527
  Next run: today (2021-02-01T14:51:42.852948+00:00)
  Interval: 90 days, 0:00:00
  Type: list-gnu-full
  Policy: recurring
  Args:
  Keyword args:

swhworker@worker0:~$ logout
Check everything runs smoothly:
Feb 01 14:51:54 worker0 python3[161717]: [2021-02-01 14:51:54,021: INFO/ForkPoolWorker-4] Task swh.lister.gnu.tasks.GNUListerTask[7a5a3564-bd65-4dec-8107-e73cdb9cfd47] succeeded in 2.6035053330124356s: {'pages': 1, 'origins': 384}
Listed origins:
swh-scheduler=> select count(*) from listed_origins lo inner join listers l on lo.lister_id=l.id and l.name='GNU' and l.instance_name='GNU';
 count
-------
   384
(1 row)
Run ok:
swhworker@worker0:~$ SWH_CONFIG_FILENAME=lister.yml swh lister run --lister cgit url=https://git.eclipse.org/c/ instance=eclipse base_git_url=https://git.eclipse.org/r/
The run stored the listed_origins correctly:
swh-scheduler=> select count(*) from listed_origins lo inner join listers l on lo.lister_id=l.id and l.name='cgit' and l.instance_name='eclipse';
 count
-------
  2240
(1 row)
The count is explained by the new listing adding its 1340 newly computed origins in one go,
on top of the 900 already listed before (900 + 1340 = 2240).
Note that the new cgit implementation adds a trailing / at the end of the origin URLs.
These URLs still resolve, at least in our tryouts on the cgit instances git.kernel.org and git.eclipse.org.
But that means the data currently listed in staging must be reworked prior to listing again (otherwise we'll
store 2 different origins which are the "same").
With D4987 on the verge of being packaged, the current listed_origins (staging) were adapted with:
swh-scheduler=> begin;
BEGIN
swh-scheduler=> update listed_origins set url=regexp_replace(url, '/$', '') where lister_id='1af42c1b-69b0-41ef-ad98-371843de406e' ;
UPDATE 1340
swh-scheduler=> select * from listed_origins where lister_id='1af42c1b-69b0-41ef-ad98-371843de406e' ;
swh-scheduler=> commit;
COMMIT
(so they match)
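For illustration, the effect of that update can be sketched in Python. This is only a sketch and not the code from D4987: the helper name `strip_trailing_slash` and the example.org URLs are made up, and `\Z` is used so that only a `/` at the very end of the string is removed, as the `'/$'` pattern does in the update above.

```python
import re


def strip_trailing_slash(url: str) -> str:
    """Drop one trailing '/' from url, if any (hypothetical helper)."""
    # \Z anchors at the very end of the string, so only a final '/' is removed.
    return re.sub(r"/\Z", "", url)


# Hypothetical origin URLs: the first two only differ by the trailing '/'.
listed = [
    "https://git.example.org/r/some/repo.git/",
    "https://git.example.org/r/some/repo.git",
    "https://git.example.org/r/other/repo",
]

# Without normalization the first two would be stored as distinct origins.
distinct_raw = set(listed)
distinct_normalized = {strip_trailing_slash(u) for u in listed}

print(len(distinct_raw), len(distinct_normalized))
```

Once normalized, the two variants of the same repository collapse into a single origin, which is why the staging rows had to be rewritten before listing again.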
swh.lister v0.8.0 packaged with the latest packagist lister port.
Run triggered on staging:
swhworker@worker0:~$ SWH_CONFIG_FILENAME=/etc/softwareheritage/lister.yml swh lister run --lister packagist
WARNING:swh.lister.packagist.lister:Unexpected HTTP status code 404 on https://repo.packagist.org/p/2amigos/selectize.js.json: b'{"type":"https:\\/\\/tools.ietf.org\\/html\\/rfc2616#section-10","title":"An error occurred","status":404,"detail":"Not Found"}'
...
so far so good:
swh-scheduler=> select now(), l.name, l.instance_name, count(*) from listed_origins lo inner join listers l on lo.lister_id=l.id group by (l.name, l.instance_name);
              now              |    name     | instance_name |  count
-------------------------------+-------------+---------------+---------
 ...
 2021-02-03 12:23:19.926102+00 | Packagist   | packagist     |  286705
 ...
(14 rows)
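To try the per-lister count outside of staging, the same join can be sketched against an in-memory SQLite database. Everything here is an assumption for illustration: the two tables only carry the columns used in the queries above (the real swh-scheduler schema has more), and the rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced, hypothetical schema: only the columns the queries above rely on.
conn.executescript(
    """
    create table listers (id text primary key, name text, instance_name text);
    create table listed_origins (lister_id text references listers(id), url text);
    """
)

# Made-up sample rows, not real staging data.
conn.executemany(
    "insert into listers values (?, ?, ?)",
    [("l1", "cgit", "eclipse"), ("l2", "GNU", "GNU")],
)
conn.executemany(
    "insert into listed_origins values (?, ?)",
    [
        ("l1", "https://git.example.org/r/a"),
        ("l1", "https://git.example.org/r/b"),
        ("l2", "https://ftp.example.org/gnu/c"),
    ],
)

# Count listed origins per (lister name, instance name).
rows = conn.execute(
    """
    select l.name, l.instance_name, count(*)
    from listed_origins lo
    inner join listers l on lo.lister_id = l.id
    group by l.name, l.instance_name
    order by l.name
    """
).fetchall()

for name, instance_name, count in rows:
    print(name, instance_name, count)
```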