
Deploy lister v1.5.0
Closed, Migrated, Edits Locked

Description

This release mainly contains a major fix [1] that makes the GitLab lister more resilient, so
that it can list the roughly 800k missing origins [2].

[1] Changelog:

  • gitlab: Handle HTTP status code 500 when listing projects
  • pattern: Use URL network location as instance name when not provided
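The second changelog entry can be illustrated with a minimal sketch. It assumes that the fallback simply takes the network location of the lister's base URL when no instance name is configured. The helper `default_instance_name` is hypothetical and is not swh-lister's actual API.

```python
from typing import Optional
from urllib.parse import urlparse


def default_instance_name(url: str, instance: Optional[str] = None) -> str:
    """Return the configured instance name, else the URL's network location.

    Hypothetical helper illustrating the changelog entry; not swh-lister code.
    """
    if instance:
        return instance
    netloc = urlparse(url).netloc
    if not netloc:
        raise ValueError(f"cannot derive an instance name from {url!r}")
    return netloc


# An explicitly configured name wins (the queries in this task filter on
# instance_name='gitlab').
print(default_instance_name("https://gitlab.com/api/v4/", "gitlab"))  # gitlab
# Without one, the network location of the URL is used.
print(default_instance_name("https://gitlab.example.org/api/v4/"))  # gitlab.example.org
```

Note that `netloc` keeps an explicit port if the URL has one (for example `git.example.org:8443`), so two instances on the same host but different ports would still get distinct names.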

[2] T3127#67645

Related to T3442

Event Timeline

ardumont created this task.

Deployed v1.5.0; a listing run was triggered and is ongoing:

Jul 27 16:21:11 worker07 python3[3910]: [2021-07-27 16:21:11,221: INFO/MainProcess] Received task: swh.lister.gitlab.tasks.IncrementalGitLabLister[7932e3a4-7ce0-4f7b-8153-bb9d5a1e1bce]
Jul 27 16:21:42 worker07 python3[3920]: [2021-07-27 16:21:42,213: WARNING/ForkPoolWorker-4] Unexpected HTTP status code 500 on https://gitlab.com/api/v4/projects?id_after=2114599&membership=false&order_by=id&owned=false&page=1&pagination=keyset&per_page=20&repository_checksum_failed=false&simple=false&sort=asc&starred=false&statistics=false&wiki_checksum_failed=false&with_custom_attributes=false&with_issues_enabled=false&with_merge_requests_enabled=false: b'{"message":"500 Internal Server Error"}'
...
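The warning above comes from a keyset-paginated request (`pagination=keyset`, `id_after=2114599`, `per_page=20`). One plausible way to keep such a listing going past a page GitLab fails to serve is to advance `id_after` by the page size until a request succeeds. The sketch below assumes that strategy; it is not the lister's actual implementation, and `skip_failing_page`, `with_id_after`, the injected `fetch` callable and the `max_skips` bound are all hypothetical. The URL in the usage part is a shortened version of the one in the log.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical fetch signature: takes a URL, returns (HTTP status, body).
# Injecting it keeps the sketch runnable without network access.
Fetch = Callable[[str], Tuple[int, bytes]]

_ID_AFTER = re.compile(r"([?&]id_after=)(\d+)")


def with_id_after(url: str, id_after: int) -> str:
    """Return `url` with its id_after query parameter replaced."""
    if not _ID_AFTER.search(url):
        raise ValueError(f"no id_after parameter in {url!r}")
    return _ID_AFTER.sub(lambda m: f"{m.group(1)}{id_after}", url, count=1)


def skip_failing_page(
    url: str, fetch: Fetch, per_page: int, max_skips: int = 1000
) -> Tuple[str, bytes]:
    """Fetch `url`; on HTTP 500, advance id_after by `per_page` and retry.

    Projects whose ids fall inside a skipped window are not listed by this
    run. Returns the URL that finally succeeded together with its body.
    """
    match = _ID_AFTER.search(url)
    id_after: Optional[int] = int(match.group(2)) if match else None
    skips = 0
    while True:
        status, body = fetch(url)
        if status == 200:
            return url, body
        if status != 500:
            raise RuntimeError(f"HTTP {status} on {url}")
        if id_after is None or skips >= max_skips:
            raise RuntimeError(f"HTTP 500 on {url}, giving up")
        skips += 1
        id_after += per_page
        url = with_id_after(url, id_after)


# Usage with a fake server that fails only on the page from the log above.
start = ("https://gitlab.com/api/v4/projects"
         "?id_after=2114599&pagination=keyset&per_page=20")


def fake_fetch(url: str) -> Tuple[int, bytes]:
    if "id_after=2114599&" in url:
        return 500, b'{"message":"500 Internal Server Error"}'
    return 200, b"[]"


print(skip_failing_page(start, fake_fetch, per_page=20))
```

The trade-off of this approach is that a persistently failing window costs at most `per_page` project ids per skip instead of aborting the whole run, which matters when hundreds of thousands of origins remain to be listed.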
ardumont changed the task status from Open to Work in Progress (Jul 27 2021, 6:24 PM).
ardumont moved this task from Backlog to in-progress on the System administration board.

The listing is ongoing and the number of origins is increasing steadily: from 200k yesterday [1]
(before the lister was deployed) to ~800k [2] today. Progress.

[1] Snapshot taken yesterday, through the scheduler metrics update I ran:

09:21:00 softwareheritage-scheduler@belvedere:5432=> select last_update, origins_known from scheduler_metrics sm inner join listers l on sm.lister_id=l.id where l.name='gitlab' and l.instance_name='gitlab';
+------------------------------+---------------+
|         last_update          | origins_known |
+------------------------------+---------------+
| 2021-07-27 15:55:27.30701+00 |        200200 |
+------------------------------+---------------+
(1 row)

Time: 10.492 ms

[2] Current snapshots, taken this morning:

09:14:27 softwareheritage-scheduler@belvedere:5432=> select now(), visit_type, count(*) from listed_origins lo inner join listers l on l.id=lo.lister_id where l.name='gitlab' and l.instance_name='gitlab' group by visit_type order by count(*) desc;
+-------------------------------+------------+--------+
|              now              | visit_type | count  |
+-------------------------------+------------+--------+
| 2021-07-28 07:14:02.974307+00 | git        | 790160 |
+-------------------------------+------------+--------+
(1 row)

Time: 727.838 ms

09:14:46 softwareheritage-scheduler@belvedere:5432=> select now(), visit_type, count(*) from listed_origins lo inner join listers l on l.id=lo.lister_id where l.name='gitlab' and l.instance_name='gitlab' group by visit_type order by count(*) desc;
+-------------------------------+------------+--------+
|              now              | visit_type | count  |
+-------------------------------+------------+--------+
| 2021-07-28 07:16:08.109911+00 | git        | 791580 |
+-------------------------------+------------+--------+
(1 row)

Time: 599.655 ms

It's still ongoing:

18:01:36 softwareheritage-scheduler@belvedere:5432=> select now(), visit_type, count(*) from listed_origins lo inner join listers l on l.id=lo.lister_id where l.name='gitlab' and l.instance_name='gitlab' group by visit_type order by count(*) desc;
+-------------------------------+------------+---------+
|              now              | visit_type |  count  |
+-------------------------------+------------+---------+
| 2021-07-28 16:27:11.594417+00 | git        | 1138860 |
+-------------------------------+------------+---------+
(1 row)

Time: 5622.079 ms (00:05.622)

Still ongoing.

17:31:19 softwareheritage-scheduler@belvedere:5432=> select now(), visit_type, count(*) from listed_origins lo inner join listers l on l.id=lo.lister_id where l.name='gitlab' and l.instance_name='gitlab' group by visit_type order by count(*) desc;
+-------------------------------+------------+---------+
|              now              | visit_type |  count  |
+-------------------------------+------------+---------+
| 2021-07-29 15:46:55.979469+00 | git        | 1747880 |
+-------------------------------+------------+---------+
(1 row)

Time: 3298.658 ms (00:03.299)
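To put the progress figures in perspective, the short script below derives the listing throughput between consecutive `listed_origins` snapshots quoted in this task. It only redoes arithmetic on those numbers (UTC timestamps returned by `now()`); the `scheduler_metrics` value from the day before is left out because it comes from a different table. `listing_rates` is a hypothetical helper.

```python
from datetime import datetime
from typing import List, Tuple

# (now(), count) pairs copied from the listed_origins queries in this task.
# psql printed the offset as "+00"; it is written "+00:00" here so that
# datetime.fromisoformat also accepts it on Python versions before 3.11.
SNAPSHOTS: List[Tuple[str, int]] = [
    ("2021-07-28 07:14:02.974307+00:00", 790160),
    ("2021-07-28 07:16:08.109911+00:00", 791580),
    ("2021-07-28 16:27:11.594417+00:00", 1138860),
    ("2021-07-29 15:46:55.979469+00:00", 1747880),
]


def listing_rates(
    snapshots: List[Tuple[str, int]]
) -> List[Tuple[datetime, datetime, int, float]]:
    """Return (start, end, new_origins, origins_per_second) per consecutive pair."""
    parsed = [(datetime.fromisoformat(ts), count) for ts, count in snapshots]
    rates = []
    for (t0, c0), (t1, c1) in zip(parsed, parsed[1:]):
        seconds = (t1 - t0).total_seconds()
        rates.append((t0, t1, c1 - c0, (c1 - c0) / seconds))
    return rates


for t0, t1, delta, per_second in listing_rates(SNAPSHOTS):
    print(f"{t0:%m-%d %H:%M} -> {t1:%m-%d %H:%M}: +{delta} origins, "
          f"{per_second:.1f}/s (~{per_second * 3600:,.0f}/h)")
```

By this arithmetic the lister added roughly 11 origins/s over the two-minute window in the morning, about 10.5/s until the afternoon, and about 7.3/s over the following day.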

I'll close this now, as the deployment per se is done.