
Many NotFound repositories on GitHub since 2022-06-15 or 2022-06-16
Closed, Migrated

Description

@douardda noticed that https://sentry.softwareheritage.org/share/issue/99d67860c3484c7ab709154962ca8eb6/ shows a considerable increase in the number of "NotFound" errors for GitHub repositories since 2022-06-15 or 2022-06-16.

This may not be an issue, but I find it surprising.

I have looked at one such origin in particular: https://sentry.softwareheritage.org/organizations/swh/issues/10253/events/47b2b8f714364acea483c7e16f3a4ffb/

The loader started visiting this origin at "2022-06-21 07:51:02,699" (according to the Sentry breadcrumbs).

The scheduler entry for this origin is:

softwareheritage-scheduler=> select * from listed_origins where url='https://github.com/Stanley-Ezeaku/kotlin';
-[ RECORD 1 ]----------+-----------------------------------------
lister_id              | 6632ef5e-322b-402b-8f28-d090f76ed6b7
url                    | https://github.com/Stanley-Ezeaku/kotlin
visit_type             | git
extra_loader_arguments | {}
enabled                | f
first_seen             | 2021-06-10 02:15:29.470435+00
last_seen              | 2022-06-21 07:51:03.813845+00
last_update            | 2020-02-27 09:11:58+00

and the associated lister:

softwareheritage-scheduler=> select * from listers where id='6632ef5e-322b-402b-8f28-d090f76ed6b7';
-[ RECORD 1 ]-+-------------------------------------
id            | 6632ef5e-322b-402b-8f28-d090f76ed6b7
name          | github
instance_name | github
created       | 2021-02-04 08:01:51.163997+00
current_state | {"last_seen_id": 490551028}
updated       | 2022-05-10 07:23:49.246279+00
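The two lookups above can also be done in a single join of `listed_origins` against `listers`. The sketch below runs that join against an in-memory SQLite database so it is self-contained; the table definitions are a simplified, hypothetical subset of the real scheduler schema (only the columns used here), and the rows are copied from the psql output above. The SELECT itself is plain SQL; to run it in psql, replace the `?` placeholder with the quoted URL.

```python
import sqlite3

# In-memory stand-in for the scheduler database. Only the columns used below
# are modelled; this is NOT the real swh-scheduler schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE listers (
        id TEXT PRIMARY KEY, name TEXT, instance_name TEXT, updated TEXT
    );
    CREATE TABLE listed_origins (
        lister_id TEXT REFERENCES listers (id),
        url TEXT, visit_type TEXT, enabled BOOLEAN, last_seen TEXT
    );
    """
)

LISTER_ID = "6632ef5e-322b-402b-8f28-d090f76ed6b7"
ORIGIN_URL = "https://github.com/Stanley-Ezeaku/kotlin"

# Rows copied from the psql output in this report (timestamps kept as text).
conn.execute(
    "INSERT INTO listers VALUES (?, ?, ?, ?)",
    (LISTER_ID, "github", "github", "2022-05-10 07:23:49.246279+00"),
)
conn.execute(
    "INSERT INTO listed_origins VALUES (?, ?, ?, ?, ?)",
    (LISTER_ID, ORIGIN_URL, "git", False, "2022-06-21 07:51:03.813845+00"),
)

# One query instead of two: fetch the origin together with its lister.
QUERY = """
    SELECT o.url, o.enabled, o.last_seen, l.name, l.instance_name, l.updated
    FROM listed_origins AS o
    JOIN listers AS l ON l.id = o.lister_id
    WHERE o.url = ?
"""
row = conn.execute(QUERY, (ORIGIN_URL,)).fetchone()
print(row)
```

SQLite stores the boolean `enabled` as the integer 0, whereas PostgreSQL would display it as `f`.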

This is surprising: according to last_seen, the lister saw this origin about 1.1 s after we started loading it (or claimed to see it; this might be a lister bug).
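The gap can be checked by subtracting the two timestamps directly. This sketch assumes both are in UTC: the `last_seen` value carries an explicit `+00` offset, but the Sentry breadcrumb timestamp has no zone, so treating it as UTC is an assumption. The `+00` suffix is stripped and UTC attached by hand, because older Python versions' `%z` does not accept an hours-only offset.

```python
from datetime import datetime, timezone

# Loader visit start from the Sentry breadcrumbs (comma before milliseconds).
# ASSUMPTION: this timestamp is in UTC.
visit_start = datetime.strptime(
    "2022-06-21 07:51:02,699", "%Y-%m-%d %H:%M:%S,%f"
).replace(tzinfo=timezone.utc)

# listed_origins.last_seen from psql, with its "+00" (UTC) suffix removed.
last_seen = datetime.strptime(
    "2022-06-21 07:51:03.813845", "%Y-%m-%d %H:%M:%S.%f"
).replace(tzinfo=timezone.utc)

delta = last_seen - visit_start
print(f"{delta.total_seconds():.3f} s")  # prints: 1.115 s
```

A positive result means `last_seen` was written after the loader had already started the visit.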