
Fix and/or deploy production listers paper cuts
Closed, Resolved, Public

Description

Now that the new listers have run a bit, some of those listers demonstrate issues caught
by sentry:

  • cran issue due to a parser error on dates [1] (already fixed, needs deployment of v0.9.0)
  • task inputs need an update [2] (debian)
  • the task needs to be disabled (incremental gitea listing no longer exists; it needs to be replaced by a full listing when needed) [3]
  • a debian repository (debian-security) needs code adaptation (v0.9.1)

Resolve those issues following the next-gen deployments.

[1] https://sentry.softwareheritage.org/share/issue/500f1adcceed4897a7c580b62e7de7af/

[2] https://sentry.softwareheritage.org/share/issue/d15b817b882d4a28a1fbde27b5b5ba49/

[3] https://sentry.softwareheritage.org/share/issue/8a9ea56b626840d288c26e427a7addcd/

Event Timeline

ardumont triaged this task as Normal priority. Feb 8 2021, 9:16 AM
ardumont created this task.

Debian task inputs need to change with the new version, so:

Update debian task:

softwareheritage-scheduler=> update task set arguments='{"args": [], "kwargs": {"suites": ["stretch", "buster", "bullseye"], "components": ["main", "contrib", "non-free"], "mirror_url": "http://deb.debian.org/debian/", "distribution": "Debian"}}', next_run=now() where id=65911218;
softwareheritage-scheduler=> update task set arguments='{"args": [], "kwargs": {"suites": ["stretch", "buster", "bullseye", "bullseye-security"], "components": ["main", "contrib", "non-free"], "mirror_url": "http://deb.debian.org/debian-security/", "distribution": "Debian-Security"}}', status='next_run_not_scheduled' where id=65911219;
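The arguments column holds a JSON document, so it may be less error-prone to generate the value than to type it by hand. A minimal sketch, assuming only the key names visible in the statements above; the helper name debian_task_arguments is hypothetical and not part of swh.scheduler:

```python
import json


def debian_task_arguments(distribution, mirror_url, suites, components):
    """Build the JSON value stored in the scheduler's task.arguments column.

    Hypothetical helper: the key names mirror the SQL statements above.
    """
    payload = {
        "args": [],
        "kwargs": {
            "suites": list(suites),
            "components": list(components),
            "mirror_url": mirror_url,
            "distribution": distribution,
        },
    }
    return json.dumps(payload)


arguments = debian_task_arguments(
    distribution="Debian",
    mirror_url="http://deb.debian.org/debian/",
    suites=["stretch", "buster", "bullseye"],
    components=["main", "contrib", "non-free"],
)
print(arguments)
```

The printed string can then be pasted into the update statement, or passed as a bound parameter by whatever client is used to talk to the scheduler database.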

Disabling incremental gitea lister tasks as they no longer exist:

softwareheritage-scheduler=> select * from task where type like 'list-gitea-incremental';
    id     |          type          |                                   arguments                                   |           next_run            | current_interval |         status         |  policy   | retries_left | priority
-----------+------------------------+-------------------------------------------------------------------------------+-------------------------------+------------------+------------------------+-----------+--------------+----------
 337348713 | list-gitea-incremental | {"args": [], "kwargs": {"url": "https://git.fsfe.org/api/v1/", "limit": 100}} | 2021-02-08 22:02:24.327663+00 | 1 day            | next_run_not_scheduled | recurring |            0 |
 337315168 | list-gitea-incremental | {"args": [], "kwargs": {"url": "https://codeberg.org/api/v1/", "limit": 100}} | 2021-02-08 22:02:13.831184+00 | 1 day            | next_run_not_scheduled | recurring |            0 |
(2 rows)

softwareheritage-scheduler=> select * from task where type like 'list-gitea-full';
    id     |      type       |                                   arguments                                   |           next_run            | current_interval |         status         |  policy   | retries_left | priority
-----------+-----------------+-------------------------------------------------------------------------------+-------------------------------+------------------+------------------------+-----------+--------------+----------
 337306005 | list-gitea-full | {"args": [], "kwargs": {"url": "https://codeberg.org/api/v1/", "limit": 100}} | 2021-03-17 20:43:49.884824+00 | 90 days          | next_run_not_scheduled | recurring |            0 |
(1 row)

softwareheritage-scheduler=> update task set status='disabled' where id in (337348713, 337315168);
UPDATE 2

Update the existing full listing task to use only the arguments that are actually supported:

softwareheritage-scheduler=> update task set arguments='{"args": [], "kwargs": {"url": "https://codeberg.org/api/v1/"}}', current_interval='64 days', next_run=now() where id=337306005;
UPDATE 1

And check:

Feb 08 08:08:52 worker09 python3[824]: [2021-02-08 08:08:52,981: INFO/ForkPoolWorker-4] Task swh.lister.gitea.tasks.FullGiteaRelister[d59b16c1-3736-4920-ad6e-6c3db5fcbd04] succeeded in 19.510299087967724s: {'pages': 123, 'origins': 6112}

Reschedule the second instance as full listing (only 1 existed):

swhscheduler@saatchi:~$ swh scheduler --config-file /etc/softwareheritage/scheduler/backend.yml task add list-gitea-full url=https://git.fsfe.org/api/v1/
Created 1 tasks

Task 359190316
  Next run: today (2021-02-08T08:11:06.394708+00:00)
  Interval: 90 days, 0:00:00
  Type: list-gitea-full
  Policy: recurring
  Args:
  Keyword args:
    url: 'https://git.fsfe.org/api/v1/'

And check:

Feb 08 08:11:10 worker12 python3[829]: [2021-02-08 08:11:10,450: INFO/ForkPoolWorker-4] Task swh.lister.gitea.tasks.FullGiteaRelister[44af2234-85f7-4271-872a-7804a0a49eb8] succeeded in 1.8045411680359393s: {'pages': 8, 'origins': 377}
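Checking these worker log lines by eye works for a couple of tasks; for more, the summary at the end of the line can be extracted mechanically. A rough sketch, assuming the log format is exactly the one shown above; the function name parse_success_line is made up for this example:

```python
import ast
import re

# Matches the tail of a Celery "succeeded" log line as shown above.
SUCCESS_RE = re.compile(
    r"Task (?P<name>[\w.]+)\[(?P<task_id>[0-9a-f-]+)\] "
    r"succeeded in (?P<seconds>[0-9.]+)s: (?P<result>\{.*\})\s*$"
)


def parse_success_line(line):
    """Return (task name, duration in seconds, result dict), or None."""
    match = SUCCESS_RE.search(line)
    if match is None:
        return None
    # The result is printed as a Python dict literal, not as JSON.
    result = ast.literal_eval(match.group("result"))
    return match.group("name"), float(match.group("seconds")), result


line = (
    "Feb 08 08:11:10 worker12 python3[829]: [2021-02-08 08:11:10,450: "
    "INFO/ForkPoolWorker-4] Task swh.lister.gitea.tasks.FullGiteaRelister"
    "[44af2234-85f7-4271-872a-7804a0a49eb8] succeeded in 1.8045411680359393s: "
    "{'pages': 8, 'origins': 377}"
)
name, seconds, result = parse_success_line(line)
print(name, result["pages"], result["origins"])
```

Comparing the origins count between runs is a cheap way to spot a listing that succeeded but found nothing.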

Deployed python3-swh.lister v0.9.0, which includes, among other things, an improvement to date parsing for the cran lister.

For the debian lister, after updating its task input, i got [1], but i don't really see what to do with it yet:

[1] https://sentry.softwareheritage.org/share/issue/7e22ced788c7443093144d69f5707b5e/

The current lister implementation works for the main debian distribution but cannot currently work for the "debian-security" instance.

From the current lister implementation:

base_url = urljoin(self.url, f"dists/{suite}/{component}/source/Sources")

we'd need, for that specific instance:

base_url = urljoin(self.url, f"dists/{suite}/updates/{component}/source/Sources")

see: http://deb.debian.org/debian-security/dists/buster/

Two possible solutions here:

  • use stretch/updates, buster/updates, ... as suite names
  • add specific URL template processing in debian lister implementation

I would go for the second one.
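The second option could look roughly like the following. This is only a sketch, not the change that landed afterwards: the url_template parameter and the sources_url function are hypothetical names, and only the two path layouts come from the comments above.

```python
from urllib.parse import urljoin

# Layout used by the main archive (from the current lister implementation).
DEFAULT_TEMPLATE = "dists/{suite}/{component}/source/Sources"
# Layout needed for the debian-security instance, as described above.
SECURITY_TEMPLATE = "dists/{suite}/updates/{component}/source/Sources"


def sources_url(mirror_url, suite, component, url_template=DEFAULT_TEMPLATE):
    """Build the URL of a Sources index from a per-instance path template."""
    path = url_template.format(suite=suite, component=component)
    return urljoin(mirror_url, path)


print(sources_url("http://deb.debian.org/debian/", "buster", "main"))
print(
    sources_url(
        "http://deb.debian.org/debian-security/",
        "buster",
        "main",
        url_template=SECURITY_TEMPLATE,
    )
)
```

Note that urljoin only appends the path because the mirror URL ends with a slash; without it, the last path segment of the mirror URL would be replaced.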

> Two possible solutions here:
>
>   • use stretch/updates, buster/updates, ... as suite names
>   • add specific URL template processing in debian lister implementation

Right.

> I would go for the second one

Sounds good.

Will you attend to it or should i (i can't tell from your suggestion :)?

I'll handle it, diff incoming.

D5037

Tagged and deployed v0.9.1

Restarted.

And updated the debian-security instance:

softwareheritage-scheduler=> update task set arguments='{"args": [], "kwargs": {"suites": ["stretch", "buster", "bullseye-security"], "components": ["main", "contrib", "non-free"], "mirror_url": "http://deb.debian.org/debian-security/", "distribution": "Debian-Security"}}', status='next_run_not_scheduled', next_run=now() where id=65911219;
UPDATE 1

which finally ran ok:

Feb 08 13:59:10 worker11 python3[338237]: [2021-02-08 13:59:10,916: INFO/ForkPoolWorker-4] Task swh.lister.debian.tasks.DebianListerTask[a4936a77-7137-46a9-91a8-56c98abb66be] succeeded in 2.1607892010360956s: {'pages': 9, 'origins': 0}
ardumont changed the task status from Open to Work in Progress. Feb 8 2021, 3:04 PM
ardumont updated the task description.
ardumont moved this task from Backlog to Weekly backlog on the System administration board.
ardumont moved this task from Weekly backlog to in-progress on the System administration board.
ardumont moved this task from in-progress to deployed/landed on the System administration board.
ardumont claimed this task.
ardumont moved this task from deployed/landed to done on the System administration board.

> Two possible solutions here:
>
>   • use stretch/updates, buster/updates, ... as suite names
>   • add specific URL template processing in debian lister implementation
>
> I would go for the second one.

After reading this code and wondering about the rationale behind it, I don't think this choice is very consistent. The suite names in the security.debian.org repository really are stretch/updates and buster/updates, and we should use them consistently. Overall, I think we should ensure that our "branch schema" (for lack of a better term) matches what would appear in a valid sources.list file.
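To illustrate the point, here is a small sketch (with a hypothetical sources_url helper, not the lister's actual code) suggesting that the original, unmodified URL construction already yields the security layout once the suite is named the way it would be in sources.list:

```python
from urllib.parse import urljoin


def sources_url(mirror_url, suite, component):
    """Same path layout as the original lister; the suite is used verbatim."""
    return urljoin(mirror_url, f"dists/{suite}/{component}/source/Sources")


# Suite names spelled as in a sources.list entry for the security archive,
# e.g. "deb-src <mirror> buster/updates main".
for suite in ["stretch/updates", "buster/updates"]:
    print(sources_url("http://deb.debian.org/debian-security/", suite, "main"))
```

With this naming, no instance-specific template is needed; the trade-off is that suite names then contain a slash, which anything deriving identifiers from them would have to tolerate.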