
Deploy nixguix loader
Closed, Resolved (Public)

Description

Even though the implementation is not complete [1], we can deploy a first version
of the loader which can already ingest the majority of artifacts.

This will most likely create partial snapshots most of the time, but those
are still snapshots that are browsable from the archive.

We can thus deploy it.

This task tracks the actions to do so.

Note: stricken sources are deactivated for now (discussed with @lewo).

[1] As the parent task mentions, some artifacts are not handled yet (most of
them have been filtered out of sources.json for now)

[2] Idempotently register new task types (from the new version of the loader
packages; no-op on existing ones)

SWH_CONFIG_FILENAME=/etc/softwareheritage/scheduler.yml
swh scheduler --config-file $SWH_CONFIG_FILENAME \
  task-type register \
    --plugins loader.nixguix

[3] Schedule the task (on the right scheduler)

swh scheduler --config-file $SWH_CONFIG_FILENAME \
  task add load-nixguix \
    url=https://nix-community.github.io/nixpkgs-swh/sources-unstable.json

Event Timeline

ardumont triaged this task as Normal priority. Tue, May 19, 11:54 AM
ardumont created this task.
ardumont added a subscriber: lewo.
ardumont changed the task status from Open to Work in Progress. Wed, May 20, 10:18 AM

Deployed.

Started only 1 worker (01). The first run may take more than 1 day, and the
task's frequency is 1 day.

If a new worker starts a visit before a first snapshot is finished, that visit
would not benefit from the incremental nature of the loader.

I will crank up the parallelism to all workers when the first pass is done.

swh scheduler --config-file $SWH_CONFIG_FILENAME task add load-nixguix url=https://nix-community.github.io/nixpkgs-swh/sources-unstable.json
INFO:swh.core.config:Loading config file /etc/softwareheritage/global.ini
INFO:swh.core.config:Loading config file /etc/softwareheritage/scheduler.yml
[INFO] swh.core.config -- Loading config file /etc/softwareheritage/scheduler.yml
Created 1 tasks

Task 334411727
  Next run: just now (2020-05-20 08:13:26+00:00)
  Interval: 1 day, 0:00:00
  Type: load-nixguix
  Policy: recurring
  Args:
  Keyword args:
    url: 'https://nix-community.github.io/nixpkgs-swh/sources-unstable.json'

Still running, so more than a day indeed.

ardumont added a comment. (Edited) Fri, May 22, 9:17 AM

And the first run crashed...

May 22 03:38:53 worker01 python3[275651]: Process 'ForkPoolWorker-1' pid:275685 exited with 'signal 9 (SIGKILL)'
May 22 03:38:54 worker01 python3[275651]: [2020-05-22 03:38:54,044: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).')
                                          Traceback (most recent call last):
                                            File "/usr/lib/python3/dist-packages/billiard/pool.py", line 1267, in mark_as_worker_lost
                                              human_status(exitcode)),
                                          billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).

I just don't have the will to look into it right now. I rescheduled one in the
meantime.

Heads up.

The second iteration broke with the same error again... (the worker service is
down and needs to be restarted manually, jsyk).

May 23 00:00:44 worker01 python3[404365]: Process 'ForkPoolWorker-1' pid:404369 exited with 'signal 9 (SIGKILL)'
May 23 00:00:44 worker01 python3[404365]: [2020-05-23 00:00:44,390: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).')
                                          Traceback (most recent call last):
                                            File "/usr/lib/python3/dist-packages/billiard/pool.py", line 1267, in mark_as_worker_lost
                                              human_status(exitcode)),
                                          billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).

I did not find a nixguix-specific issue in Sentry. I found one tagged on the
git loader; the stack trace (celery/billiard related) matches though [1].

So that seems to be a more generic issue than nixguix alone.

[1] https://sentry.softwareheritage.org/share/issue/efaee88e5e154554846d0f428adf4aa1/
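For readers unfamiliar with the "signal 9 (SIGKILL)" wording in the logs above: it means the worker process was killed from outside and had no chance to clean up or report an error itself. The following is a minimal sketch, not taken from celery or billiard, of how a parent process on a POSIX system can observe this for a child. The helper names `describe_exit` and `run_and_kill_child` are hypothetical, and the sketch says nothing about what actually sent the signal on worker01, which this task does not establish.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Describe a child's return code as reported by subprocess on POSIX."""
    if returncode < 0:
        # A negative return code -N means the child was terminated by signal N.
        sig = signal.Signals(-returncode)
        return f"killed by signal {sig.value} ({sig.name})"
    return f"exited with status {returncode}"


def run_and_kill_child() -> int:
    """Start a child Python process that sends SIGKILL to itself."""
    child_code = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"
    completed = subprocess.run([sys.executable, "-c", child_code])
    return completed.returncode


if __name__ == "__main__":
    print(describe_exit(run_and_kill_child()))
```

Because SIGKILL cannot be caught, the child cannot log anything on its way out, which is consistent with only the parent-side WorkerLostError showing up in the journal.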

As a workaround and experiment (unsure whether it would be enough), I stopped
the other service workers and rescheduled a run. The run did finish this time
[1].

Its snapshot is browsable [2].

[1]

May 24 05:21:57 worker01 python3[481776]: [2020-05-24 05:21:57,329: INFO/ForkPoolWorker-1] Task swh.loader.package.nixguix.tasks.LoadNixguix[fb395d38-41aa-49f0-a682-569b9e2458cd] succeeded in 69021.774671264s: {'status': 'eventful', 'snapshot_id': 'b019cd30357faf2028d82589811f083f51d6eeb2'}

[2] https://archive.softwareheritage.org/swh:1:snp:b019cd30357faf2028d82589811f083f51d6eeb2;origin=https://nix-community.github.io/nixpkgs-swh/sources-unstable.json/

ardumont added a comment. (Edited) Mon, May 25, 11:32 AM

And the next visit run got hit by T2371.

May 25 05:22:46 worker01 python3[538698]: [2020-05-25 05:22:46,296: ERROR/ForkPoolWorker-2] Task swh.loader.package.nixguix.tasks.LoadNixguix[3188eeae-2a8c-4093-99f5-b1acb870ce2c] raised unexpected: KeyError('extrinsic')
                                          Traceback (most recent call last):
                                            File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 382, in trace_task
                                              R = retval = fun(*args, **kwargs)
                                            File "/usr/lib/python3/dist-packages/swh/scheduler/task.py", line 51, in __call__
                                              result = super().__call__(*args, **kwargs)
                                            File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 641, in __protected_call__
                                              return self.run(*args, **kwargs)
                                            File "/usr/lib/python3/dist-packages/sentry_sdk/integrations/celery.py", line 161, in _inner
                                              reraise(*exc_info)
                                            File "/usr/lib/python3/dist-packages/sentry_sdk/_compat.py", line 57, in reraise
                                              raise value
                                            File "/usr/lib/python3/dist-packages/sentry_sdk/integrations/celery.py", line 156, in _inner
                                              return f(*args, **kwargs)
                                            File "/usr/lib/python3/dist-packages/swh/loader/package/nixguix/tasks.py", line 14, in load_nixguix
                                              return NixGuixLoader(url).load()
                                            File "/usr/lib/python3/dist-packages/swh/loader/package/loader.py", line 328, in load
                                              revision_id = self.resolve_revision_from(known_artifacts, p_info["raw"])
                                            File "/usr/lib/python3/dist-packages/swh/loader/package/nixguix/loader.py", line 70, in resolve_revision_from
                                              known_integrity = known_artifact["extrinsic"]["raw"]["integrity"]
                                          KeyError: 'extrinsic'

So, in the end, we will need D2949 after all.
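The KeyError in the traceback above comes from indexing `known_artifact["extrinsic"]` on an artifact that has no such key. As an illustration only (this is not the content of D2949, and not the loader's actual code), a tolerant lookup could skip such entries instead of failing. The function name `find_known_revision` and the assumption that known artifacts form a dict from revision id to metadata dict are hypothetical.

```python
from typing import Any, Dict, Optional


def find_known_revision(
    known_artifacts: Dict[str, Dict[str, Any]], integrity: str
) -> Optional[str]:
    """Return the id of the known artifact with this integrity, if any.

    Assumption: each value may or may not carry the nested keys
    "extrinsic" -> "raw" -> "integrity". Entries lacking them are skipped
    rather than raising KeyError.
    """
    for revision_id, known_artifact in known_artifacts.items():
        raw = known_artifact.get("extrinsic", {}).get("raw", {})
        known_integrity = raw.get("integrity")
        if known_integrity is not None and known_integrity == integrity:
            return revision_id
    return None


if __name__ == "__main__":
    artifacts = {
        "rev-a": {"intrinsic": {}},  # no "extrinsic" key, as in the crash
        "rev-b": {"extrinsic": {"raw": {"integrity": "sha256-abc"}}},
    }
    print(find_known_revision(artifacts, "sha256-abc"))
```

With the sample data, the entry without "extrinsic" is ignored and the lookup falls through to the matching one.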

Even simpler than D2949: filtering out the culprit branch "evaluation" (targeting nixpkgs), and we are back on track.

and done:

May 25 17:17:10 worker01 python3[603377]: [2020-05-25 17:17:10,297: INFO/ForkPoolWorker-1] Task swh.loader.package.nixguix.tasks.LoadNixguix[5be5167d-3783-4fc9-b18e-116424949d0a] succeeded in 3153.633277214016s: {'status': 'eventful', 'snapshot_id': '76f38fb09efe9f10305c50f5cf083b008217256f'}
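
To put the two "succeeded in ...s" durations reported in this task in perspective, here is a small arithmetic sketch. The helper names are hypothetical. The two runs are not strictly comparable, since the "evaluation" branch was filtered out before the later one, so the ratio is indicative only.

```python
from datetime import timedelta

# Durations copied from the two "succeeded in ...s" log lines in this task.
FIRST_COMPLETED_RUN_S = 69021.774671264
LATER_RUN_S = 3153.633277214016


def as_duration(seconds: float) -> str:
    """Format seconds as H:MM:SS, dropping the fractional part."""
    return str(timedelta(seconds=int(seconds)))


def speedup(before: float, after: float) -> float:
    """Return how many times shorter the later run was."""
    return before / after


if __name__ == "__main__":
    print(as_duration(FIRST_COMPLETED_RUN_S))  # 19:10:21
    print(as_duration(LATER_RUN_S))  # 0:52:33
    print(f"{speedup(FIRST_COMPLETED_RUN_S, LATER_RUN_S):.1f}x")
```

The first completed run took a bit over 19 hours and the later one under an hour, roughly 22 times shorter.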
ardumont closed this task as Resolved. Tue, May 26, 2:52 PM