Page MenuHomeSoftware Heritage

nixguix: fails to use previous visit snapshot
Closed, MigratedEdits Locked

Description

The next visits of nixguix loaders fails with the following.
loader-nixguix version 0.0.90 [4]

Apr 22 07:36:21 worker0 python3[26357]:   File "/usr/lib/python3/dist-packages/swh/loader/package/nixguix/tasks.py", line 14, in load_nixguix
Apr 22 07:36:21 worker0 python3[26357]:     return NixGuixLoader(url).load()
Apr 22 07:36:21 worker0 python3[26357]:   File "/usr/lib/python3/dist-packages/swh/loader/package/loader.py", line 328, in load
Apr 22 07:36:21 worker0 python3[26357]:     revision_id = self.resolve_revision_from(known_artifacts, p_info["raw"])
Apr 22 07:36:21 worker0 python3[26357]:   File "/usr/lib/python3/dist-packages/swh/loader/package/nixguix/loader.py", line 136, in resolve_revision_from
Apr 22 07:36:21 worker0 python3[26357]:     known_integrity = known_artifact["extrinsic"]["raw"]["integrity"]
Apr 22 07:36:21 worker0 python3[26357]: KeyError: 'integrity'
Apr 22 07:36:21 worker0 python3[26362]: [2020-04-22 07:36:21,942: ERROR/ForkPoolWorker-1] Task swh.loader.package.nixguix.tasks.LoadNixguix[7d942ed8-d811-4ffe-8971-d1d8fee0a3f7] raised unexpected: KeyError('integrity')
                                        Traceback (most recent call last):
                                          File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 382, in trace_task
                                            R = retval = fun(*args, **kwargs)
                                          File "/usr/lib/python3/dist-packages/swh/scheduler/task.py", line 53, in __call__
                                            result = super().__call__(*args, **kwargs)
                                          File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 641, in __protected_call__
                                            return self.run(*args, **kwargs)
                                          File "/usr/lib/python3/dist-packages/sentry_sdk/integrations/celery.py", line 161, in _inner
                                            reraise(*exc_info)
                                          File "/usr/lib/python3/dist-packages/sentry_sdk/_compat.py", line 57, in reraise
                                            raise value
                                          File "/usr/lib/python3/dist-packages/sentry_sdk/integrations/celery.py", line 156, in _inner
                                            return f(*args, **kwargs)
                                          File "/usr/lib/python3/dist-packages/swh/loader/package/nixguix/tasks.py", line 14, in load_nixguix
                                            return NixGuixLoader(url).load()
                                          File "/usr/lib/python3/dist-packages/swh/loader/package/loader.py", line 328, in load
                                            revision_id = self.resolve_revision_from(known_artifacts, p_info["raw"])
                                          File "/usr/lib/python3/dist-packages/swh/loader/package/nixguix/loader.py", line 136, in resolve_revision_from
                                            known_integrity = known_artifact["extrinsic"]["raw"]["integrity"]
                                        KeyError: 'integrity'

Checking the sources-unstable.json used by the loader, nothing is amiss there
[1].

Checking the snapshot branches metadata field, those are incomplete. They are
missing the integrity field indeed [2] [3]

[1]

$ wget https://nix-community.github.io/nixpkgs-swh/sources-unstable.json
$ ipython
In [2]: import json

In [3]: with open('sources-unstable.json') as f:
   ...:     data = json.load(f)
   ...:

In [4]: missing_integrity=[]
   ...: for s in data:
   ...:     if isinstance(s, dict) and 'integrity' not in s:
   ...:         missing_integrity.append(s)
   ...:

In [5]: missing_integrity
Out[5]: []

[2] https://webapp.internal.staging.swh.network/api/1/revision/edd09b01c9378ef8448e91762a3500cde908d17e/?fields=metadata

[3] https://forge.softwareheritage.org/P651

[4]

$ dpkg -l python3-swh.loader.core
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                    Version               Architecture Description
+++-=======================-=====================-============-=================================
ii  python3-swh.loader.core 0.0.90-1~swh1~bpo10+1 all          Software Heritage Loader Core

Event Timeline

ardumont triaged this task as Normal priority.Apr 22 2020, 10:49 AM
ardumont created this task.

Something is off though.

According to the code, it should be set alright...

I was pretty sure it was the case prior to opening the task.

I'm even more sure now. I added tests to retrieve the revisions from the
snapshot, and check those ensuring they do have those fields url and integrity.
Those checks are green...

¯\_(ツ)_/¯

Note: the snapshot is a brand new one, no shared state with a prior one.

Which made me realize that it's true for the snapshot, not necessarily for the revisions...
That must be it.

The revisions in question were most probably already loaded with prior runs on anterior nixguix loader version. So the metadata there was set. And now, it can never change to the new format.

The only way to ensure this is to clear the staging infra state (storage0) and start from scratch...

tl; dr: That issue must be invalid. I'm ensuring it is. I've triggered the
necessary checks and waiting for the conclusion.

The only way to ensure this is to clear the staging infra state (storage0)
and start from scratch...

Clean up done.
Trigger a new run of the nixguix loader.

ardumont changed the task status from Open to Work in Progress.Apr 22 2020, 12:17 PM

The revisions in question were most probably already loaded with prior runs on
anterior nixguix loader version. So the metadata there was set. And now, it can
never change to the new format.

tl; dr: That issue must be invalid. I'm ensuring it is. I've triggered the
necessary checks and waiting for the conclusion.

My hypothesis was right. Closing.

Related to the caveats described in T2306

ardumont changed the task status from Resolved to Invalid.

It's not invalid in the end, we hit it in production now.

It's not invalid in the end, we hit it in production now.

I was too prompt on tagging this. It's not entirely the same issue as T2411.

In this, the revision is actually having a metadata issue on "intrinsic" field
deep within the metadata structure. Which means, it's indeed a revision loaded
from a prior run (with a previous nixguix version -> only metadata field
changed).

The proper fix is D2949 (to ensure we have proper logs if that happens).

In production, the issue is about the first level key extrinsic which got track
to be about the evaluation branch created by the same loader. As that branch is
a special case of targetting nixpkgs, this does not have the same and expected
metadata the other revisions do. Thus crashing the loader when this one tries
to resolve from a snapshot the already seen artifacts (by looking into the
metadata).

The proper fix is D3178.

ardumont changed the task status from Open to Work in Progress.May 26 2020, 12:07 PM
ardumont claimed this task.
ardumont added a subscriber: lewo.

D2949 updated accordingly

Landed, Deployed.

If that ever happens again (I don't expect it but I have been wrong in the
past), we will have a logged exception with some context.