Page MenuHomeSoftware Heritage

debian.loader: Improve resolution computation and fix corner cases

Authored by ardumont on Dec 19 2019, 3:43 PM.



+ loader.debian: Make sense of the actual revision resolution algo

Test Plan


Diff Detail

rDLDBASE Generic VCS/Package Loader
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont retitled this revision from debian.loader: Deal with no revision resolution possible to debian.loader: Deal with no revision resolution possible scenario.Dec 19 2019, 3:44 PM
ardumont edited the summary of this revision. (Show Details)
ardumont added a project: Debian loader.
olasd requested changes to this revision.Dec 19 2019, 4:32 PM

The logic in that function is messing me up a little bit, probably because it's missing a docstring explaining what the arguments are. Could you add one?

I'm guessing artifact_metadata is one of the packages you're currently processing, and known_package_artifacts come from the previous snapshot?

I've also suggested a way to make the logic a little bit less mind-bending.


these lines are backwards


It looks like what we want is to match the artifacts of the current package with all packages referenced in the previous snapshot in turn.

I _think_ that the logic in this loop is inverted, but I have a hard time comprehending it. The comments are backwards too :(

The logic would be much clearer if we did the comparison using set logic:

  • build the set of (filename, (sha256, sha1, size, ...)) tuples for the package we're currently trying to load
  • build the set of (filename, (sha256, sha1, size, ...)) tuples for every revision found in the previous snapshot
  • check if these two sets are equal

We could even make these sets ordered by filename, which lets us turn them into (hashable) tuples, which lets us build a dict mapping ((filename, (hash, hash, size)), ...) to revision_ids directly when fetching the previous snapshot, making this function a plain dict lookup.


This try block feels overly broad.

Could you just do

if 'files' not in artifact_metadata:
    return None  # or even artifact_metadata.get('revision_id')

at the beginning of the function instead?

This revision now requires changes to proceed.Dec 19 2019, 4:32 PM

that's not enough

that can explode a bit later.

vlorentz added inline comments.

I agree with @olasd that this try block is too long. There's a high risk of catching unrelated error.


i agree the try is long.
But here, all this shenanigans is to solve a revision from metadata dict from previous revisions (revisions coming from the snapshot).

So it should fail only for that.
We do not do anything else.


this i found backward...


that can explode a bit later

as in:

 8d073f5dc6c8b7a322618c35'}}, 'version': '0.3.9-2', 'revision_id': None}}
Dec 19 14:58:02 worker01 python3[557]: [2019-12-19 14:58:02,443: ERROR/ForkPoolWorker-1] Fail to load deb://Debian/packages/freshplayerplugin
                                       Traceback (most recent call last):
                                         File "/usr/lib/python3/dist-packages/swh/loader/package/", line 286, in load
                                           known_artifacts, p_info['raw'])
                                         File "/usr/lib/python3/dist-packages/swh/loader/package/debian/", line 104, in resolve_revision_from
                                           known_artifacts = known_artifacts['extrinsic']['raw']['files']
                                       KeyError: 'extrinsic'

One more argument for parsing these only once when fetching the old snapshot (see my other inline comment). ;)


docstring is in the main class [1]

i do not know how we can deal with that properly without replicating stuff.


  • debian.loader: Extract revision resolution into a function (+ add dedicated tests)
  • debian.loader: Improve and fix revision resolution
  • Remove unneeded instruction
  • shuffle the keys a bit to ensure order does not matter
ardumont retitled this revision from debian.loader: Deal with no revision resolution possible scenario to debian.loader: Improve resolution computation and fix corner cases.Dec 20 2019, 9:49 AM
This revision is now accepted and ready to land.Dec 20 2019, 11:26 AM