Page MenuHomeSoftware Heritage

debian.loader: Improve resolution computation and fix corner cases
ClosedPublic

Authored by ardumont on Dec 19 2019, 3:43 PM.

Details

Summary

+ loader.debian: Make sense of the actual revision resolution algo

Test Plan

tox

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 9904
Build 14624: tox-on-jenkinsJenkins
Build 14623: arc lint + arc unit

Event Timeline

ardumont retitled this revision from debian.loader: Deal with no revision resolution possible to debian.loader: Deal with no revision resolution possible scenario.Dec 19 2019, 3:44 PM
ardumont edited the summary of this revision. (Show Details)
ardumont added a project: Debian loader.
olasd requested changes to this revision.Dec 19 2019, 4:32 PM

The logic in that function is messing me up a little bit, probably because it's missing a docstring explaining what the arguments are. Could you add one?

I'm guessing artifact_metadata is one of the packages you're currently processing, and known_package_artifacts come from the previous snapshot?

I've also suggested a way to make the logic a little bit less mind-bending.

swh/loader/package/debian/loader.py
126–127

these lines are backwards

129–140

It looks like what we want is to match the artifacts of the current package with all packages referenced in the previous snapshot in turn.

I _think_ that the logic in this loop is inverted, but I have a hard time comprehending it. The comments are backwards too :(

The logic would be much clearer if we did the comparison using set logic:

  • build the set of (filename, (sha256, sha1, size, ...)) tuples for the package we're currently trying to load
  • build the set of (filename, (sha256, sha1, size, ...)) tuples for every revision found in the previous snapshot
  • check if these two sets are equal

We could even make these sets ordered by filename, which lets us turn them into (hashable) tuples, which lets us build a dict mapping ((filename, (hash, hash, size)), ...) to revision_ids directly when fetching the previous snapshot, making this function a plain dict lookup.

148

This try block feels overly broad.

Could you just do

if 'files' not in artifact_metadata:
    logger.warning(...)
    return None  # or even artifact_metadata.get('revision_id')

at the beginning of the function instead?

This revision now requires changes to proceed.Dec 19 2019, 4:32 PM
swh/loader/package/debian/loader.py
148

that's not enough

that can explode a bit later.

vlorentz added inline comments.
swh/loader/package/debian/loader.py
148

I agree with @olasd that this try block is too long. There's a high risk of catching unrelated error.

swh/loader/package/debian/loader.py
148

i agree the try is long.
But here, all this shenanigans is to solve a revision from metadata dict from previous revisions (revisions coming from the snapshot).

So it should fail only for that.
We do not do anything else.

swh/loader/package/debian/loader.py
101

this i found backward...

148

that can explode a bit later

as in:

 8d073f5dc6c8b7a322618c35'}}, 'version': '0.3.9-2', 'revision_id': None}}
Dec 19 14:58:02 worker01 python3[557]: [2019-12-19 14:58:02,443: ERROR/ForkPoolWorker-1] Fail to load deb://Debian/packages/freshplayerplugin
                                       Traceback (most recent call last):
                                         File "/usr/lib/python3/dist-packages/swh/loader/package/loader.py", line 286, in load
                                           known_artifacts, p_info['raw'])
                                         File "/usr/lib/python3/dist-packages/swh/loader/package/debian/loader.py", line 104, in resolve_revision_from
                                           known_artifacts = known_artifacts['extrinsic']['raw']['files']
                                       KeyError: 'extrinsic'
swh/loader/package/debian/loader.py
148

One more argument for parsing these only once when fetching the old snapshot (see my other inline comment). ;)

swh/loader/package/debian/loader.py
94–120

docstring is in the main class [1]

i do not know how we can deal with that properly without replicating stuff.

[1] https://forge.softwareheritage.org/source/swh-loader-core/browse/master/swh/loader/package/loader.py$0-158

  • debian.loader: Extract revision resolution into a function (+ add dedicated tests)
  • debian.loader: Improve and fix revision resolution
  • Remove unneeded instruction
  • shuffle the keys a bit to ensure order does not matter
ardumont retitled this revision from debian.loader: Deal with no revision resolution possible scenario to debian.loader: Improve resolution computation and fix corner cases.Dec 20 2019, 9:49 AM
This revision is now accepted and ready to land.Dec 20 2019, 11:26 AM