+ loader.debian: Make sense of the actual revision resolution algo
Details
Diff Detail
- Repository
- rDLDBASE Generic VCS/Package Loader
- Lint
Automatic diff as part of commit; lint not applicable. - Unit
Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/283/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/284/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/285/ for more details.
The logic in that function is messing me up a little bit, probably because it's missing a docstring explaining what the arguments are. Could you add one?
I'm guessing artifact_metadata is one of the packages you're currently processing, and known_package_artifacts come from the previous snapshot?
I've also suggested a way to make the logic a little bit less mind-bending.
swh/loader/package/debian/loader.py | ||
---|---|---|
99–100 | these lines are backwards | |
102–113 | It looks like what we want is to match the artifacts of the current package with all packages referenced in the previous snapshot in turn. I _think_ that the logic in this loop is inverted, but I have a hard time comprehending it. The comments are backwards too :( The logic would be much clearer if we did the comparison using set logic:
We could even make these sets ordered by filename, which lets us turn them into (hashable) tuples, which lets us build a dict mapping ((filename, (hash, hash, size)), ...) to revision_ids directly when fetching the previous snapshot, making this function a plain dict lookup. | |
121 | This try block feels overly broad. Could you just do if 'files' not in artifact_metadata: logger.warning(...) return None # or even artifact_metadata.get('revision_id') at the beginning of the function instead? |
swh/loader/package/debian/loader.py | ||
---|---|---|
121 | that's not enough that can explode a bit later. |
swh/loader/package/debian/loader.py | ||
---|---|---|
121 | i agree the try is long. So it should fail only for that. |
swh/loader/package/debian/loader.py | ||
---|---|---|
101 | this i found backward... | |
121 |
as in: 8d073f5dc6c8b7a322618c35'}}, 'version': '0.3.9-2', 'revision_id': None}} Dec 19 14:58:02 worker01 python3[557]: [2019-12-19 14:58:02,443: ERROR/ForkPoolWorker-1] Fail to load deb://Debian/packages/freshplayerplugin Traceback (most recent call last): File "/usr/lib/python3/dist-packages/swh/loader/package/loader.py", line 286, in load known_artifacts, p_info['raw']) File "/usr/lib/python3/dist-packages/swh/loader/package/debian/loader.py", line 104, in resolve_revision_from known_artifacts = known_artifacts['extrinsic']['raw']['files'] KeyError: 'extrinsic' |
swh/loader/package/debian/loader.py | ||
---|---|---|
121 | One more argument for parsing these only once when fetching the old snapshot (see my other inline comment). ;) |
swh/loader/package/debian/loader.py | ||
---|---|---|
94–95 | docstring is in the main class [1] i do not know how we can deal with that properly without replicating stuff. |
- debian.loader: Extract revision resolution into a function (+ add dedicated tests)
- debian.loader: Improve and fix revision resolution
Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/286/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/287/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/288/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/289/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tox/292/ for more details.