
Fix parsing of the Description field in PKG-INFO.
Closed, Public

Authored by vlorentz on Jan 17 2019, 3:00 PM.

Details

Summary

Before this commit, the policy used to parse PKG-INFO was
email.policy.compat32 (compatibility with Python 3.2 behavior),
which is deprecated. In addition to being deprecated, it caused
crashes on UTF-8 characters: when a header contains them, its value
is returned as a different type, which we didn't handle.
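
To illustrate the type change described above, here is a minimal sketch using only the standard library. The PKG-INFO fragment and the variable names are made up for the example and are not taken from the indexer code. It assumes the input is parsed from bytes, in which case compat32 hands back an email.header.Header object for a non-ASCII value instead of a str.

```python
import email.header
import email.parser
import email.policy

# Hypothetical PKG-INFO fragment with a non-ASCII (UTF-8) author name.
RAW = "Metadata-Version: 1.1\nName: example\nAuthor: Zoé\n".encode("utf-8")

# With no explicit policy, the parser falls back to email.policy.compat32.
legacy = email.parser.BytesHeaderParser().parsebytes(RAW)
# With a modern policy, header values are always str subclasses.
modern = email.parser.BytesHeaderParser(policy=email.policy.SMTP).parsebytes(RAW)

legacy_name = legacy["Name"]      # pure ASCII: a plain str
legacy_author = legacy["Author"]  # non-ASCII bytes: an email.header.Header
modern_author = modern["Author"]  # str subclass, even with non-ASCII bytes

print(type(legacy_name).__name__, type(legacy_author).__name__)
print(isinstance(modern_author, str))
```

Code that calls str methods on every header value works for the ASCII-only case and fails on the Header object, which matches the kind of crash described in the summary.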

So the first step was to switch to email.policy.SMTP.
Unfortunately, the PKG-INFO format relies on newlines being preserved
when parsing, whereas email.policy.SMTP strips them from header
values, so I added a new policy derived from it that preserves newlines.
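
A newline-preserving policy could be sketched as follows. This is an illustration, not necessarily the code in the commit: the class name NewlinePreservingPolicy, the helper read_description, and the sample PKG-INFO text are hypothetical. It assumes continuation lines are indented with eight spaces (the convention distutils uses when writing PKG-INFO) and that lines end with "\n".

```python
import email.parser
import email.policy

# Assumption: continuation lines are written as a newline plus 8 spaces.
CONTINUATION = "\n" + 8 * " "


class NewlinePreservingPolicy(email.policy.EmailPolicy):
    """Hypothetical policy that keeps line breaks inside header values."""

    def header_fetch_parse(self, name, value):
        # Values that are already header objects are returned unchanged.
        if hasattr(value, "name"):
            return value
        # Drop the continuation indent but keep the newline itself,
        # instead of removing the line breaks like EmailPolicy does.
        value = value.replace(CONTINUATION, "\n")
        return self.header_factory(name, value)


# Made-up PKG-INFO content with a two-line Description.
PKG_INFO = (
    "Metadata-Version: 1.1\n"
    "Name: example\n"
    "Description: First line\n"
    "        Second line\n"
)


def read_description(policy):
    parser = email.parser.HeaderParser(policy=policy)
    return str(parser.parsestr(PKG_INFO)["Description"])


# email.policy.SMTP joins the lines; the custom policy keeps them apart.
flattened = read_description(email.policy.SMTP)
preserved = read_description(NewlinePreservingPolicy(linesep="\r\n"))

print(repr(flattened))
print(repr(preserved))
```

Passing linesep="\r\n" mirrors how email.policy.SMTP is configured. Only header_fetch_parse is overridden, so the rest of the modern policy behavior (including str-typed header values) is inherited.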

This is similar to what the pkginfo package does, but cleaner:
https://bazaar.launchpad.net/~tseaver/pkginfo/trunk/view/head:/pkginfo/distribution.py#L14

Diff Detail

Repository
rDCIDX Metadata indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

This revision is now accepted and ready to land. (Jan 25 2019, 2:36 PM)
This revision was automatically updated to reflect the committed changes.