Fix parsing of the Description field in PKG-INFO.

Description

Before this commit, the policy used to parse PKG-INFO was
email.policy.compat32 (compatibility with Python 3.2 behavior),
which is deprecated. In addition to being deprecated, it caused
crashes on UTF-8 characters: when those are present, header values
are returned as a different type, which we didn't handle.

So the first step was switching to email.policy.SMTP.
Unfortunately, the PKG-INFO format assumes newlines are preserved
when parsing, whereas email.policy.SMTP ignores them, so I added
a new policy derived from it, which preserves newlines.

This is similar to what the pkginfo package does, but cleaner:
https://bazaar.launchpad.net/~tseaver/pkginfo/trunk/view/head:/pkginfo/distribution.py#L14
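
As an illustration of the approach described above, here is a minimal sketch of a newline-preserving policy. It is not the code from this commit: the names NewlinePreservingPolicy, parse_pkg_info and SAMPLE_PKG_INFO are hypothetical, and the 8-space continuation indent is an assumption about how the PKG-INFO file was written. Since email.policy.SMTP is an instance of email.policy.EmailPolicy, the sketch subclasses EmailPolicy.

```python
import email.parser
import email.policy
import textwrap


class NewlinePreservingPolicy(email.policy.EmailPolicy):
    """Hypothetical policy: like EmailPolicy, but keeps line breaks in values."""

    def header_fetch_parse(self, name, value):
        # Header objects set programmatically are returned unchanged.
        if hasattr(value, "name"):
            return value
        # The raw value still contains the folded continuation lines.
        # Keep the newlines and strip only the common continuation indent.
        # Unlike EmailPolicy, this returns a plain str and does not decode
        # RFC 2047 encoded words.
        first, _, rest = value.partition("\n")
        if not rest:
            return first
        return first + "\n" + textwrap.dedent(rest)


def parse_pkg_info(text):
    """Parse PKG-INFO text into a message using the sketch policy."""
    parser = email.parser.Parser(policy=NewlinePreservingPolicy())
    return parser.parsestr(text)


# Made-up sample; continuation lines are assumed to be indented by 8 spaces.
SAMPLE_PKG_INFO = (
    "Metadata-Version: 1.1\n"
    "Name: example\n"
    "Summary: caf\u00e9 tools\n"
    "Description: First line\n"
    "        \n"
    "        Second paragraph\n"
    "          indented code\n"
)
```

Calling parse_pkg_info(SAMPLE_PKG_INFO)["Description"] should return the description with its line breaks and relative indentation intact, whereas the same lookup under email.policy.SMTP returns the value with the line breaks removed.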

Details

Provenance
vlorentz authored on Jan 17 2019, 2:54 PM
vlorentz pushed on Jan 25 2019, 3:49 PM
Differential Revision
D971: Fix parsing of the Description field in PKG-INFO.
Parents
rDCIDX99664da72219: Simplify a bit unit tests
Build Status
Buildable 3733
Build 4870: test-and-build