
Fix loading of CVS repositories with non valid UTF-8 paths

Description

Some CVS repositories contain paths that are not valid UTF-8 (typically
ISO-8859-1 encoded ones), but the loader implementation assumed that all
paths could be safely encoded to UTF-8 and raised UnicodeEncodeError when
attempting to encode such paths.
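
The failure mode can be reproduced with the standard library alone. The
snippet below is a minimal sketch, not the loader's code: it assumes the
path reached Python as a string decoded with the surrogateescape error
handler (which is how Python usually exposes undecodable file names), and
the file name is a made-up example.

```python
# Hypothetical ISO-8859-1 file name ("café.c,v"); byte 0xE9 is invalid UTF-8.
raw_path = b"src/caf\xe9.c,v"

# Decoding with surrogateescape keeps the bad byte as a lone surrogate.
as_str = raw_path.decode("utf-8", errors="surrogateescape")

# A strict UTF-8 encode of that string then fails.
try:
    as_str.encode("utf-8")
    raised = False
except UnicodeEncodeError:
    raised = True

# Encoding with the same error handler restores the original bytes.
round_tripped = as_str.encode("utf-8", errors="surrogateescape")
```

Here `raised` ends up true, while `round_tripped` equals `raw_path`, which is
why working on the raw bytes avoids the error.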

This commit changes the way the loader handles CVS paths: it now uses
their raw bytes representation instead of their UTF-8 decoded string
representation.
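
As an illustration of what handling paths as raw bytes can look like, here
is a small sketch. The helper `working_path` is hypothetical and is not
taken from the loader; it only relies on the CVS conventions that RCS files
end in `,v` and that removed files live in an `Attic` directory. Every
operation stays in bytes, so no encoding is ever assumed.

```python
import posixpath


def working_path(rcs_path: bytes) -> bytes:
    """Hypothetical helper: map an RCS file path to its working-file path."""
    directory, name = posixpath.split(rcs_path)
    # Drop the ",v" suffix that marks an RCS file.
    if name.endswith(b",v"):
        name = name[:-2]
    # Files removed from the trunk are stored under an "Attic" directory.
    if posixpath.basename(directory) == b"Attic":
        directory = posixpath.dirname(directory)
    return posixpath.join(directory, name)


# Made-up ISO-8859-1 path; it is never decoded, so nothing can fail.
path = working_path(b"module/Attic/caf\xe9.c,v")

# Decode only for display, replacing bytes that are not valid UTF-8.
printable = path.decode("utf-8", errors="replace")
```

Decoding is deferred to the point where a human-readable form is needed,
and even there a lossy error handler is used rather than a strict one.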

The rcsparse.rcsfile constructor has also been modified to take a bytes
path as argument instead of a unicode one, so that non UTF-8 paths can be
opened successfully.

Such CVS repositories can now be loaded successfully, using either the
rsync or the pserver protocol.

Related to T3980

Details

Provenance
anlambert authored on Jul 1 2022, 12:15 PM
anlambert pushed on Jul 7 2022, 10:42 AM
Differential Revision
D8086: Fix loading of CVS repositories with non valid UTF-8 paths
Parents
rDLDCVSb35a9769a035: cvsclient: Retry pserver connection three times in case of failure
Branches
Unknown
Tags
Unknown
Tasks
Restricted Maniphest Task
Build Status
Buildable 30290
Build 47355: test-and-build (Jenkins)