browse/utils: Robustify content encoding detection

Description

When attempting to re-encode non-UTF-8 textual content, use chardet
to detect the encoding first, and use the detected encoding only if
the detection confidence is very high.

Previously, some encodings such as SHIFT_JIS (used for Japanese) were
not correctly detected, so the affected content was badly rendered in
the browse Web UI.
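
As an illustration of the detection step described in this commit message, here is a minimal Python sketch. It is not the code from this revision: the function name `reencode_to_utf8`, the `detect` parameter, and the 0.9 threshold are all assumptions made for the example (the commit message only says the confidence must be really high). The only chardet call used is `chardet.detect`, which returns a dict containing `encoding` and `confidence`.

```python
from typing import Callable, Optional, Tuple

# Hypothetical threshold: the commit message only says "really high".
CONFIDENCE_THRESHOLD = 0.9


def _chardet_detect(data: bytes) -> dict:
    # Imported lazily so the fallback logic can be exercised without chardet.
    import chardet

    return chardet.detect(data)


def reencode_to_utf8(
    content: bytes,
    detect: Callable[[bytes], dict] = _chardet_detect,
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[bytes, Optional[str]]:
    """Return (bytes, source_encoding).

    source_encoding is None when no confident detection was made; in that
    case the original bytes are returned untouched.
    """
    # Content that is already valid UTF-8 needs no detection at all.
    try:
        content.decode("utf-8")
        return content, "utf-8"
    except UnicodeDecodeError:
        pass

    result = detect(content)
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0

    # Trust the detector only when it is confident enough.
    if encoding and confidence >= threshold:
        try:
            return content.decode(encoding).encode("utf-8"), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong guess, or an encoding name Python does not know.
            pass
    return content, None
```

Passing the detector in as a parameter is a choice made for this sketch only: it keeps the threshold logic testable with a stub. A caller would normally rely on the default, for example `reencode_to_utf8(raw_bytes)`.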

Details

Provenance
anlambert authored on Feb 17 2022, 4:57 PM
anlambert pushed on Feb 18 2022, 11:06 AM
Differential Revision
D7197: browse/utils: Robustify content encoding detection
Parents
rDWAPPSd858c9b457d3: api_origin_search: Copy all params to build 'link-next'.
Branches
Unknown
Tags
Unknown
Build Status
Buildable 26978
Build 42182: test-and-build