Ensure a package.json file can be parsed when its content cannot be properly decoded
due to its encoding not being properly detected.
To do so, try to decode from utf-8 first, then use chardet as a fallback, using the 'replace' error handling to replace characters that cannot be decoded.
Even if the package.json content cannot be correctly loaded, this is not critical,
as this data is only added to the metadata of a swh revision. The original package.json file
can still be obtained from the archive content.
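
The decoding strategy described above can be sketched roughly as follows. This is a minimal illustration, not the actual loader code: the names `load_package_json`, `guess_encoding` and `_chardet_guess` are hypothetical, decoding with `utf-8-sig` to drop a leading BOM is an assumption, and so is returning an empty dict when the content still cannot be parsed.

```python
import json
from typing import Any, Callable, Dict, Optional


def _chardet_guess(raw: bytes) -> Optional[str]:
    """Default encoding guesser, backed by the third-party chardet package."""
    # Imported lazily so the utf-8 path works without chardet installed.
    import chardet

    return chardet.detect(raw)["encoding"]


def load_package_json(
    raw: bytes,
    guess_encoding: Callable[[bytes], Optional[str]] = _chardet_guess,
) -> Dict[str, Any]:
    """Hypothetical helper: decode and parse package.json bytes leniently."""
    try:
        # Assumption: "utf-8-sig" is plain utf-8 that also drops a leading BOM.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Fallback: use the guessed encoding and replace undecodable bytes
        # with U+FFFD instead of raising.
        encoding = guess_encoding(raw) or "utf-8"
        text = raw.decode(encoding, errors="replace")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Assumption: metadata is optional, so give up quietly.
        return {}
    return data if isinstance(data, dict) else {}
```

With such a helper, a caller would pass the raw bytes read from the extracted package.json and store whatever dict comes back, possibly empty, as revision metadata.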
This change should fix the following kinds of reported errors:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 895, in load
    more_data_to_fetch = self.fetch_data()
  File "/usr/lib/python3/dist-packages/swh/loader/npm/loader.py", line 203, in fetch_data
    data = next(self.new_versions)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 149, in prepare_package_versions
    version_data)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 207, in _prepare_package_version
    package_json = json.loads(package_json_bytes.decode(file_encoding))
  File "/usr/lib/python3.5/encodings/cp1254.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 42: character maps to <undefined>

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 895, in load
    more_data_to_fetch = self.fetch_data()
  File "/usr/lib/python3/dist-packages/swh/loader/npm/loader.py", line 203, in fetch_data
    data = next(self.new_versions)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 145, in prepare_package_versions
    version_data)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 197, in _prepare_package_version
    package_json = json.load(package_json_file)
  File "/usr/lib/python3.5/json/__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.5/json/__init__.py", line 315, in loads
    s, 0)
json.decoder.JSONDecodeError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 895, in load
    more_data_to_fetch = self.fetch_data()
  File "/usr/lib/python3/dist-packages/swh/loader/npm/loader.py", line 203, in fetch_data
    data = next(self.new_versions)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 149, in prepare_package_versions
    version_data)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 204, in _prepare_package_version
    with open(package_json_path, 'rb') as package_json_file:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/swh.loader.npm/swh.loader.npm.jrx67u3_-2344/@lpmraven/link-components/0.1.1/package/package.json'

Related T1726