There are cases where a tarball to download is marked as gzipped in the
Content-Encoding HTTP response header while in fact it is not.
So handle the ContentDecodingError exception that can be raised by the
download function: try to download the tarball raw bytes again without
attempting to decompress the input stream.
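
The fallback described above can be sketched roughly as follows. This is
an illustrative sketch only, not the actual code of the download function
in swh/loader/package/utils.py: the name download_with_raw_fallback, its
parameters and the optional session argument are hypothetical. It relies
on requests' Response.iter_content and on urllib3's
HTTPResponse.stream(..., decode_content=False) to read the body without
decoding it.

```python
import requests
from requests.exceptions import ContentDecodingError


def download_with_raw_fallback(url, filepath, chunk_size=32768, session=None):
    """Download url to filepath, retrying without content decoding if needed.

    Hypothetical helper: `session` is any object exposing a requests-like
    `get` method and defaults to the `requests` module itself.
    """
    session = session or requests
    response = session.get(url, stream=True)
    response.raise_for_status()
    try:
        with open(filepath, "wb") as f:
            # iter_content decodes the body according to Content-Encoding
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    except ContentDecodingError:
        # The server announced a content encoding (e.g. gzip) that does not
        # match the payload: fetch again and write the raw bytes as-is.
        response.close()
        response = session.get(url, stream=True)
        response.raise_for_status()
        with open(filepath, "wb") as f:  # "wb" truncates any partial write
            for chunk in response.raw.stream(chunk_size, decode_content=False):
                f.write(chunk)
    return filepath
```

The second request is needed because the first response stream has
already been partially consumed when the decoding error is raised.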
Real-world example encountered:

swh-loader_1 | [2021-06-10 09:18:08,876: DEBUG/ForkPoolWorker-1] package_info: ArchivePackageInfo(url='http://www.columbia.edu/kermit/ftp/archives/cpm80.tar.gz', filename='cpm80.tar.gz', directory_extrinsic_metadata=[], raw_info={'url': 'http://www.columbia.edu/kermit/ftp/archives/cpm80.tar.gz', 'time': '2011-08-13T23:05:09', 'length': 1894400, 'version': 'cpm80'}, length=1894400, time='2011-08-13T23:05:09', version='cpm80')
swh-loader_1 | [2021-06-10 09:18:09,039: DEBUG/ForkPoolWorker-1] filename: cpm80.tar.gz
swh-loader_1 | [2021-06-10 09:18:09,039: DEBUG/ForkPoolWorker-1] filepath: /tmp/tmpqydd_7xw/cpm80.tar.gz
swh-loader_1 | [2021-06-10 09:18:09,044: ERROR/ForkPoolWorker-1] Failed loading branch releases/cpm80 for https://www.kermitproject.org/archive.html
swh-loader_1 | Traceback (most recent call last):
swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/response.py", line 401, in _decode
swh-loader_1 |     data = self._decoder.decompress(data)
swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/response.py", line 88, in decompress
swh-loader_1 |     ret += self._obj.decompress(data)
swh-loader_1 | zlib.error: Error -3 while decompressing data: incorrect header check
swh-loader_1 |
swh-loader_1 | During handling of the above exception, another exception occurred:
swh-loader_1 |
swh-loader_1 | Traceback (most recent call last):
swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/requests/models.py", line 753, in generate
swh-loader_1 |     for chunk in self.raw.stream(chunk_size, decode_content=True):
swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/response.py", line 576, in stream
swh-loader_1 |     data = self.read(amt=amt, decode_content=decode_content)
swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/response.py", line 548, in read
swh-loader_1 |     data = self._decode(data, decode_content, flush_decoder)
swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/response.py", line 407, in _decode
swh-loader_1 |     e,
swh-loader_1 | urllib3.exceptions.DecodeError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))
swh-loader_1 |
swh-loader_1 | During handling of the above exception, another exception occurred:
swh-loader_1 |
swh-loader_1 | Traceback (most recent call last):
swh-loader_1 |   File "/src/swh-loader-core/swh/loader/package/loader.py", line 576, in load
swh-loader_1 |     res = self._load_revision(p_info, origin)
swh-loader_1 |   File "/src/swh-loader-core/swh/loader/package/loader.py", line 713, in _load_revision
swh-loader_1 |     dl_artifacts = self.download_package(p_info, tmpdir)
swh-loader_1 |   File "/src/swh-loader-core/swh/loader/package/loader.py", line 364, in download_package
swh-loader_1 |     return [download(p_info.url, dest=tmpdir, filename=p_info.filename)]
swh-loader_1 |   File "/src/swh-loader-core/swh/loader/package/utils.py", line 93, in download
swh-loader_1 |     for chunk in response.iter_content(chunk_size=HASH_BLOCK_SIZE):
swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/requests/models.py", line 758, in generate
swh-loader_1 |     raise ContentDecodingError(e)
swh-loader_1 | requests.exceptions.ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))