
swh.core.tarball.uncompress: Some zip files fail to be uncompressed
Closed, Migrated

Description

swh.core.tarball.uncompress is implemented on top of shutil.unpack_archive, which in turn relies on the zipfile module to uncompress zip files.

The zipfile module does not support all ZIP compression types, notably type 6 (implode). The stack trace below was encountered during tests in the docker environment:

swh-loader_1                    | [2021-06-08 14:33:44,196: DEBUG/ForkPoolWorker-1] package_info: ArchivePackageInfo(url='http://www.columbia.edu/kermit/ftp/archives/msk316src.zip', filename='msk316src.zip', directory_extrinsic_metadata=[], raw_info={'url': 'http://www.columbia.edu/kermit/ftp/archives/msk316src.zip', 'time': '2003-09-23T17:01:07', 'length': 845917, 'version': 'msk316src'}, length=845917, time='2003-09-23T17:01:07', version='msk316src')
swh-loader_1                    | [2021-06-08 14:33:44,372: DEBUG/ForkPoolWorker-1] filename: msk316src.zip
swh-loader_1                    | [2021-06-08 14:33:44,372: DEBUG/ForkPoolWorker-1] filepath: /tmp/tmpfxajgaxl/msk316src.zip
swh-loader_1                    | [2021-06-08 14:33:45,338: DEBUG/ForkPoolWorker-1] extrinsic_metadata
swh-loader_1                    | [2021-06-08 14:33:45,341: ERROR/ForkPoolWorker-1] Failed loading branch releases/msk316src for https://www.kermitproject.org/archive.html
swh-loader_1                    | Traceback (most recent call last):
swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/package/loader.py", line 576, in load
swh-loader_1                    |     res = self._load_revision(p_info, origin)
swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/package/loader.py", line 715, in _load_revision
swh-loader_1                    |     (uncompressed_path, directory) = self._load_directory(dl_artifacts, tmpdir)
swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/package/loader.py", line 678, in _load_directory
swh-loader_1                    |     uncompressed_path = self.uncompress(dl_artifacts, dest=tmpdir)
swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/package/loader.py", line 377, in uncompress
swh-loader_1                    |     uncompress(a_path, dest=uncompressed_path)
swh-loader_1                    |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/tarball.py", line 72, in uncompress
swh-loader_1                    |     shutil.unpack_archive(tarpath, extract_dir=dest)
swh-loader_1                    |   File "/usr/local/lib/python3.7/shutil.py", line 1002, in unpack_archive
swh-loader_1                    |     func(filename, extract_dir, **kwargs)
swh-loader_1                    |   File "/usr/local/lib/python3.7/shutil.py", line 917, in _unpack_zipfile
swh-loader_1                    |     data = zip.read(info.filename)
swh-loader_1                    |   File "/usr/local/lib/python3.7/zipfile.py", line 1464, in read
swh-loader_1                    |     with self.open(name, "r", pwd) as fp:
swh-loader_1                    |   File "/usr/local/lib/python3.7/zipfile.py", line 1560, in open
swh-loader_1                    |     return ZipExtFile(zef_file, mode, zinfo, pwd, True)
swh-loader_1                    |   File "/usr/local/lib/python3.7/zipfile.py", line 809, in __init__
swh-loader_1                    |     self._decompressor = _get_decompressor(self._compress_type)
swh-loader_1                    |   File "/usr/local/lib/python3.7/zipfile.py", line 722, in _get_decompressor
swh-loader_1                    |     raise NotImplementedError("compression type %d (%s)" % (compress_type, descr))
swh-loader_1                    | NotImplementedError: compression type 6 (implode)
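
To check ahead of time whether an archive contains members that the zipfile module cannot decode, the compression type recorded for each member can be inspected without extracting anything. The following is a minimal sketch, not code from swh.core; the helper name `unsupported_members` and the `SUPPORTED_METHODS` set are made up for illustration, and the set assumes the bz2 and lzma modules are available in the Python build.

```python
import zipfile

# Compression types the standard zipfile module knows how to decompress.
# Assumption: bz2 and lzma are compiled into this Python build.
SUPPORTED_METHODS = {
    zipfile.ZIP_STORED,
    zipfile.ZIP_DEFLATED,
    zipfile.ZIP_BZIP2,
    zipfile.ZIP_LZMA,
}


def unsupported_members(zip_path):
    """Return (filename, compress_type) pairs that zipfile cannot decompress.

    Hypothetical helper: it only reads the archive's central directory,
    so it does not trigger the NotImplementedError seen in the trace.
    """
    with zipfile.ZipFile(zip_path) as zf:
        return [
            (info.filename, info.compress_type)
            for info in zf.infolist()
            if info.compress_type not in SUPPORTED_METHODS
        ]
```

For an archive like the one in the trace, this would be expected to report members with compression type 6; an empty list means every member uses a type zipfile can handle.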

One possible workaround would be to delegate the zip file uncompression to the unzip command when such an error occurs.
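
A minimal sketch of that workaround, under stated assumptions: the function name `uncompress_with_fallback` is hypothetical and is not the actual swh.core.tarball code, the Info-ZIP `unzip` command is assumed to be installed and on PATH, and it is assumed to support the compression type that zipfile rejected.

```python
import shutil
import subprocess


def uncompress_with_fallback(archive_path, dest):
    """Unpack archive_path into dest, delegating to unzip if zipfile gives up.

    Hypothetical helper illustrating the proposed workaround.
    """
    try:
        shutil.unpack_archive(archive_path, extract_dir=dest)
    except NotImplementedError:
        # zipfile met a compression type it cannot decode (e.g. type 6).
        # Assumption: the external `unzip` command is available on PATH.
        # -q: quiet, -o: overwrite files left by the partial extraction,
        # -d: target directory.
        subprocess.run(
            ["unzip", "-q", "-o", archive_path, "-d", dest],
            check=True,
        )
```

With `check=True`, a failure of the unzip command itself surfaces as `subprocess.CalledProcessError`, so the caller can still tell that the archive could not be extracted.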