tarball: Try to get archive format before unpacking it
Closed · Public

Authored by anlambert on Sep 15 2021, 12:09 PM.

Details

Summary

By default, shutil.unpack_archive tries to infer the archive format from
the file extension, but the match is case-sensitive. Extracting a file
named "archive.ZIP" therefore fails.

So detect the archive format in a case-insensitive way by processing the
data returned by shutil.get_unpack_formats prior to unpacking.
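
The approach described above can be sketched roughly as follows. This is only an illustration of the idea, not the code from this diff: the helper names guess_archive_format and unpack are hypothetical, and the actual change in swh.core may be structured differently. It relies only on shutil.get_unpack_formats, which returns (name, extensions, description) tuples, and on the format parameter of shutil.unpack_archive.

```python
import shutil
from typing import Optional


def guess_archive_format(path: str) -> Optional[str]:
    """Return the shutil format name matching the extension of ``path``,
    comparing case-insensitively, or None if no registered format matches.

    Hypothetical helper, for illustration only.
    """
    lowered = path.lower()
    # Each entry is a (name, extensions, description) tuple,
    # e.g. ("zip", [".zip"], "ZIP file").
    for name, extensions, _description in shutil.get_unpack_formats():
        for ext in extensions:
            if lowered.endswith(ext.lower()):
                return name
    return None


def unpack(path: str, dest: str) -> None:
    """Unpack ``path`` into ``dest``, passing the detected format explicitly.

    When no format is detected, ``format=None`` lets shutil fall back to its
    own (case-sensitive) extension lookup, which raises shutil.ReadError for
    unknown extensions.
    """
    shutil.unpack_archive(path, extract_dir=dest, format=guess_archive_format(path))
```

Passing the format explicitly bypasses the extension lookup in shutil.unpack_archive, so a file named "archive.ZIP" is handled like "archive.zip".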

It should fix the kind of issue reported in Sentry.

Diff Detail

Repository
rDCORE Foundations and core functionalities
Branch
tarball-get-format
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 23613
Build 36848: Phabricator diff pipeline on jenkins
Build 36847: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D6264 (id=22677)

Rebasing onto 5c81a6c204...

Current branch diff-target is up to date.
Changes applied before test
commit b89340b8768d75ebe33a5cb96d4c046417548b0f
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Sep 15 11:59:47 2021 +0200

    tarball: Try to get archive format before unpacking it
    
    By default shutil.unpack_archive will try to match the archive format
    from its extension but it does not do it in a case insensitive way.
    Thus trying to extract a file named "archive.ZIP" will fail.
    
    So try to detect archive format in a case insensitive way by processing
    the data returned by shutil.get_unpack_formats prior unpacking.

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/240/ for more details.

This revision is now accepted and ready to land. Sep 15 2021, 12:35 PM

Build is green

Patch application report for D6264 (id=22680)

Rebasing onto e45733e9b4...

Current branch diff-target is up to date.
Changes applied before test
commit 632673f6bd1bfe75a103059b59e83bbb209a2c80
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Sep 15 11:59:47 2021 +0200

    tarball: Try to get archive format before unpacking it
    
    By default shutil.unpack_archive will try to match the archive format
    from its extension but it does not do it in a case insensitive way.
    Thus trying to extract a file named "archive.ZIP" will fail.
    
    So try to detect archive format in a case insensitive way by processing
    the data returned by shutil.get_unpack_formats prior unpacking.

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/242/ for more details.