
debian: Add md5 sum fallback when sha* checksum is missing in metadata
ClosedPublic

Authored by anlambert on Dec 6 2021, 2:35 PM.

Details

Summary

To verify that a package file was downloaded successfully, the debian loader
compares the sha256 or sha1 checksum of the file with the one recorded in
the debian dsc file.

However, for old debian-based distributions (some old ubuntu releases, for
instance), the only checksum available in the dsc file is an md5 sum.

So add a fallback that uses the md5 sum to verify the download when a sha*
checksum is missing from the dsc file.

Related to T2400
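
For illustration, the fallback described above could look roughly like the sketch below. It is not the code from this diff: the names `PREFERRED_ALGOS`, `pick_checksum` and `verify_download` are hypothetical, and the dsc checksums are assumed to be available as a plain dict mapping algorithm name to hex digest.

```python
import hashlib

# Assumed preference order for this sketch: sha* first, md5 only as a fallback.
PREFERRED_ALGOS = ("sha256", "sha1", "md5")


def pick_checksum(dsc_checksums):
    """Return (algo, hexdigest) for the first preferred checksum that is present."""
    for algo in PREFERRED_ALGOS:
        value = dsc_checksums.get(algo)
        if value:
            return algo, value
    raise ValueError("no supported checksum found in dsc metadata")


def verify_download(data, dsc_checksums):
    """Compare the digest of the downloaded bytes with the one from the dsc file.

    Returns the name of the algorithm that was used for the check.
    """
    algo, expected = pick_checksum(dsc_checksums)
    actual = hashlib.new(algo, data).hexdigest()
    if actual != expected:
        raise ValueError(f"{algo} mismatch: expected {expected}, got {actual}")
    return algo
```

With this shape, a dsc entry that only carries an md5 sum is still checked, while entries that carry a sha256 or sha1 checksum keep using the stronger hash.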

Event Timeline

Build is green

Patch application report for D6750 (id=24511)

Rebasing onto 5d22455c94...

Current branch diff-target is up to date.
Changes applied before test
commit 17e441d0100be6e2d362c83c60ba1908304fd2a1
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Dec 6 14:29:32 2021 +0100

    debian: Add md5 sum fallback when sha* checksum is missing in metadata
    

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/645/ for more details.

olasd added inline comments.
swh/loader/package/debian/loader.py
174–176

Oof, that looks pretty leaky.

I guess we should:

  • actually add md5 as a supported algorithm in swh.model.hashutil.MultiHash
  • turn the use of DOWNLOAD_HASHES into a class attribute of the base package loader (with a default value) rather than using a hardcoded list
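
A minimal sketch of the second suggestion, assuming hypothetical class names (`BasePackageLoader`, `DebianLikeLoader`) and an assumed default of sha1 and sha256; the real base loader and its defaults may differ:

```python
import hashlib
from typing import Dict, Tuple


class BasePackageLoader:
    # Assumed default: hash algorithms computed for every downloaded file.
    DOWNLOAD_HASHES: Tuple[str, ...] = ("sha1", "sha256")

    def compute_hashes(self, data: bytes) -> Dict[str, str]:
        """Compute one hex digest per algorithm listed in DOWNLOAD_HASHES."""
        return {
            algo: hashlib.new(algo, data).hexdigest()
            for algo in self.DOWNLOAD_HASHES
        }


class DebianLikeLoader(BasePackageLoader):
    # A subclass extends the default instead of patching a hardcoded list.
    DOWNLOAD_HASHES = BasePackageLoader.DOWNLOAD_HASHES + ("md5",)
```

The point of the class attribute is that only the loader that needs md5 pays for it, and the override stays visible in one place.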
swh/loader/package/debian/loader.py
174–176

Yeah, I went for a quick and dirty fix here, as this is the only case where an md5 sum is needed. I will update accordingly then.

swh/loader/package/debian/loader.py
174–176

> actually add md5 as a supported algorithm in swh.model.hashutil.MultiHash

D6755

ardumont added inline comments.
swh/loader/package/debian/loader.py
174–176

Ok, so you actually need to land that pile of diff in swh.model and then rebase that one so you can use your other diff's code, right?

Rebase and update diff after swh-model 3.1.0 release

Build is green

Patch application report for D6750 (id=24528)

Rebasing onto 89f5ccc7f5...

Current branch diff-target is up to date.
Changes applied before test
commit 2d9e93a2f246011f5e79e5a4c6e5c66284eb4bce
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Dec 6 14:29:32 2021 +0100

    debian: Add md5 sum fallback when sha* checksum is missing in metadata
    

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/648/ for more details.

swh/loader/package/debian/loader.py
174–176

> turn the use of DOWNLOAD_HASHES into a class attribute of the base package loader (with a default value) rather than using a hardcoded list

I opted for a simpler solution: merging the default DOWNLOAD_HASHES set with the set derived from the keys of the hashes parameter of the swh.loader.package.utils.download function.
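
As a rough sketch of that approach (not the actual swh.loader.package.utils.download code): the function name `check_download`, its signature, and the default set below are assumptions made for illustration.

```python
import hashlib
from typing import Dict, Optional

# Assumed default set of algorithms; the real default may differ.
DOWNLOAD_HASHES = {"sha1", "sha256"}


def check_download(data: bytes, hashes: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Hash the data with the defaults plus any algorithm named in `hashes`,
    then verify every expected digest that the caller provided."""
    hashes = hashes or {}
    # Merge the defaults with the algorithms the caller wants checked.
    algos = DOWNLOAD_HASHES | set(hashes)
    computed = {algo: hashlib.new(algo, data).hexdigest() for algo in sorted(algos)}
    for algo, expected in hashes.items():
        if computed[algo] != expected:
            raise ValueError(f"{algo} mismatch: expected {expected}, got {computed[algo]}")
    return computed
```

A caller that passes `{"md5": ...}` gets md5 computed and checked without any loader-specific configuration, which is what makes this simpler than a per-loader attribute.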

174–176

I landed the swh-model diffs and tagged a v3.1.0 release. The build is green, so it looks like we are good here.

Thank you!

swh/loader/package/debian/loader.py
174–176

Ah, even better!

This revision is now accepted and ready to land. Dec 7 2021, 9:36 AM