Page MenuHomeSoftware Heritage

converters: Recompute hashes and check they match the originals
ClosedPublic

Authored by vlorentz on Sep 16 2021, 2:03 PM.

Details

Summary

This makes sure we don't write corrupt objects to the storage,
like the examples in T75.

Diff Detail

Repository
rDLDG Git loader
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 23681
Build 36966: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 36965: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D6281 (id=22747)

Could not rebase; Attempt merge onto 85eb54015c...

Updating 85eb540..3558c5f
Fast-forward
 requirements.txt                        |   1 +
 swh/loader/git/converters.py            |  74 ++++++++++++++++----
 swh/loader/git/tests/test_converters.py | 115 ++++++++++++++++++++++++++++----
 3 files changed, 164 insertions(+), 26 deletions(-)
Changes applied before test
commit 3558c5f5aaf1ba9f36e1544cb1719ebeb848aada
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 16 14:01:20 2021 +0200

    converters: Recompute hashes and check they match the originals
    
    This makes sure we don't write corrupt objects to the storage,
    like the examples in T75.

commit f413e171b6e0ffd703613089bdfbee24dc048bda
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 16 13:39:59 2021 +0200

    converters: Add typing

Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/106/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/106/console

Harbormaster returned this revision to the author for changes because remote builds failed.Sep 16 2021, 2:04 PM
Harbormaster failed remote builds in B23678: Diff 22747!
vlorentz edited the test plan for this revision. (Show Details)

Build has FAILED

Patch application report for D6281 (id=22749)

Could not rebase; Attempt merge onto 85eb54015c...

Updating 85eb540..4f17deb
Fast-forward
 requirements.txt                        |   1 +
 swh/loader/git/converters.py            |  74 ++++++++++++++++----
 swh/loader/git/tests/test_converters.py | 115 ++++++++++++++++++++++++++++----
 3 files changed, 164 insertions(+), 26 deletions(-)
Changes applied before test
commit 4f17deb0037c7108c3e56344331014882b60475a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 16 14:01:20 2021 +0200

    converters: Recompute hashes and check they match the originals
    
    This makes sure we don't write corrupt objects to the storage,
    like the examples in T75.

commit f413e171b6e0ffd703613089bdfbee24dc048bda
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 16 13:39:59 2021 +0200

    converters: Add typing

Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/107/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/107/console

Build has FAILED

Patch application report for D6281 (id=22749)

Could not rebase; Attempt merge onto 85eb54015c...

Updating 85eb540..4f17deb
Fast-forward
 requirements.txt                        |   1 +
 swh/loader/git/converters.py            |  74 ++++++++++++++++----
 swh/loader/git/tests/test_converters.py | 115 ++++++++++++++++++++++++++++----
 3 files changed, 164 insertions(+), 26 deletions(-)
Changes applied before test
commit 4f17deb0037c7108c3e56344331014882b60475a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 16 14:01:20 2021 +0200

    converters: Recompute hashes and check they match the originals
    
    This makes sure we don't write corrupt objects to the storage,
    like the examples in T75.

commit f413e171b6e0ffd703613089bdfbee24dc048bda
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 16 13:39:59 2021 +0200

    converters: Add typing

Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/108/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/108/console

Build is green

Patch application report for D6281 (id=22752)

Could not rebase; Attempt merge onto 85eb54015c...

Updating 85eb540..a34cbe5
Fast-forward
 requirements-swh.txt                    |   2 +-
 requirements.txt                        |   1 +
 swh/loader/git/converters.py            |  74 ++++++++++++++++----
 swh/loader/git/tests/test_converters.py | 115 ++++++++++++++++++++++++++++----
 4 files changed, 165 insertions(+), 27 deletions(-)
Changes applied before test
commit a34cbe5196eab14b98d311ae4eda3ab590c18675
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 16 14:01:20 2021 +0200

    converters: Recompute hashes and check they match the originals
    
    This makes sure we don't write corrupt objects to the storage,
    like the examples in T75.

commit f413e171b6e0ffd703613089bdfbee24dc048bda
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 16 13:39:59 2021 +0200

    converters: Add typing

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/109/ for more details.

ardumont added a subscriber: ardumont.

lgtm, a couple of questions/remarks inline.

swh/loader/git/converters.py
44

I have a hard time reading this ^ chunk (line 37 to 45).

Does that mean something like, all functions decorated with check_ids must respect the signature (self, obj: ShaFile, log=None) -> _THashable when called upon?

(That plus what the decorator actually does, checking for mismatched checksums).

Now that i said it, it actually makes sense to me...

swh/loader/git/tests/test_converters.py
116–120

why not ^ ?

or even simpler:

def sha(self):
    return lambda: self._sha

?

This revision is now accepted and ready to land.Sep 16 2021, 5:09 PM
swh/loader/git/tests/test_converters.py
116–120

because that's not what dulwich expects.

rewrite without the very readable annotation

remove dep on typing-extensions

Build is green

Patch application report for D6281 (id=22763)

Could not rebase; Attempt merge onto 85eb54015c...

Updating 85eb540..61e2d2c
Fast-forward
 requirements-swh.txt                    |   2 +-
 requirements.txt                        |   1 +
 swh/loader/git/converters.py            |  64 +++++++++++++-----
 swh/loader/git/tests/test_converters.py | 115 ++++++++++++++++++++++++++++----
 4 files changed, 152 insertions(+), 30 deletions(-)
Changes applied before test
commit 61e2d2c1f1ce043e6bed9b9b22a4b065150835b4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 16 14:01:20 2021 +0200

    converters: Recompute hashes and check they match the originals
    
    This makes sure we don't write corrupt objects to the storage,
    like the examples in T75.

commit f413e171b6e0ffd703613089bdfbee24dc048bda
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 16 13:39:59 2021 +0200

    converters: Add typing

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/110/ for more details.

Build is green

Patch application report for D6281 (id=22764)

Could not rebase; Attempt merge onto 85eb54015c...

Updating 85eb540..6d7a998
Fast-forward
 requirements-swh.txt                    |   2 +-
 swh/loader/git/converters.py            |  64 +++++++++++++-----
 swh/loader/git/tests/test_converters.py | 115 ++++++++++++++++++++++++++++----
 3 files changed, 151 insertions(+), 30 deletions(-)
Changes applied before test
commit 6d7a998b1093aaba83d0660188fbe48c0ed5b269
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 16 14:01:20 2021 +0200

    converters: Recompute hashes and check they match the originals
    
    This makes sure we don't write corrupt objects to the storage,
    like the examples in T75.

commit f413e171b6e0ffd703613089bdfbee24dc048bda
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Sep 16 13:39:59 2021 +0200

    converters: Add typing

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/111/ for more details.

douardda added inline comments.
swh/loader/git/converters.py
36

why not make check_id return the object?
so you can write:

return check_id(Directory(id=tree.sha().digest(), entries=tuple(entries),))
swh/loader/git/converters.py
36

because it's weird and forces line wrapping when constructing Revision

swh/loader/git/tests/test_converters.py
116–120

ah!

This revision is now accepted and ready to land.Sep 16 2021, 6:06 PM