This makes sure we don't write corrupt objects to the storage,
like the examples in T75.
Details
- Reviewers
ardumont - Group Reviewers
Reviewers - Maniphest Tasks
- T399: (Re-)Compute data checksums before insertion
T3135: Improve integrity of ingested content - Commits
- rDLDG6d7a998b1093: converters: Recompute hashes and check they match the originals
Diff Detail
- Repository
- rDLDG Git loader
- Branch
- master
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 23684 Build 36971: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 36970: arc lint + arc unit
Event Timeline
Build has FAILED
Patch application report for D6281 (id=22747)
Could not rebase; Attempt merge onto 85eb54015c...
Updating 85eb540..3558c5f Fast-forward requirements.txt | 1 + swh/loader/git/converters.py | 74 ++++++++++++++++---- swh/loader/git/tests/test_converters.py | 115 ++++++++++++++++++++++++++++---- 3 files changed, 164 insertions(+), 26 deletions(-)
Changes applied before test
commit 3558c5f5aaf1ba9f36e1544cb1719ebeb848aada Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Sep 16 14:01:20 2021 +0200 converters: Recompute hashes and check they match the originals This makes sure we don't write corrupt objects to the storage, like the examples in T75. commit f413e171b6e0ffd703613089bdfbee24dc048bda Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Sep 16 13:39:59 2021 +0200 converters: Add typing
Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/106/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/106/console
Build has FAILED
Patch application report for D6281 (id=22749)
Could not rebase; Attempt merge onto 85eb54015c...
Updating 85eb540..4f17deb Fast-forward requirements.txt | 1 + swh/loader/git/converters.py | 74 ++++++++++++++++---- swh/loader/git/tests/test_converters.py | 115 ++++++++++++++++++++++++++++---- 3 files changed, 164 insertions(+), 26 deletions(-)
Changes applied before test
commit 4f17deb0037c7108c3e56344331014882b60475a Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Sep 16 14:01:20 2021 +0200 converters: Recompute hashes and check they match the originals This makes sure we don't write corrupt objects to the storage, like the examples in T75. commit f413e171b6e0ffd703613089bdfbee24dc048bda Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Sep 16 13:39:59 2021 +0200 converters: Add typing
Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/107/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/107/console
Build has FAILED
Patch application report for D6281 (id=22749)
Could not rebase; Attempt merge onto 85eb54015c...
Updating 85eb540..4f17deb Fast-forward requirements.txt | 1 + swh/loader/git/converters.py | 74 ++++++++++++++++---- swh/loader/git/tests/test_converters.py | 115 ++++++++++++++++++++++++++++---- 3 files changed, 164 insertions(+), 26 deletions(-)
Changes applied before test
commit 4f17deb0037c7108c3e56344331014882b60475a Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Sep 16 14:01:20 2021 +0200 converters: Recompute hashes and check they match the originals This makes sure we don't write corrupt objects to the storage, like the examples in T75. commit f413e171b6e0ffd703613089bdfbee24dc048bda Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Sep 16 13:39:59 2021 +0200 converters: Add typing
Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/108/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/108/console
Build is green
Patch application report for D6281 (id=22752)
Could not rebase; Attempt merge onto 85eb54015c...
Updating 85eb540..a34cbe5 Fast-forward requirements-swh.txt | 2 +- requirements.txt | 1 + swh/loader/git/converters.py | 74 ++++++++++++++++---- swh/loader/git/tests/test_converters.py | 115 ++++++++++++++++++++++++++++---- 4 files changed, 165 insertions(+), 27 deletions(-)
Changes applied before test
commit a34cbe5196eab14b98d311ae4eda3ab590c18675 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Sep 16 14:01:20 2021 +0200 converters: Recompute hashes and check they match the originals This makes sure we don't write corrupt objects to the storage, like the examples in T75. commit f413e171b6e0ffd703613089bdfbee24dc048bda Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Sep 16 13:39:59 2021 +0200 converters: Add typing
See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/109/ for more details.
lgtm, a couple of questions/remarks inline.
swh/loader/git/converters.py | ||
---|---|---|
45 | I have a hard time reading this ^ chunk (line 37 to 45). Does that mean something like, all functions decorated with check_ids must respect the signature (self, obj: ShaFile, log=None) -> _THashable when called upon? (That plus what the decorator actually does, checking for mismatched checksums). Now that i said it, it actually makes sense to me... | |
swh/loader/git/tests/test_converters.py | ||
116–120 | why not ^ ? or even simpler: def sha(self): return lambda: self._sha ? |
swh/loader/git/tests/test_converters.py | ||
---|---|---|
116–120 | because that's not what dulwich expects. |
Build is green
Patch application report for D6281 (id=22763)
Could not rebase; Attempt merge onto 85eb54015c...
Updating 85eb540..61e2d2c Fast-forward requirements-swh.txt | 2 +- requirements.txt | 1 + swh/loader/git/converters.py | 64 +++++++++++++----- swh/loader/git/tests/test_converters.py | 115 ++++++++++++++++++++++++++++---- 4 files changed, 152 insertions(+), 30 deletions(-)
Changes applied before test
commit 61e2d2c1f1ce043e6bed9b9b22a4b065150835b4 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Sep 16 14:01:20 2021 +0200 converters: Recompute hashes and check they match the originals This makes sure we don't write corrupt objects to the storage, like the examples in T75. commit f413e171b6e0ffd703613089bdfbee24dc048bda Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Sep 16 13:39:59 2021 +0200 converters: Add typing
See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/110/ for more details.
Build is green
Patch application report for D6281 (id=22764)
Could not rebase; Attempt merge onto 85eb54015c...
Updating 85eb540..6d7a998 Fast-forward requirements-swh.txt | 2 +- swh/loader/git/converters.py | 64 +++++++++++++----- swh/loader/git/tests/test_converters.py | 115 ++++++++++++++++++++++++++++---- 3 files changed, 151 insertions(+), 30 deletions(-)
Changes applied before test
commit 6d7a998b1093aaba83d0660188fbe48c0ed5b269 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Sep 16 14:01:20 2021 +0200 converters: Recompute hashes and check they match the originals This makes sure we don't write corrupt objects to the storage, like the examples in T75. commit f413e171b6e0ffd703613089bdfbee24dc048bda Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Sep 16 13:39:59 2021 +0200 converters: Add typing
See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/111/ for more details.
swh/loader/git/converters.py | ||
---|---|---|
36 | why not make check_id return the object? return check_id(Directory(id=tree.sha().digest(), entries=tuple(entries),)) |
swh/loader/git/converters.py | ||
---|---|---|
36 | because it's weird and forces line wrapping when constructing Revision |
swh/loader/git/tests/test_converters.py | ||
---|---|---|
116–120 | ah! |