Page MenuHomeSoftware Heritage

identifiers: Expose git_object instead of manifest
Needs ReviewPublic

Authored by vlorentz on Fri, Apr 30, 12:25 PM.

Details

Reviewers
None
Group Reviewers
Reviewers
Summary

The git_object is what will be actually useful to the vault.

It's also easier to test, because test_identifier.py has the
entire git_object in its test data.

However, it makes the code messier, and it's quite a big change,
so I'm not sure it's worth it.
Instead, I could slightly rewrite the tests, and make the vault
re-add the header itself (but it would duplicate this code...)

Depends on D5650

Test Plan

Tests fail because I removed hash_git_data. I'll adapt them if you think we should merge this diff.

Diff Detail

Unit TestsFailed

TimeTest
1 msJenkins > .tox.py3-full.lib.python3.7.site-packages.swh.model.tests.test_hashutil.HashlibGit::test_hashdata_content
self = <swh.model.tests.test_hashutil.HashlibGit testMethod=test_hashdata_content> def test_hashdata_content(self):
1 msJenkins > .tox.py3-full.lib.python3.7.site-packages.swh.model.tests.test_hashutil.HashlibGit::test_hashdata_revision
self = <swh.model.tests.test_hashutil.HashlibGit testMethod=test_hashdata_revision> def test_hashdata_revision(self):
1 msJenkins > .tox.py3-full.lib.python3.7.site-packages.swh.model.tests.test_hashutil.HashlibGit::test_hashdata_tag
self = <swh.model.tests.test_hashutil.HashlibGit testMethod=test_hashdata_tag> def test_hashdata_tag(self):
1 msJenkins > .tox.py3-full.lib.python3.7.site-packages.swh.model.tests.test_hashutil.HashlibGit::test_hashdata_tree
self = <swh.model.tests.test_hashutil.HashlibGit testMethod=test_hashdata_tree> def test_hashdata_tree(self):
1 msJenkins > .tox.py3-full.lib.python3.7.site-packages.swh.model.tests.test_hashutil.HashlibGit::test_unknown_header_type
self = <swh.model.tests.test_hashutil.HashlibGit testMethod=test_unknown_header_type> def test_unknown_header_type(self):
View Full Test Results (7 Failed · 463 Passed · 2 Skipped)

Event Timeline

Build has FAILED

Patch application report for D5652 (id=20186)

Could not rebase; Attempt merge onto f7e9d5c290...

Updating f7e9d5c..eaa5daa
Fast-forward
 swh/model/hashutil.py               |  59 +++----------------
 swh/model/identifiers.py            | 113 +++++++++++++++++++++++-------------
 swh/model/tests/test_identifiers.py |  56 +++++++++++++-----
 3 files changed, 124 insertions(+), 104 deletions(-)
Changes applied before test
commit eaa5daa3d5f225f6a4e9894f8b73798ea3e75a48
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Apr 30 12:24:00 2021 +0200

    identifiers: Expose git_object instead of manifest
    
    The git_object is what will be actually useful to the vault.
    
    It's also easier to test, because test_identifier.py has the
    entire git_object in its test data.

commit 3a4f484feaccd569a0b9dc37531d7af50bc3871e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Apr 30 11:26:32 2021 +0200

    identifiers: Expose manifest computation
    
    Before this commit, manifests were only computed internally before
    hashing, so they were not available to outside modules.
    
    This makes testing the module very painful, because identifier
    functions can only be tested by checking the hash; so test failures
    did not show mismatches between the computed manifest and
    the expected one.
    
    Additionally, the 'git bare cooker' of the vault is likely to use
    these as well, as it needs to format git objects in the same format.

Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/326/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/326/console

Harbormaster returned this revision to the author for changes because remote builds failed.Fri, Apr 30, 12:26 PM
Harbormaster failed remote builds in B21176: Diff 20186!

Build has FAILED

Patch application report for D5652 (id=20191)

Could not rebase; Attempt merge onto f7e9d5c290...

Updating f7e9d5c..8062ed9
Fast-forward
 swh/model/hashutil.py               |  59 +++----------------
 swh/model/identifiers.py            | 113 +++++++++++++++++++++++-------------
 swh/model/tests/test_identifiers.py |  56 +++++++++++++-----
 3 files changed, 124 insertions(+), 104 deletions(-)
Changes applied before test
commit 8062ed9a4bc314ee019765033bf985e30be579c2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Apr 30 12:24:00 2021 +0200

    identifiers: Expose git_object instead of manifest
    
    The git_object is what will be actually useful to the vault.
    
    It's also easier to test, because test_identifier.py has the
    entire git_object in its test data.

commit 3a4f484feaccd569a0b9dc37531d7af50bc3871e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Apr 30 11:26:32 2021 +0200

    identifiers: Expose manifest computation
    
    Before this commit, manifests were only computed internally before
    hashing, so they were not available to outside modules.
    
    This makes testing the module very painful, because identifier
    functions can only be tested by checking the hash; so test failures
    did not show mismatches between the computed manifest and
    the expected one.
    
    Additionally, the 'git bare cooker' of the vault is likely to use
    these as well, as it needs to format git objects in the same format.

Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/329/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/329/console

Harbormaster returned this revision to the author for changes because remote builds failed.Fri, Apr 30, 1:27 PM
Harbormaster failed remote builds in B21180: Diff 20191!

Build has FAILED

Patch application report for D5652 (id=20192)

Could not rebase; Attempt merge onto f7e9d5c290...

Updating f7e9d5c..64829e0
Fast-forward
 swh/model/hashutil.py               |  71 ++++++++--------------
 swh/model/identifiers.py            | 113 +++++++++++++++++++++++-------------
 swh/model/tests/test_identifiers.py |  56 +++++++++++++-----
 3 files changed, 138 insertions(+), 102 deletions(-)
Changes applied before test
commit 64829e0d9ea398bd2bf289bf1c57ea1fddd7a6af
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Apr 30 12:24:00 2021 +0200

    identifiers: Expose git_object instead of manifest
    
    The git_object is what will be actually useful to the vault.
    
    It's also easier to test, because test_identifier.py has the
    entire git_object in its test data.

commit 3a4f484feaccd569a0b9dc37531d7af50bc3871e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Apr 30 11:26:32 2021 +0200

    identifiers: Expose manifest computation
    
    Before this commit, manifests were only computed internally before
    hashing, so they were not available to outside modules.
    
    This makes testing the module very painful, because identifier
    functions can only be tested by checking the hash; so test failures
    did not show mismatches between the computed manifest and
    the expected one.
    
    Additionally, the 'git bare cooker' of the vault is likely to use
    these as well, as it needs to format git objects in the same format.

Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/330/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/330/console

Harbormaster returned this revision to the author for changes because remote builds failed.Fri, Apr 30, 1:29 PM
Harbormaster failed remote builds in B21181: Diff 20192!
vlorentz edited the test plan for this revision. (Show Details)

Build has FAILED

Patch application report for D5652 (id=20204)

Could not rebase; Attempt merge onto df036ef1c3...

Updating df036ef..60b138a
Fast-forward
 swh/model/hashutil.py               |  71 ++++++++--------------
 swh/model/identifiers.py            | 113 +++++++++++++++++++++++-------------
 swh/model/tests/test_identifiers.py |  56 +++++++++++++-----
 3 files changed, 138 insertions(+), 102 deletions(-)
Changes applied before test
commit 60b138a0e843ce22a2efa7ee6ddef0ec12ea4a4f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Apr 30 12:24:00 2021 +0200

    identifiers: Expose git_object instead of manifest
    
    The git_object is what will be actually useful to the vault.
    
    It's also easier to test, because test_identifier.py has the
    entire git_object in its test data.

commit 8221147ce6e74b3f685baead1906675b3454065c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Apr 30 11:26:32 2021 +0200

    identifiers: Expose manifest computation
    
    Before this commit, manifests were only computed internally before
    hashing, so they were not available to outside modules.
    
    This makes testing the module very painful, because identifier
    functions can only be tested by checking the hash; so test failures
    did not show mismatches between the computed manifest and
    the expected one.
    
    Additionally, the 'git bare cooker' of the vault is likely to use
    these as well, as it needs to format git objects in the same format.

Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/332/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/332/console