Page MenuHomeSoftware Heritage

Fix revision extra header value type
ClosedPublic

Authored by anlambert on Sep 10 2019, 4:31 PM.

Details

Summary

Fix issue introduced in D1507, extra header values must be of bytes type and not str one.

Related D1976

Diff Detail

Repository
rDLDHG Mercurial loader
Branch
fix-extra-header-value-type
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 7751
Build 11141: tox-on-jenkinsJenkins
Build 11140: arc lint + arc unit

Event Timeline

swh/loader/mercurial/loader.py
386

Why not hash_to_bytes?

swh/loader/mercurial/tests/test_loader.py
308

if you want to go the bytehex way, you only need .decode('ascii')

swh/loader/mercurial/loader.py
386

@olasd suggested to use this function on #swh-devel

swh/loader/mercurial/tests/test_loader.py
308

Ack

Decode from ascii in updated test

I'm still not quite sure why we're translating this data specifically (and nothing else), but not to the point that I want to block this.

My feeling is that this is working around a presentation issue, and that if anything, we should be teaching the frontend how to interpret and present the raw value instead.

The whole logic of splitting the extra data on nul bytes will also yield surprising results if the transplant references a nodeid with a nul byte in it; but that's another issue altogether.

This revision is now accepted and ready to land.Sep 10 2019, 5:45 PM
This revision was landed with ongoing or failed builds.Sep 10 2019, 6:12 PM
This revision was automatically updated to reflect the committed changes.

I'm still not quite sure why we're translating this data specifically (and nothing else), but not to the point that I want to block this.

I think I was misled by the webapp conversion code for revision metadata in a sense that metadata should be JSON convertible.

My feeling is that this is working around a presentation issue, and that if anything, we should be teaching the frontend how to interpret and present the raw value instead.

As revision metadata already contain Mercurial node ids in hexadecimal format, I thought it will be better to get the same representation for the transplant_source extra
(which is the only ChangesetExtra storing a nodeid in binary format) instead of patching the conversion code of the webapp.

The whole logic of splitting the extra data on nul bytes will also yield surprising results if the transplant references a nodeid with a nul byte in it; but that's another issue altogether.

Hopefully, this mercurial feature does not seem widely used. Nevertheless, further investigations are needed to check if we can encounter that corner case.