And open swh.model.hashutil.hash_stream endpoint
Related T421
Differential D410
model.hashutil: Open new endpoint to allow to hash stream ardumont on Sep 14 2018, 1:43 AM. Authored by Tags None Subscribers None
Details
And open swh.model.hashutil.hash_stream endpoint Related T421 make test
Diff Detail
Event TimelineComment Actions NACK. This is not generic at all as the iter_content() method is entirely specific to requests. To be most generic, I think providing a replica of the built-in Python hashlib API but supporting multiple hashes would be preferable: class MultiHash: def __init__(self, hash_names, length=None): self.state = {} self.track_length = False for name in hash_names: if name == 'length': self.state['length'] = 0 self.track_length = True else: self.state[name] = _new_hash(name, length) @classmethod def from_state(cls, state, track_length): ret = cls([]) ret.state = state ret.track_length = track_length def update(self, chunk): for name, h in self.state.items(): if name == 'length': continue h.update(chunk) if self.track_length: self.state['length'] += len(chunk) def digest(self): return { name: h.digest() if name != 'length' else h for name, h in self.state.items() } def hexdigest(self): return { name: h.hexdigest() if name != 'length' else h for name, h in self.state.items() } def copy(self): copied_state = { name: h.copy() if name != 'length' else h for name, h in self.state.items() } return self.from_state(copied_state, self.track_length) (untested code, I've not even run it) and going from there. I'm not convinced that the added hexdigest argument is really necessary if we provide this interface. Comment Actions
Ah yeah. I did not check that iter_content was specific.
I did not want to rewrite everything though. I hesitated to transform hash_file into something more generic without changing much of the existing code. Extracting a read as a parameter function (defaulting to read from a file). It's not as beautiful as suggested but it would touch less stuff (even though it's tested :).
If we go that road, indeed. Note: Comment Actions
Well, i don't copy/paste like a furious, don't worry. For the sha1_git computation though, we need the length at first. Comment Actions I understand.
Sure. However, the test coverage of this module is already pretty good, and rewriting hash_file in terms of the class would use most of it (at the exception of copy and of one of the two digest functions).
I'm not too excited by the combinatorial explosion of adding a new argument to all functions in this module (which we should do if we want to keep it consistent). If we do want to add this argument, we should keep the naming consistent and provide the three output formats (hash_to_*) we support:
Thinking forward, I could see a second, later refactoring step that would reimplement the hash_* functions as classmethods of the new MultiHash class, so you'd be able to call MultiHash.from_file(file_object).digest() MultiHash.from_path(b'foo', track_length=True).hexdigest() MultiHash.from_data(b'foo', track_length=True).bytehexdigest() # new method outputting the bytehex format dulwich needs if you want a non-default hash format. To be honest, my secret goal is to work towards replacing the callback pattern as it makes us write code that's a twisty and (IMO) unpythonic, e.g. when we want to collect the chunks to use them later.
Fair enough, sorry for the stronger than needed language. Comment Actions
Comment Actions Amend commit to remove the unnecessary associated docstring as well
Comment Actions 2nd step implementation (without migrating existing modules use)
Comment Actions
It's done. The historic functions hash_{data,file,path} are still present and marked as deprecated. Hopefully, this will discourage their use for the MultiHash one. Remains to migrate our base code to those but that can be delayed and is independent from this diff. Cheers,
Comment Actions Thanks for the new version! You've kept some docstrings for new arguments to the hash_* functions that are not implemented anymore (hash_format), and you've added a track_length argument to hash_path that's not being used. Once that's cleaned up those changes should be ready to land.
Comment Actions
|