
Add method MerkleNode.iter_tree, to visit all nodes in the subtree of a node.
ClosedPublic

Authored by vlorentz on Feb 24 2020, 4:09 PM.

Details

Summary

This will be used by the loaders, instead of collect(), because collect()
returns nested dictionaries, and its internals (deep_update) highly depend
on working with dicts.
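
To make the contrast with collect() concrete, here is a minimal sketch of a flat, deduplicating traversal. ToyNode, its fields, and the hashing scheme are hypothetical stand-ins rather than the actual swh.model code, and the parent-before-children order is an assumption.

```python
import hashlib
from typing import Dict, Iterator, Optional, Set


class ToyNode:
    """Hypothetical stand-in for a Merkle node (not the swh.model class)."""

    def __init__(self, type_: str, data: bytes = b"") -> None:
        self.type = type_
        self.data = data
        self.children: Dict[bytes, "ToyNode"] = {}
        self._hash: Optional[bytes] = None

    @property
    def hash(self) -> bytes:
        # Derived from the node's own data and its children's hashes,
        # computed on first access and cached afterwards.
        if self._hash is None:
            h = hashlib.sha1(self.type.encode() + b"\0" + self.data)
            for name in sorted(self.children):
                h.update(name + b"\0" + self.children[name].hash)
            self._hash = h.digest()
        return self._hash

    def iter_tree(self) -> Iterator["ToyNode"]:
        """Yield each distinct node of the subtree once, as a flat stream."""
        yield from self._iter_tree(set())

    def _iter_tree(self, seen: Set[bytes]) -> Iterator["ToyNode"]:
        if self.hash in seen:
            return  # an identical node was already yielded
        seen.add(self.hash)
        yield self
        for child in self.children.values():
            yield from child._iter_tree(seen)


root = ToyNode("directory")
sub = ToyNode("directory")
root.children[b"sub"] = sub
root.children[b"a.txt"] = ToyNode("content", b"hello")
sub.children[b"b.txt"] = ToyNode("content", b"hello")  # same content as a.txt

nodes = list(root.iter_tree())
```

A caller gets plain node objects to loop over, with the duplicated content node yielded only once, instead of nested dictionaries that have to be merged.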

Diff Detail

Repository
rDMOD Data model
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont added a subscriber: ardumont.

Sounds about right.

Got a couple of remarks/questions.

swh/model/merkle.py
275

So, as olasd mentioned on IRC, Iterator['MerkleNode']?

Also, can't we import the MerkleNode and use it plainly here?

This revision is now accepted and ready to land. Feb 24 2020, 4:59 PM
olasd added a subscriber: olasd.

I've just noticed something: the top level _iter_tree will walk the tree twice: once when computing self.hash, and again in recursing over the children. Of course the recursed calls will have a cached self.hash and will only need to walk the tree once.

There must be a way to avoid that, maybe by yielding the children before yielding oneself, but I couldn't really work it out with the deduplication.

I don't think we should really care (collect() has had that issue since it's existed as well), but I thought I'd mention it if you can find a way to avoid that.

swh/model/merkle.py
275

This is the MerkleNode. You can't reference the class you're currently defining.
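
For context, a minimal sketch of why the quoted form is needed. While a class body is executing, the class name is not bound yet, so on interpreters that evaluate annotations eagerly an unquoted Iterator[Node] in a method signature raises NameError. A string annotation is stored as-is and only resolved on request. Node is a hypothetical class, not the one from swh.model.

```python
from typing import Iterator, get_type_hints


class Node:
    """Hypothetical class used only to illustrate the forward reference."""

    # The string 'Node' is not evaluated when the method is defined,
    # so it can name the class that is still being created.
    def iter_tree(self) -> Iterator["Node"]:
        yield self


# Once the class exists, the string can be resolved to the real class.
hints = get_type_hints(Node.iter_tree)
```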

282

If we want to be pedantic about it, seen should probably hold (self.type, self.hash) tuples, even though for all intents and purposes the probability of a collision across types is very low.
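
A small sketch of the difference, using a contrived collision. Item and both helper functions are hypothetical and only illustrate the choice of deduplication key.

```python
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class Item(NamedTuple):
    """Hypothetical flat record standing in for a Merkle node."""

    type: str
    hash: bytes


def dedup_by_hash(items: Iterable[Item]) -> Iterator[Item]:
    # Key on the hash alone: objects of different types that share a hash
    # are treated as the same node.
    seen: Set[bytes] = set()
    for item in items:
        if item.hash not in seen:
            seen.add(item.hash)
            yield item


def dedup_by_type_and_hash(items: Iterable[Item]) -> Iterator[Item]:
    # Key on (type, hash): a cross-type collision no longer hides a node.
    seen: Set[Tuple[str, bytes]] = set()
    for item in items:
        key = (item.type, item.hash)
        if key not in seen:
            seen.add(key)
            yield item


# Contrived input: two object types sharing one hash value, plus a true duplicate.
items = [
    Item("content", b"\x01"),
    Item("directory", b"\x01"),
    Item("content", b"\x01"),
]
by_hash = list(dedup_by_hash(items))
by_key = list(dedup_by_type_and_hash(items))
```

With the hash-only key the directory is dropped, while the tuple key keeps it and still removes the true duplicate.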

283

I guess that line is not really needed.

swh/model/merkle.py
275

Thanks, I did not notice (d'oh)!