Add method MerkleNode.iter_tree, to visit all nodes in the subtree of a node.
Closed · Public

Authored by vlorentz on Feb 24 2020, 4:09 PM.

Details

Reviewers

ardumont
olasd

Group Reviewers

Reviewers

Commits

rDMOD9cf7a04a3e0c: Add method MerkleNode.iter_tree, to visit all nodes in the subtree of a node.

Summary

This will be used by the loaders, instead of collect(), because collect()
returns nested dictionaries, and its internals (deep_update) highly depend
on working with dicts.
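
As a rough illustration of the kind of traversal described here, the following minimal sketch uses a toy class. `ToyNode`, its constructor, and its hashing scheme are assumptions made for this example and are not the swh.model implementation; only the method name `iter_tree` and the deduplication via a `seen` set come from this revision.

```python
import hashlib
from typing import Dict, Iterator, Optional, Set


class ToyNode:
    """Hypothetical stand-in for MerkleNode; not the swh.model class."""

    def __init__(
        self, data: bytes, children: Optional[Dict[bytes, "ToyNode"]] = None
    ) -> None:
        self.data = data
        self.children = children or {}

    @property
    def hash(self) -> bytes:
        # Toy scheme: hash own data plus each child's name and hash,
        # in sorted name order so the result is deterministic.
        h = hashlib.sha1(self.data)
        for name in sorted(self.children):
            h.update(name + self.children[name].hash)
        return h.digest()

    def iter_tree(self, seen: Optional[Set[bytes]] = None) -> Iterator["ToyNode"]:
        """Yield this node, then the nodes below it, one per distinct hash."""
        if seen is None:
            seen = set()
        if self.hash in seen:
            return
        seen.add(self.hash)
        yield self
        for name in sorted(self.children):
            yield from self.children[name].iter_tree(seen)


leaf = ToyNode(b"hello")
root = ToyNode(
    b"",
    {b"a": leaf, b"b": ToyNode(b"hello"), b"c": ToyNode(b"world")},
)
# "b" has the same hash as "a", so it is skipped as a duplicate.
nodes = list(root.iter_tree())
```

A caller gets a flat stream of node objects and can filter or batch them directly, without unpacking nested dictionaries.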

Diff Detail

Repository

rDMOD Data model

Branch

iter-tree

Lint

No Linters Available

Unit

No Unit Test Coverage

Build Status

Buildable 10769
Build 16167: tox-on-jenkins	Jenkins
Build 16166: arc lint + arc unit

Event Timeline

vlorentz created this revision. Feb 24 2020, 4:09 PM

Herald added a reviewer: Reviewers. Feb 24 2020, 4:09 PM

vlorentz added a parent revision: D2712: Take the value of MerkleNode.data into account to compute equality. Feb 24 2020, 4:10 PM

Build is green
See https://jenkins.softwareheritage.org/job/DMOD/job/tox/190/ for more details.

Harbormaster completed remote builds in B10743: Diff 9676. Feb 24 2020, 4:10 PM

vlorentz mentioned this in D2714: Use swh-model objects in package loader. Feb 24 2020, 4:36 PM

Sounds about right.

Got a couple of remarks/questions.

swh/model/merkle.py
275	As olasd mentioned on IRC, should this be Iterator['MerkleNode']? Also, can't we import MerkleNode and use it plainly here?

This revision is now accepted and ready to land. Feb 24 2020, 4:59 PM

I've just noticed something: the top level _iter_tree will walk the tree twice: once when computing self.hash, and again in recursing over the children. Of course the recursed calls will have a cached self.hash and will only need to walk the tree once.

There must be a way to avoid that, maybe by yielding the children before yielding oneself, but I couldn't really work it out with the deduplication.

I don't think we should really care (collect() has had that issue since it's existed as well), but I thought I'd mention it if you can find a way to avoid that.
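
The double walk described above can be made visible with a small instrumented toy. `CountingNode`, `compute_hash`, and the `log` parameter are hypothetical names invented for this sketch; the real class caches its hash differently. The point is only to show that, on a fresh tree, the hashing pass touches every node before the first node is yielded.

```python
import hashlib
from typing import Dict, Iterator, List, Optional, Set


class CountingNode:
    """Hypothetical toy node that caches its hash and logs each pass."""

    def __init__(
        self, data: bytes, children: Optional[Dict[bytes, "CountingNode"]] = None
    ) -> None:
        self.data = data
        self.children = children or {}
        self._hash: Optional[bytes] = None

    def compute_hash(self, log: List[str]) -> bytes:
        # Only an uncached hash triggers a recursive walk of the subtree.
        if self._hash is None:
            log.append("hash")
            h = hashlib.sha1(self.data)
            for name in sorted(self.children):
                h.update(name + self.children[name].compute_hash(log))
            self._hash = h.digest()
        return self._hash

    def iter_tree(
        self, log: List[str], seen: Optional[Set[bytes]] = None
    ) -> Iterator["CountingNode"]:
        if seen is None:
            seen = set()
        key = self.compute_hash(log)  # first walk, at the top level only
        if key in seen:
            return
        seen.add(key)
        log.append("yield")
        yield self
        for name in sorted(self.children):
            # second walk: children already have a cached hash
            yield from self.children[name].iter_tree(log, seen)


root = CountingNode(b"root", {b"a": CountingNode(b"x"), b"b": CountingNode(b"y")})

first_log: List[str] = []
list(root.iter_tree(first_log))

second_log: List[str] = []
list(root.iter_tree(second_log))
```

On the first call, `first_log` holds three "hash" entries followed by three "yield" entries; on the second call every hash is cached, so `second_log` holds only the three "yield" entries.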

swh/model/merkle.py
275	This is inside the `MerkleNode` class itself. You can't reference by name the class you're currently defining.
282	If we want to be pedantic about it, the key stored in `seen` should probably be a tuple `(self.type, self.hash)`, even though for all intents and purposes the probability of a collision across types is very low.
283	I guess that line is not really needed.
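
To illustrate the two points raised on lines 275 and 282, here is a minimal sketch with a toy class. `TypedNode` and its constructor are hypothetical and not the swh.model code; only the attribute names `type` and `hash` and the `seen` set come from the comments above. The toy hash deliberately ignores the node type so that a cross-type collision, which would be very unlikely in practice, can be produced on demand.

```python
import hashlib
from typing import Dict, Iterator, Optional, Set, Tuple


class TypedNode:
    """Hypothetical toy node carrying a `type` and a `hash`."""

    def __init__(
        self,
        node_type: str,
        data: bytes,
        children: Optional[Dict[bytes, "TypedNode"]] = None,
    ) -> None:
        self.type = node_type
        self.data = data
        self.children = children or {}

    @property
    def hash(self) -> bytes:
        # Assumption for the demo: the type is NOT part of the hash,
        # which makes collisions across types easy to construct.
        h = hashlib.sha1(self.data)
        for name in sorted(self.children):
            h.update(name + self.children[name].hash)
        return h.digest()

    # The return annotation is quoted because the name `TypedNode` is not
    # bound yet while the class body is still being executed.
    def iter_tree(
        self, seen: Optional[Set[Tuple[str, bytes]]] = None
    ) -> Iterator["TypedNode"]:
        if seen is None:
            seen = set()
        key = (self.type, self.hash)  # deduplicate per (type, hash) pair
        if key in seen:
            return
        seen.add(key)
        yield self
        for name in sorted(self.children):
            yield from self.children[name].iter_tree(seen)


root = TypedNode(
    "directory",
    b"root",
    {
        b"a": TypedNode("content", b"x"),
        b"b": TypedNode("directory", b"x"),  # same hash as "a", other type
        b"c": TypedNode("content", b"x"),    # true duplicate of "a"
    },
)
visited_types = [node.type for node in root.iter_tree()]
```

With a key of `self.hash` alone, node "b" would be dropped as a duplicate of "a"; with the tuple key only "c" is skipped.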