Page MenuHomeSoftware Heritage

algos: Add iterators to walk across revisions history
ClosedPublic

Authored by anlambert on Nov 6 2018, 10:39 PM.

Details

Summary

This diff adds iterators to walk across the history of revisions heading
to a given one. The following types of iteration are offered:

  • committer_date: revisions are returned in reverse chronological order of their commit date (same as git log)
  • dfs: revisions are returned in the same order they are visited when performing a depth-first search in pre order on the revisions DAG
  • dfs_post: revisions are returned in the same order they are visited when performing a depth-first search in post order on the revisions DAG
  • bfs: revisions are returned in the same order they are visited when performing a breadth-first search on the revisions DAG

Another iterator of type path, returning only revisions that modify a
specific path in reverse chronological order of their commit date, is also
introduced. Nevertheless, due to client-side implementation, its performance
are far from optimal when walking across a really large history.

Related T1026
Related T1284

Diff Detail

Repository
rDSTO Storage manager
Branch
revisions-walker
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 2264
Build 2727: tox-on-jenkinsJenkins
Build 2726: arc lint + arc unit

Event Timeline

Started to review this.
This seems good so far (not done yet).
Mostly nicpicks on sentence phrasing (up for dicsussion as always ;).

swh/storage/algos/revisions_walker.py
76

according to the desired revisions history pathway (dfs, dfs_post, bfs, etc...)?

(or path instead of pathway)

338

A little docstring here would be nice.

357

is finished. This checks for the specified path's existence in the last returned revision's parents' source trees.

360

considered finished.

363

Whether

373

Is this enough to check only the direct parents' source trees?

Further down the line, in other ancestors (say grand-parents for example), we could have have the same path present, couldn't we?
My understanding is that they won't be detected here.
(But maybe that's voluntary, just checking)

Thanks for that first review. I will update the diff according to comments.

swh/storage/algos/revisions_walker.py
76

I should have said here: "Derived classes must implement it according to the desired method
to walk across the revisions history."

Using the way word is indeed confusing.

338

ack

357

Better indeed

373

The idea here is not to walk the entire history as for really large ones, the iteration process can take a lot of time (as I said, this implementation is not optimal and is mainly here for showcase).

I will update the class docstring in order to precise that the iteration will stop once it founds
a revision where the path was added.

Update: address @ardumont comments, rename method _get_path_sha1 to _get_path_id and fix some typos

Awesome!

swh/storage/algos/revisions_walker.py
52
state (Optional[dict]): previous revisions walker state (for pagination purposes).
162

I did not get it at first, had to check D627 to have a better understanding.

Its purpose is to continue the iteration in a pagination context (cf. state parameter in __init__ method).

Not a blocker, a general idea.
I think updating the docstring to demonstrate how to use the revision walker (simple and with pagination) would be a good idea.

We do that sometimes and it's pretty neat [1].

[1] https://forge.softwareheritage.org/source/swh-objstorage/browse/master/swh/objstorage/multiplexer/multiplexer_objstorage.py$114-149

This revision is now accepted and ready to land.Nov 9 2018, 11:17 AM
anlambert added inline comments.
swh/storage/algos/revisions_walker.py
52

good catch

162

Good idea.

I think the best option here is to refer to the factory function get_revisions_walker in the base
class constructor and add code snippets on how to use a walker in the docstring of that function.

Last update: address latest @ardumont comments and more docstrings polishing

This revision was automatically updated to reflect the committed changes.
swh/storage/algos/revisions_walker.py
502

Awesome! Thanks.