Page MenuHomeSoftware Heritage

algos: Add iterators to walk across revisions history
ClosedPublic

Authored by anlambert on Nov 6 2018, 10:39 PM.

Details

Summary

This diff adds iterators to walk across the history of revisions heading
to a given one. The following types of iteration are offered:

  • committer_date: revisions are returned in reverse chronological order of their commit date (same as git log)
  • dfs: revisions are returned in the same order they are visited when performing a depth-first search in pre order on the revisions DAG
  • dfs_post: revisions are returned in the same order they are visited when performing a depth-first search in post order on the revisions DAG
  • bfs: revisions are returned in the same order they are visited when performing a breadth-first search on the revisions DAG

Another iterator of type path, returning only revisions that modify a
specific path in reverse chronological order of their commit date, is also
introduced. Nevertheless, due to client-side implementation, its performance
are far from optimal when walking across a really large history.

Related T1026
Related T1284

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

anlambert created this revision.Nov 6 2018, 10:39 PM

Started to review this.
This seems good so far (not done yet).
Mostly nicpicks on sentence phrasing (up for dicsussion as always ;).

swh/storage/algos/revisions_walker.py
77

according to the desired revisions history pathway (dfs, dfs_post, bfs, etc...)?

(or path instead of pathway)

339

A little docstring here would be nice.

358

is finished. This checks for the specified path's existence in the last returned revision's parents' source trees.

361

considered finished.

364

Whether

374

Is this enough to check only the direct parents' source trees?

Further down the line, in other ancestors (say grand-parents for example), we could have have the same path present, couldn't we?
My understanding is that they won't be detected here.
(But maybe that's voluntary, just checking)

anlambert marked 4 inline comments as done.Nov 8 2018, 1:23 PM

Thanks for that first review. I will update the diff according to comments.

swh/storage/algos/revisions_walker.py
77

I should have said here: "Derived classes must implement it according to the desired method
to walk across the revisions history."

Using the way word is indeed confusing.

339

ack

358

Better indeed

374

The idea here is not to walk the entire history as for really large ones, the iteration process can take a lot of time (as I said, this implementation is not optimal and is mainly here for showcase).

I will update the class docstring in order to precise that the iteration will stop once it founds
a revision where the path was added.

anlambert updated this revision to Diff 1970.Nov 8 2018, 2:06 PM

Update: address @ardumont comments, rename method _get_path_sha1 to _get_path_id and fix some typos

ardumont accepted this revision.Nov 9 2018, 11:17 AM

Awesome!

swh/storage/algos/revisions_walker.py
53
state (Optional[dict]): previous revisions walker state (for pagination purposes).
163

I did not get it at first, had to check D627 to have a better understanding.

Its purpose is to continue the iteration in a pagination context (cf. state parameter in __init__ method).

Not a blocker, a general idea.
I think updating the docstring to demonstrate how to use the revision walker (simple and with pagination) would be a good idea.

We do that sometimes and it's pretty neat [1].

[1] https://forge.softwareheritage.org/source/swh-objstorage/browse/master/swh/objstorage/multiplexer/multiplexer_objstorage.py$114-149

This revision is now accepted and ready to land.Nov 9 2018, 11:17 AM
anlambert marked 2 inline comments as done.Nov 9 2018, 11:23 AM
anlambert added inline comments.
swh/storage/algos/revisions_walker.py
53

good catch

163

Good idea.

I think the best option here is to refer to the factory function get_revisions_walker in the base
class constructor and add code snippets on how to use a walker in the docstring of that function.

anlambert updated this revision to Diff 2000.Nov 9 2018, 2:09 PM

Last update: address latest @ardumont comments and more docstrings polishing

This revision was automatically updated to reflect the committed changes.
ardumont added inline comments.Nov 9 2018, 2:19 PM
swh/storage/algos/revisions_walker.py
502

Awesome! Thanks.