Page MenuHomeSoftware Heritage

revision_log: add toposort order
AbandonedPublic

Authored by seirl on Jan 18 2018, 5:02 PM.

Details

Reviewers
olasd
Group Reviewers
Reviewers

Diff Detail

Repository
rDSTO Storage manager
Branch
toposort
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 1163
Build 1507: arc lint + arc unit

Event Timeline

olasd requested changes to this revision.Jan 18 2018, 6:36 PM
olasd added a subscriber: olasd.

I think the interplay between the sort option and the limiting is currently wrong: as the limit is done on the database side, limiting to a number of results will return an arbitrary subset of the history, and try to topologically sort that, which doesn't look too good.

I'm not entirely sure how to fix that, but one way would be to generate the full, topologically sorted history using revision_shortlog, then doing the limiting, and finally pulling the full revision information using revision_get.

Also, you should add some tests. I suggest generating a history of the following shape :

A root commit
|\
B |
| |
C |
| |
D |
|/
E (merge D, A)
|\
F |
| |
G |
|/
H (merge G, E)

The default log will show something along the lines of : H, G, E, F, D, A, C, B, while a topologically sorted log would show H, G, F, E, D, C, B, A. This should be a good case to test the interplay between limits and sort order.

This revision now requires changes to proceed.Jan 18 2018, 6:36 PM
19:17:55 <+seirl> olasd: oh, good catch. at this point I wonder if we shouldn't just let the user calling toposort() themself
19:18:24 <+seirl> and do the limiting manually afterwards
19:18:34 <+olasd> I think there's value in limiting the data you pull from the database
19:19:24 <+olasd> I honestly don't know where to draw the line
[...]
19:21:48 <+seirl> okay actually limiting the amount of things you want from a topological order doesn't seem to make sense to me, as the order isn't total
19:22:37 <+seirl> i don't see any use case where you might want a specific amount of data from the graph sorted in any topological order
19:22:52 <+olasd> huh
19:29:45 <+seirl> okay, it's like saying "i want a random sample of n revisions with the additional constraint that they come in order of dependency", which can maybe be useful, but I don't know if it justifies being added to revision_log
19:43:19 <+olasd> I think it's sensible to want a topologically sorted revision shortlog
19:44:45 <+olasd> I think it's sensible to want to paginate this somehow
19:45:09 <+olasd> but I agree that maybe we don't want that to be built into swh.storage