
Guesstimate partition boundaries from extrema rather than using expensive offsets
Closed, Public

Authored by olasd on Mar 18 2019, 6:42 PM.

Details

Summary

Using ORDER BY and OFFSET to find partition boundaries makes the partitioning
an O(n^2) operation in the number of entries in the table, since each OFFSET
query has to scan all the rows it skips. Deriving the boundaries from min/max
is near-instant.

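This assumes the values of the indexable column are more or less uniformly
distributed, which is not exactly true but not the worst approximation either.
Where the distribution is skewed, partitions will hold uneven numbers of
entries.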
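
As an illustration of the approach described in this summary, here is a minimal sketch of deriving equal-width partition boundaries from the extrema of the indexed column. It is not the code from this diff: the function name `estimate_partition_bounds` and its parameters are hypothetical, and the open-ended first and last bounds are an assumption made so that entries sitting at the extrema are not dropped.

```python
from typing import List, Optional, Tuple

# A bound of None means "unbounded on that side".
Bound = Optional[float]


def estimate_partition_bounds(
    min_index: float, max_index: float, n_entries: int, partition_size: int
) -> List[Tuple[Bound, Bound]]:
    """Split [min_index, max_index] into equal-width (lower, upper) ranges.

    Hypothetical helper: assumes the indexed values are roughly uniformly
    distributed, so equal-width ranges hold roughly partition_size entries.
    """
    if n_entries <= 0 or partition_size <= 0:
        raise ValueError("n_entries and partition_size must be positive")

    # Number of partitions needed to hold about partition_size entries each.
    n_partitions = max(1, n_entries // partition_size)
    width = (max_index - min_index) / n_partitions

    # Interior cut points only; the outermost bounds are left open (None).
    cuts: List[Bound] = [min_index + i * width for i in range(1, n_partitions)]
    bounds: List[Bound] = [None] + cuts + [None]
    return list(zip(bounds[:-1], bounds[1:]))
```

For example, `estimate_partition_bounds(0, 100, 40, 10)` returns `[(None, 25.0), (25.0, 50.0), (50.0, 75.0), (75.0, None)]`. Only the min, max and count of the column are needed, which is what avoids the repeated OFFSET scans.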

Test Plan

tox

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

douardda added a subscriber: douardda.

Should we be worried that tests are green after this diff?

This revision is now accepted and ready to land. Mar 19 2019, 9:59 AM

> Should we be worried that tests are green after this diff?

We only test that the partitioning works (i.e. that we can generate partitions) and, _I think_, that the partitions taken together yield all the entries.

In any case, one of the bitbucket lister tests was red before I did the ugly ugly string -> datetime bodge.
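
The coverage property mentioned in this reply (every entry is yielded by the partitions) could be checked along these lines. This is a sketch on in-memory values, not the project's actual test: `in_partition` and `covers_exactly_once` are hypothetical names, and half-open ranges with `None` as an unbounded side are an assumption.

```python
from typing import Iterable, List, Optional, Tuple

Partition = Tuple[Optional[float], Optional[float]]


def in_partition(value: float, partition: Partition) -> bool:
    """Half-open membership test: lower <= value < upper; None is unbounded."""
    lower, upper = partition
    return (lower is None or value >= lower) and (upper is None or value < upper)


def covers_exactly_once(values: Iterable[float], partitions: List[Partition]) -> bool:
    """True if every value falls into exactly one of the partitions."""
    return all(
        sum(in_partition(value, partition) for partition in partitions) == 1
        for value in values
    )


# Skewed data: the partitions are uneven in size, but nothing is lost.
partitions = [(None, 25.0), (25.0, 50.0), (50.0, None)]
values = [0, 1, 2, 3, 24.9, 25, 99, 100]
assert covers_exactly_once(values, partitions)
```

A check like this stays green even when the uniformity assumption is badly off, since it only verifies coverage and not how evenly entries are spread across partitions.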

This revision was automatically updated to reflect the committed changes.