Guesstimate partition boundaries from extrema rather than using expensive offsets
Closed, Public

Authored by olasd on Mar 18 2019, 6:42 PM.

Details

Reviewers

Group Reviewers

Commits

rDLSdd148fc64d7a: Guesstimate partition boundaries from extrema rather than using expensive…
rDLSc574897e2afb: Guesstimate partition boundaries from extrema rather than using expensive…

Summary

Using ORDER BY and OFFSET makes the partitioning an O(n^2) operation in the
number of entries in the table, whereas deriving the boundaries from min/max
is a near-instant operation.

This assumes the values of the indexable column are more or less uniformly
distributed, which is not exactly true but not the worst approximation either.
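
To illustrate the idea (this is not the code from the commit), here is a minimal sketch of guessing partition boundaries from the extrema alone. The function name `guess_partition_bounds` and its parameters are hypothetical; it assumes the minimum and maximum of the indexable column have already been fetched, and that the column holds either integers or datetimes.

```python
from datetime import datetime


def guess_partition_bounds(min_index, max_index, n_partitions):
    """Split [min_index, max_index] into n_partitions equal-width ranges.

    Hypothetical helper: works for ints and datetimes, since both support
    subtraction and scaling of the resulting span by an integer.
    Returns (start, end) pairs; the outermost bounds are None so that the
    first and last partitions stay open-ended.
    """
    if n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")
    if min_index is None or max_index is None or n_partitions == 1:
        # Empty table or a single partition: one unbounded range.
        return [(None, None)]

    span = max_index - min_index
    # Interior cut points, evenly spaced between the extrema.
    cuts = [min_index + span * i // n_partitions for i in range(1, n_partitions)]
    bounds = [None] + cuts + [None]
    return list(zip(bounds[:-1], bounds[1:]))


int_parts = guess_partition_bounds(0, 100, 4)
date_parts = guess_partition_bounds(datetime(2019, 1, 1), datetime(2019, 1, 5), 2)
```

If the column is skewed, the resulting partitions will hold uneven numbers of entries, and a span narrower than the partition count produces some empty ranges; that is the trade-off against the exact but expensive OFFSET-based boundaries.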

Test Plan

tox

Diff Detail

Repository

rDLS Listers

Lint

Automatic diff as part of commit; lint not applicable.

Unit

Automatic diff as part of commit; unit tests not applicable.

Event Timeline

olasd created this revision. Mar 18 2019, 6:42 PM

Herald added a reviewer: Reviewers. Mar 18 2019, 6:42 PM

Build is green
See https://jenkins.softwareheritage.org/job/DLS/job/tox/64/ for more details.

Harbormaster completed remote builds in B4696: Diff 3980. Mar 18 2019, 6:44 PM

Should we be worried that tests are green after this diff?

This revision is now accepted and ready to land. Mar 19 2019, 9:59 AM

In D1267#26864, @douardda wrote:

Should we be worried that tests are green after this diff?

We only test that the partitioning works (i.e. that we can generate partitions) and, _I think_, that all partitions yield all the entries.

In any case, one of the Bitbucket lister tests was red before I did the ugly ugly string -> datetime bodge.

Closed by commit rDLSc574897e2afb: Guesstimate partition boundaries from extrema rather than using expensive… (authored by olasd). Mar 19 2019, 1:51 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents
Changeset List

Path

swh/lister/core/indexing_lister.py

Size

60 lines

Diff 4013


swh/lister/core/indexing_lister.py
