Use a btree of (task_type, md5(arguments)) to match task arguments
Closed, Public

Authored by olasd on Dec 13 2019, 11:33 AM.

Details

Summary

The former index on hash(arguments->'args') is no longer selective: about half
the tasks (the ones for the loader) have the same value (an empty list) for this
field.

The new index on (task_type, md5(arguments)) is more general, faster, and easier
to get the planner to use.
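
To illustrate the selectivity argument, here is a minimal Python sketch. It is not the code from this diff: the task type names, URLs, and both helper functions are hypothetical, and the JSON serialization only approximates what PostgreSQL would hash. It compares a key built from the positional args alone with a key built from the task type plus an MD5 of the whole arguments document.

```python
import hashlib
import json
from collections import Counter


def args_only_key(arguments):
    """Hypothetical stand-in for the former index: hashes positional args only."""
    serialized = json.dumps(arguments["args"], sort_keys=True)
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


def type_and_arguments_key(task_type, arguments):
    """Hypothetical stand-in for the new index: (task type, md5 of all arguments)."""
    # Assumption: sorted keys give a stable serialization; PostgreSQL's own
    # text form of the column would differ byte for byte.
    serialized = json.dumps(arguments, sort_keys=True)
    return (task_type, hashlib.md5(serialized.encode("utf-8")).hexdigest())


# Made-up tasks: the loader-style ones all carry an empty positional args list.
tasks = [
    ("load-git", {"args": [], "kwargs": {"url": "https://example.org/a.git"}}),
    ("load-git", {"args": [], "kwargs": {"url": "https://example.org/b.git"}}),
    ("load-hg", {"args": [], "kwargs": {"url": "https://example.org/c"}}),
    ("list-github", {"args": [1, 100], "kwargs": {}}),
]

old_keys = Counter(args_only_key(arguments) for _, arguments in tasks)
new_keys = Counter(type_and_arguments_key(t, arguments) for t, arguments in tasks)

# Largest number of tasks sharing one key under each scheme.
print(max(old_keys.values()))  # 3
print(max(new_keys.values()))  # 1
```

With the args-only key, every task with an empty args list collides on a single value, so a lookup on that value still has to sift through all of them. The combined key separates them.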

If we want more specific indexes (e.g. on specific keyword arguments), we'll be
able to add them separately.

Test Plan

This is deployed in production (which is the best way to have a
relevant, large dataset to check these kinds of queries...)

Diff Detail

Repository
rDSCH Scheduling utilities
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 9743
Build 14368: tox-on-jenkins (Jenkins)
Build 14367: arc lint + arc unit

Event Timeline

olasd created this revision. Dec 13 2019, 11:33 AM
ardumont accepted this revision. Dec 13 2019, 11:36 AM

Awesome.

Thanks.

This revision is now accepted and ready to land. Dec 13 2019, 11:36 AM
douardda accepted this revision. Dec 13 2019, 11:37 AM
douardda added a subscriber: douardda.

LGTM