
Use a btree of (task_type, md5(arguments)) to match task arguments
Closed, Public

Authored by olasd on Dec 13 2019, 11:33 AM.

Details

Summary

The former index on hash(arguments->'args') has lost most of its usefulness:
about half the tasks (the ones for the loader) have the same value (an empty
list) for this field, so the index can no longer tell them apart.

The new btree index on (task_type, md5(arguments)) is more universal, faster,
and also easier to convince the planner to use.
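
A minimal sketch of why the composite key discriminates better, written in Python rather than SQL. Everything in it is an assumption made for illustration: the task type names, the URLs, and the canonical JSON serialization fed to MD5 (PostgreSQL would hash its own text rendering of the column, so the digests would not match these). It only compares how many tasks collide under a key built from the positional args alone versus a key built from the task type plus a digest of the full arguments.

```python
import hashlib
import json
from collections import Counter


def arguments_digest(arguments: dict) -> str:
    """MD5 hex digest of a canonical JSON form of the full arguments.

    Assumption: sorted keys and compact separators stand in for whatever
    text form the database actually hashes.
    """
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def old_key(task: dict) -> str:
    # Mimics the former index: only the positional args are considered.
    return json.dumps(task["arguments"]["args"])


def new_key(task: dict) -> tuple:
    # Mimics the new index: task type plus a digest of all the arguments.
    return (task["type"], arguments_digest(task["arguments"]))


# Hypothetical tasks: the loader-style ones pass everything as keyword
# arguments, so their positional args are all the same empty list.
tasks = [
    {"type": "load-git",
     "arguments": {"args": [], "kwargs": {"url": "https://example.org/a.git"}}},
    {"type": "load-git",
     "arguments": {"args": [], "kwargs": {"url": "https://example.org/b.git"}}},
    {"type": "load-hg",
     "arguments": {"args": [], "kwargs": {"url": "https://example.org/a.git"}}},
    {"type": "list-github",
     "arguments": {"args": [1, 100], "kwargs": {}}},
]

old_buckets = Counter(old_key(t) for t in tasks)
new_buckets = Counter(new_key(t) for t in tasks)

print("tasks sharing the empty-args key:", old_buckets["[]"])
print("distinct composite keys:", len(new_buckets))
```

With these sample tasks, three of the four share the old key, while each one gets its own composite key. A fixed-length digest also keeps the indexed value small regardless of how large the arguments are.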

If we want more specific indexes (e.g. on specific keyword arguments) we'll be
able to add them separately.

Test Plan

This is deployed in production, which is the best way to have a relevant,
large dataset to check these kinds of queries.

Diff Detail

Repository
rDSCH Scheduling utilities
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.