URL filters usually match only a few rows and can use the index, whereas
filtering on visits requires scanning the entire origin table first.
Filtering on URLs first therefore makes the query considerably faster.
Credit for the idea goes to @olasd.
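
To illustrate the idea, here is a minimal sketch of one way to make PostgreSQL apply the URL filter before the visit filter: put the URL filter in a materialized CTE, then apply the visit check to its result. This is not the actual `_origin_query` code. The helper name `build_origin_query`, the column names (`origin.id`, `origin.url`, `origin_visit.origin`) and the `%(pattern)s` placeholder are assumptions made for the example. The `AS MATERIALIZED` keyword assumes PostgreSQL 12 or later; on older versions a plain `WITH` already acts as an optimization fence.

```python
def build_origin_query(with_visit: bool = False, regexp: bool = False) -> str:
    """Build a PostgreSQL query string that filters origins by URL first.

    Hypothetical helper for illustration only; table and column names
    are assumed, not taken from the real schema.
    """
    # Case-insensitive regexp match or case-insensitive LIKE match.
    url_op = "~*" if regexp else "ILIKE"

    # Step 1: the CTE narrows the origin table by URL. MATERIALIZED
    # (PostgreSQL 12+) stops the planner from merging it into the outer query.
    query = (
        "WITH filtered_origins AS MATERIALIZED (\n"
        f"    SELECT id, url FROM origin WHERE url {url_op} %(pattern)s\n"
        ")\n"
        "SELECT o.id, o.url FROM filtered_origins AS o\n"
    )

    if with_visit:
        # Step 2: the visit check only runs against the rows kept by step 1.
        query += (
            "WHERE EXISTS (\n"
            "    SELECT 1 FROM origin_visit AS ov WHERE ov.origin = o.id\n"
            ")\n"
        )

    return query + "ORDER BY o.id"


if __name__ == "__main__":
    print(build_origin_query(with_visit=True))
```

The resulting string would be passed to a database cursor together with a `pattern` parameter. Whether the fence actually helps depends on the data and on the planner's estimates, so comparing `EXPLAIN ANALYZE` output before and after is the way to confirm it.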
Differential D3662
pg: Rewrite _origin_query to force the query planner to filter on URLs before filtering on visits.
Authored by vlorentz on Jul 31 2020, 10:12 AM.
Event Timeline

Comment Actions
Build has FAILED
Patch application report for D3662 (id=12888)
Could not rebase; Attempt merge onto cf9f44e805...

    Updating cf9f44e8..8c8b0b83
    Fast-forward
     swh/storage/backfill.py                | 43 +++++++++++++++++++++++++++++++++-
     swh/storage/converters.py              | 38 ++++++++++++++++++++++++++++++
     swh/storage/db.py                      | 42 +++++++++++++++++++--------------
     swh/storage/replay.py                  |  8 ++++++-
     swh/storage/storage.py                 | 31 ++----------------------
     swh/storage/tests/test_backfill.py     |  5 +++-
     swh/storage/tests/test_kafka_writer.py |  9 +++++++
     swh/storage/tests/test_replay.py       |  9 ++++---
     8 files changed, 131 insertions(+), 54 deletions(-)

Changes applied before test

    commit 8c8b0b83d7831c77143f62dfa83c142c8d112b1f
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Fri Jul 31 10:11:57 2020 +0200

        pg: Rewrite _origin_query to force the query planner to filter on URLs before filtering on visits.

        URL filters usually have a few matches and use the index; whereas filtering
        on visits requires to scan the entire origin table first.
        This makes the query considerably faster.

    commit 0c5a8e274af2aae42cdaedf3b462a9db0fdbf177
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Jul 30 19:39:41 2020 +0200

        Add support for metadata-related object types to the backfiller and replayer.

        Existing tests automatically test them, using data from swh.journal.tests.

    commit 24bc51dfff6c2fd825534a6dac23ff6a7e02faa0
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Jul 30 19:33:14 2020 +0200

        test_replay: update for swh.journal 0.4.1.

        DUPLICATE_CONTENTS now contains BaseModel objects.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/631/

Comment Actions
Build has FAILED
Patch application report for D3662 (id=12889)
Could not rebase; Attempt merge onto cf9f44e805...

    Updating cf9f44e8..df943ec2
    Fast-forward
     8 files changed, 131 insertions(+), 54 deletions(-)

Changes applied before test

    commit df943ec25cf91c0417c23a7376d40414a429db7d
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Fri Jul 31 10:11:57 2020 +0200

        pg: Rewrite _origin_query to force the query planner to filter on URLs before filtering on visits.

        URL filters usually have a few matches and use the index; whereas filtering
        on visits requires to scan the entire origin table first.
        This makes the query considerably faster.

        Credit for the idea goes to @olasd.

    The per-file changes and commits 0c5a8e274af2aae42cdaedf3b462a9db0fdbf177 and
    24bc51dfff6c2fd825534a6dac23ff6a7e02faa0 are the same as in the report for id=12888.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/632/

Comment Actions
Build is green
Patch application report for D3662 (id=12889)
Could not rebase; Attempt merge onto cf9f44e805...

    Updating cf9f44e8..df943ec2
    Fast-forward
     8 files changed, 131 insertions(+), 54 deletions(-)

Changes applied before test

    The per-file changes and the three commits (df943ec25cf91c0417c23a7376d40414a429db7d,
    0c5a8e274af2aae42cdaedf3b462a9db0fdbf177, 24bc51dfff6c2fd825534a6dac23ff6a7e02faa0)
    are the same as in the previous report for id=12889.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/635/ for more details.