Make raw_extrinsic_metadata_get return PagedResult instead of Dict.
Closed · Public

Authored by vlorentz on Jul 31 2020, 12:41 PM.
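
For readers unfamiliar with the pattern named in the title, here is a minimal sketch of what returning a typed page object instead of a plain dict can look like. It is not the code from this diff: the `PagedResult` class below is a hypothetical stand-in defined locally, and `get_page`, `iter_all`, and the integer-offset page token are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Generic, Iterator, List, Optional, TypeVar

T = TypeVar("T")


@dataclass
class PagedResult(Generic[T]):
    """Hypothetical stand-in for a typed page of results (not the swh class)."""

    results: List[T] = field(default_factory=list)
    next_page_token: Optional[str] = None


def get_page(
    items: List[str], page_token: Optional[str] = None, limit: int = 2
) -> PagedResult[str]:
    """Return one page of `items`; the token is an offset encoded as a string."""
    start = int(page_token) if page_token is not None else 0
    end = start + limit
    # A token is only returned when there is at least one more item to fetch.
    token = str(end) if end < len(items) else None
    return PagedResult(results=items[start:end], next_page_token=token)


def iter_all(items: List[str], limit: int = 2) -> Iterator[str]:
    """Follow next_page_token until the last page has been consumed."""
    token: Optional[str] = None
    while True:
        page = get_page(items, token, limit)
        yield from page.results
        token = page.next_page_token
        if token is None:
            return
```

Compared with a dict, callers read `page.results` and `page.next_page_token` as attributes, so a type checker can catch a misspelled field name that a string key would hide.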

Diff Detail

Repository
rDSTO Storage manager
Branch
raw_extrinsic_metadata_get-PagedResult
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 14135
Build 21715: Phabricator diff pipeline on Jenkins
Build 21714: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D3663 (id=12895)

Could not rebase; Attempt merge onto cf9f44e805...

Updating cf9f44e8..b73e024d
Fast-forward
 swh/storage/backfill.py                |  43 +++++++++++++-
 swh/storage/cassandra/storage.py       |  24 ++++----
 swh/storage/converters.py              |  38 ++++++++++++
 swh/storage/db.py                      |  42 +++++++------
 swh/storage/in_memory.py               |  24 ++++----
 swh/storage/interface.py               |   2 +-
 swh/storage/replay.py                  |   8 ++-
 swh/storage/storage.py                 |  58 +++++-------------
 swh/storage/tests/test_backfill.py     |   5 +-
 swh/storage/tests/test_kafka_writer.py |   9 +++
 swh/storage/tests/test_replay.py       |   9 ++-
 swh/storage/tests/test_retry.py        |   4 +-
 swh/storage/tests/test_storage.py      | 104 ++++++++++++++++-----------------
 13 files changed, 222 insertions(+), 148 deletions(-)
Changes applied before test
commit b73e024d5d500a676ddcba63f17e251c427ba6db
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jul 31 12:30:31 2020 +0200

    Make raw_extrinsic_metadata_get return PagedResult instead of Dict.

commit df943ec25cf91c0417c23a7376d40414a429db7d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jul 31 10:11:57 2020 +0200

    pg: Rewrite _origin_query to force the query planner to filter on URLs before filtering on visits.
    
    URL filters usually have few matches and use the index, whereas filtering
    on visits requires scanning the entire origin table first.
    
    This makes the query considerably faster.
    
    Credit for the idea goes to @olasd.

commit 0c5a8e274af2aae42cdaedf3b462a9db0fdbf177
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jul 30 19:39:41 2020 +0200

    Add support for metadata-related object types to the backfiller and replayer.
    
    Existing tests automatically test them, using data from swh.journal.tests.

commit 24bc51dfff6c2fd825534a6dac23ff6a7e02faa0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jul 30 19:33:14 2020 +0200

    test_replay: update for swh.journal 0.4.1.
    
    DUPLICATE_CONTENTS now contains BaseModel objects.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/633/ for more details.

Build is green

Patch application report for D3663 (id=12897)

Could not rebase; Attempt merge onto cf9f44e805...

Updating cf9f44e8..b385c799
Fast-forward
 swh/storage/backfill.py                |  43 +++++++++++++-
 swh/storage/cassandra/storage.py       |  24 ++++----
 swh/storage/converters.py              |  38 ++++++++++++
 swh/storage/db.py                      |  42 +++++++------
 swh/storage/in_memory.py               |  24 ++++----
 swh/storage/interface.py               |   7 +--
 swh/storage/replay.py                  |   8 ++-
 swh/storage/storage.py                 |  58 +++++-------------
 swh/storage/tests/test_backfill.py     |   5 +-
 swh/storage/tests/test_kafka_writer.py |   9 +++
 swh/storage/tests/test_replay.py       |   9 ++-
 swh/storage/tests/test_retry.py        |   4 +-
 swh/storage/tests/test_storage.py      | 104 ++++++++++++++++-----------------
 13 files changed, 223 insertions(+), 152 deletions(-)
Changes applied before test
commit b385c7994ece2cdb52b813e93b79dd385c2bee2a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jul 31 12:30:31 2020 +0200

    Make raw_extrinsic_metadata_get return PagedResult instead of Dict.

commit df943ec25cf91c0417c23a7376d40414a429db7d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jul 31 10:11:57 2020 +0200

    pg: Rewrite _origin_query to force the query planner to filter on URLs before filtering on visits.
    
    URL filters usually have few matches and use the index, whereas filtering
    on visits requires scanning the entire origin table first.
    
    This makes the query considerably faster.
    
    Credit for the idea goes to @olasd.

commit 0c5a8e274af2aae42cdaedf3b462a9db0fdbf177
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jul 30 19:39:41 2020 +0200

    Add support for metadata-related object types to the backfiller and replayer.
    
    Existing tests automatically test them, using data from swh.journal.tests.

commit 24bc51dfff6c2fd825534a6dac23ff6a7e02faa0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Jul 30 19:33:14 2020 +0200

    test_replay: update for swh.journal 0.4.1.
    
    DUPLICATE_CONTENTS now contains BaseModel objects.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/637/ for more details.

This revision is now accepted and ready to land. (Jul 31 2020, 1:38 PM)