Software Heritage

in_memory: Add InMemoryCqlRunner, a class that emulates cassandra.cql.CqlRunner without Cassandra.
Closed, Public

Authored by vlorentz on Aug 12 2020, 6:01 PM.

Details

Summary

For now it is only used for object counters, but future commits will
progressively move the in-memory storage's features to it.
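Below is a minimal sketch of what emulating CqlRunner for object counters can look like. It assumes the runner exposes `increment_counter(object_type, nb)`, the call that appears in the review comment on this revision, and a `stat_counters()` method that returns `(object_type, count)` pairs. The class name `InMemoryCqlRunnerSketch` and its internals are hypothetical and are not the code from D3768.

```python
from collections import Counter
from typing import Iterable, Tuple


class InMemoryCqlRunnerSketch:
    """Hypothetical in-memory stand-in for the counter part of a CqlRunner.

    Counters live in a plain Counter instead of a Cassandra counter table.
    """

    def __init__(self) -> None:
        self._stat_counters: Counter = Counter()

    def increment_counter(self, object_type: str, nb: int) -> None:
        # Counter columns are only ever added to, never overwritten.
        self._stat_counters[object_type] += nb

    def stat_counters(self) -> Iterable[Tuple[str, int]]:
        # (object_type, count) pairs, like rows read from a counter table.
        return self._stat_counters.items()


runner = InMemoryCqlRunnerSketch()
runner.increment_counter("content", 3)
runner.increment_counter("content", 2)
runner.increment_counter("origin", 1)
print(dict(runner.stat_counters()))  # {'content': 5, 'origin': 1}
```

Using `Counter` mirrors the semantics of Cassandra counter columns. Callers can only add to a count; they never set it directly.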

Event Timeline

Build has FAILED

Patch application report for D3768 (id=13253)

Could not rebase; Attempt merge onto 6675286c4f...

Updating 6675286c..385364bd
Fast-forward
 swh/storage/cassandra/cql.py        |  17 ++--
 swh/storage/cassandra/model.py      |  35 +++++++-
 swh/storage/in_memory.py            | 166 +++++++++++++++++++++++++-----------
 swh/storage/tests/test_in_memory.py |  56 +++++++++++-
 swh/storage/writer.py               |   4 +-
 5 files changed, 215 insertions(+), 63 deletions(-)
Changes applied before test
commit 385364bdb4fbc6af87c433e10fd9a68c247fb2ab
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:41:42 2020 +0200

    in_memory: Add InMemoryCqlRunner, a class that emulates cassandra.cql.CqlRunner without Cassandra.
    
    For now it's only used for object counters; but future commits will
    progressively move the in-memory's storage features to it.

commit 6a3ef9a907c76e8a99c06690af3dffb7d93318a8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:34:30 2020 +0200

    Make InMemoryStorage inherit from CassandraStorage.
    
    This has no effect for now, other than deduplicating a method
    and causing a name clash.

commit d9bfed69178e07f882bade35b50fa99a2130211c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:24:33 2020 +0200

    in_memory: Add class Table, which emulates a Cassandra table.
    
    It will be used to implement the in-memory storage as a backend for the
    cassandra storage.

commit fc80a7f06ae8e1fccbe3ce2ee7576d2e9c5ff93a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:16:44 2020 +0200

    cassandra.cql: Fix return type of stat_counters.

commit 267f48024bfd257e91aa95f6e1ae99320587add7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:16:13 2020 +0200

    cassandra.model: Add PARTITION_KEY and CLUSTERING_KEY to the model classes.
    
    They will be used by the in-mem implementation of CqlRunner.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/738/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/738/console
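Two of the commits listed in this report, the `Table` class and the `PARTITION_KEY`/`CLUSTERING_KEY` attributes, describe how Cassandra's data layout is emulated in memory. The sketch below illustrates that idea. It assumes each model class declares both keys as tuples of column names. `OriginVisitRow` and the method names of this `Table` are hypothetical. Only the concept of a table emulating partition and clustering keys comes from the commit messages.

```python
from dataclasses import dataclass
from typing import Any, ClassVar, Dict, Generic, Iterator, Tuple, Type, TypeVar


@dataclass
class OriginVisitRow:
    """Hypothetical model class declaring its primary key as column names."""

    PARTITION_KEY: ClassVar[Tuple[str, ...]] = ("origin",)
    CLUSTERING_KEY: ClassVar[Tuple[str, ...]] = ("visit",)

    origin: str
    visit: int
    visit_type: str


TRow = TypeVar("TRow")


class Table(Generic[TRow]):
    """Emulates a Cassandra table: partitions hold rows keyed by clustering key."""

    def __init__(self, row_class: Type[TRow]) -> None:
        self.row_class = row_class
        # partition key tuple -> {clustering key tuple -> row}
        self.data: Dict[Tuple[Any, ...], Dict[Tuple[Any, ...], TRow]] = {}

    def partition_key(self, row: TRow) -> Tuple[Any, ...]:
        return tuple(getattr(row, col) for col in self.row_class.PARTITION_KEY)

    def clustering_key(self, row: TRow) -> Tuple[Any, ...]:
        return tuple(getattr(row, col) for col in self.row_class.CLUSTERING_KEY)

    def insert(self, row: TRow) -> None:
        # Like a CQL INSERT: writing the same primary key again overwrites (upsert).
        partition = self.data.setdefault(self.partition_key(row), {})
        partition[self.clustering_key(row)] = row

    def get_partition(self, *partition_key: Any) -> Iterator[TRow]:
        # Rows within a partition come back ordered by clustering key.
        partition = self.data.get(tuple(partition_key), {})
        for key in sorted(partition):
            yield partition[key]
```

To keep the sketch short, it sorts clustering keys at read time. A fuller emulation might keep each partition sorted as rows are inserted.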

Don't pass a generator when a list is expected.
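The update note above refers to a general Python pitfall. A generator can be consumed only once and has no `len()`, so code that expects a list may silently see fewer items. The function below is hypothetical and is not the call site that this revision changed.

```python
from typing import List


def count_twice(items: List[int]) -> int:
    """Hypothetical callee that expects a list and iterates over it twice."""
    first_pass = sum(1 for _ in items)
    second_pass = sum(1 for _ in items)  # a generator is already exhausted here
    return first_pass + second_pass


values = [1, 2, 3]
print(count_twice(values))                   # 6: a list can be re-iterated
print(count_twice(v for v in values))        # 3: the first pass consumes the generator
print(count_twice(list(v for v in values)))  # 6: materializing with list() fixes it
```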

Build is green

Patch application report for D3768 (id=13268)

Could not rebase; Attempt merge onto 6675286c4f...

Updating 6675286c..0fd29e0a
Fast-forward
 swh/storage/cassandra/cql.py        |  17 ++--
 swh/storage/cassandra/model.py      |  35 +++++++-
 swh/storage/in_memory.py            | 166 +++++++++++++++++++++++++-----------
 swh/storage/replay.py               |   2 +-
 swh/storage/tests/test_in_memory.py |  56 +++++++++++-
 swh/storage/writer.py               |   4 +-
 6 files changed, 216 insertions(+), 64 deletions(-)
Changes applied before test
commit 0fd29e0a6e1a4c61513a21da067d1866e2f9d0ba
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:41:42 2020 +0200

    in_memory: Add InMemoryCqlRunner, a class that emulates cassandra.cql.CqlRunner without Cassandra.
    
    For now it's only used for object counters; but future commits will
    progressively move the in-memory's storage features to it.

commit 6a3ef9a907c76e8a99c06690af3dffb7d93318a8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:34:30 2020 +0200

    Make InMemoryStorage inherit from CassandraStorage.
    
    This has no effect for now, other than deduplicating a method
    and causing a name clash.

commit d9bfed69178e07f882bade35b50fa99a2130211c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:24:33 2020 +0200

    in_memory: Add class Table, which emulates a Cassandra table.
    
    It will be used to implement the in-memory storage as a backend for the
    cassandra storage.

commit fc80a7f06ae8e1fccbe3ce2ee7576d2e9c5ff93a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:16:44 2020 +0200

    cassandra.cql: Fix return type of stat_counters.

commit 267f48024bfd257e91aa95f6e1ae99320587add7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:16:13 2020 +0200

    cassandra.model: Add PARTITION_KEY and CLUSTERING_KEY to the model classes.
    
    They will be used by the in-mem implementation of CqlRunner.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/753/ for more details.

Build is green

Patch application report for D3768 (id=13276)

Could not rebase; Attempt merge onto 6675286c4f...

Updating 6675286c..892880c4
Fast-forward
 swh/storage/cassandra/cql.py        |  17 ++--
 swh/storage/cassandra/model.py      |  35 ++++++-
 swh/storage/cassandra/storage.py    |  45 ++++-----
 swh/storage/in_memory.py            | 176 ++++++++++++++++++++++++++----------
 swh/storage/replay.py               |   2 +-
 swh/storage/tests/test_in_memory.py |  62 ++++++++++++-
 swh/storage/tests/test_storage.py   |  56 +++++++++++-
 swh/storage/writer.py               |   4 +-
 8 files changed, 310 insertions(+), 87 deletions(-)
Changes applied before test
commit 892880c4d86ca433594d602ab5e5b25eb6b50229
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:41:42 2020 +0200

    in_memory: Add InMemoryCqlRunner, a class that emulates cassandra.cql.CqlRunner without Cassandra.
    
    For now it's only used for object counters; but future commits will
    progressively move the in-memory's storage features to it.

commit 69cb557387a8a5e490501c0f2bd9aec47762ee36
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:34:30 2020 +0200

    Make InMemoryStorage inherit from CassandraStorage.
    
    This has no effect for now, other than deduplicating a method
    and causing a name clash.

commit 30d65e6bd9e290e2eb0b1e029a93be2428fb33fa
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:24:33 2020 +0200

    in_memory: Add class Table, which emulates a Cassandra table.
    
    It will be used to implement the in-memory storage as a backend for the
    cassandra storage.

commit a0037c5d12a5694bfab749896db8546b7176b633
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:16:44 2020 +0200

    cassandra.cql: Fix return type of stat_counters.

commit f53cc6fc8606e9f97af0c3eea0ce0d7ec46d486a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:16:13 2020 +0200

    cassandra.model: Add PARTITION_KEY and CLUSTERING_KEY to the model classes.
    
    They will be used by the in-mem implementation of CqlRunner.

commit dd320a63b32b0e79c6f90a83236d607093219359
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 20:25:23 2020 +0200

    cassandra: Make origin_visit_get_latest filter using any status of a visit, instead of just the last.
    
    This fixes a mismatch in behavior with the pg and the in-mem storages

commit 1d45c5a0debc2e45ceee5effe565aab36905eb1c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 19:08:17 2020 +0200

    cassandra: Fix wrong algo reported in HashCollision, because of variable shadowing.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/761/ for more details.
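The `origin_visit_get_latest` change listed in this report can be sketched with plain Python objects. The sketch assumes each visit carries its statuses ordered from oldest to newest. `Visit` and both functions are hypothetical. They only contrast filtering on any status of a visit with filtering on its last status.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Visit:
    """Hypothetical visit record; statuses are ordered oldest to newest."""

    visit: int
    statuses: List[str]


def latest_visit_any_status(
    visits: Sequence[Visit], allowed: Sequence[str]
) -> Optional[Visit]:
    """Most recent visit having *any* status in `allowed`."""
    for v in sorted(visits, key=lambda v: v.visit, reverse=True):
        if any(status in allowed for status in v.statuses):
            return v
    return None


def latest_visit_last_status(
    visits: Sequence[Visit], allowed: Sequence[str]
) -> Optional[Visit]:
    """Most recent visit whose *last* status is in `allowed`."""
    for v in sorted(visits, key=lambda v: v.visit, reverse=True):
        if v.statuses and v.statuses[-1] in allowed:
            return v
    return None


visits = [Visit(1, ["created", "partial"]), Visit(2, ["created", "partial", "full"])]
print(latest_visit_any_status(visits, ["partial"]).visit)   # 2
print(latest_visit_last_status(visits, ["partial"]).visit)  # 1
```

The two filters disagree whenever a newer visit passed through an allowed status before reaching a different final status.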

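The variable shadowing named in the HashCollision commit is easy to reproduce. If a nested loop reuses the outer loop variable, the exception reports the wrong algorithm. The exception class, `HASH_ALGOS` and the checker functions below are hypothetical, and the Cassandra code in this revision may be structured differently.

```python
from typing import Callable, Dict, List, Optional

HASH_ALGOS = ("sha1", "sha1_git", "sha256")  # hypothetical subset of hash names
Hashes = Dict[str, bytes]


class HashCollision(Exception):
    """Hypothetical exception recording which algorithm collided."""

    def __init__(self, algo: str) -> None:
        super().__init__(algo)
        self.algo = algo


def check_collision_buggy(new: Hashes, existing: List[Hashes]) -> None:
    for algo in HASH_ALGOS:
        for other in existing:
            if other[algo] == new[algo]:
                # BUG: the inner loop rebinds `algo`, shadowing the outer variable.
                for algo in HASH_ALGOS:
                    if other[algo] != new[algo]:
                        raise HashCollision(algo)  # reports the differing algo


def check_collision_fixed(new: Hashes, existing: List[Hashes]) -> None:
    for algo in HASH_ALGOS:
        for other in existing:
            # Same hash for `algo` but a different object: `algo` collided.
            if other[algo] == new[algo] and any(
                other[a] != new[a] for a in HASH_ALGOS
            ):
                raise HashCollision(algo)


def reported_algo(
    check: Callable[[Hashes, List[Hashes]], None], new: Hashes, existing: List[Hashes]
) -> Optional[str]:
    """Run a checker and return the algorithm named in HashCollision, if any."""
    try:
        check(new, existing)
    except HashCollision as exc:
        return exc.algo
    return None
```

If `new` and an existing entry share only their `sha1`, the buggy version reports `sha1_git`, which is the first hash that differs. The fixed version reports `sha1`.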
anlambert added inline comments.
swh/storage/in_memory.py, line 886:

I think it should rather be

self._cql_runner.increment_counter("origin", added)

Looks good, but I think the origin counter is not updated correctly.
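In effect, the reviewer suggests incrementing the origin counter by the number of origins that were actually inserted. Here is a minimal sketch. It assumes `added` counts only URLs that were not already stored. `OriginStoreSketch` and its `origin_add` are hypothetical, and the sketch does not reproduce line 886 of `swh/storage/in_memory.py`. Only the `increment_counter("origin", added)` call comes from the review.

```python
from collections import Counter
from typing import Dict, List


class OriginStoreSketch:
    """Hypothetical in-memory store showing how the origin counter should move."""

    def __init__(self) -> None:
        self._origins: Dict[str, dict] = {}
        self.counters: Counter = Counter()

    def increment_counter(self, object_type: str, nb: int) -> None:
        self.counters[object_type] += nb

    def origin_add(self, origin_urls: List[str]) -> int:
        added = 0
        for url in origin_urls:
            if url not in self._origins:  # skip origins already stored
                self._origins[url] = {"url": url}
                added += 1
        # Count only the origins that were new, not every URL passed in.
        self.increment_counter("origin", added)
        return added


store = OriginStoreSketch()
store.origin_add(["https://a", "https://b"])
store.origin_add(["https://a", "https://c"])
print(store.counters["origin"])  # 3
```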

This revision now requires changes to proceed. (Aug 14 2020, 2:59 PM)
This revision is now accepted and ready to land. (Aug 14 2020, 4:15 PM)

Build is green

Patch application report for D3768 (id=13323)

Could not rebase; Attempt merge onto 6675286c4f...

Updating 6675286c..a96c253a
Fast-forward
 swh/storage/cassandra/cql.py        |  17 ++--
 swh/storage/cassandra/model.py      |  35 ++++++-
 swh/storage/cassandra/storage.py    |  47 +++++-----
 swh/storage/in_memory.py            | 176 ++++++++++++++++++++++++++----------
 swh/storage/replay.py               |   2 +-
 swh/storage/tests/test_in_memory.py |  62 ++++++++++++-
 swh/storage/tests/test_storage.py   |  66 +++++++++++++-
 swh/storage/writer.py               |   4 +-
 8 files changed, 318 insertions(+), 91 deletions(-)
Changes applied before test
commit a96c253ab428e043de3a9c4ccc3ff04c179c3fa2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:41:42 2020 +0200

    in_memory: Add InMemoryCqlRunner, a class that emulates cassandra.cql.CqlRunner without Cassandra.
    
    For now it's only used for object counters; but future commits will
    progressively move the in-memory's storage features to it.

commit bc47283ddc8ca5397d8e25f25c670a9c4f21c435
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:34:30 2020 +0200

    Make InMemoryStorage inherit from CassandraStorage.
    
    This has no effect for now, other than deduplicating a method
    and causing a name clash.

commit 20971864bb1638b22bef00c79c7f0cb1be8afeed
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:24:33 2020 +0200

    in_memory: Add class Table, which emulates a Cassandra table.
    
    It will be used to implement the in-memory storage as a backend for the
    cassandra storage.

commit ef0600539bb7716380791e9c49619387c30407f4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:16:44 2020 +0200

    cassandra.cql: Fix return type of stat_counters.

commit 1266b6a7fe5006746e579e1ec28b4fb600b1188e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:16:13 2020 +0200

    cassandra.model: Add PARTITION_KEY and CLUSTERING_KEY to the model classes.
    
    They will be used by the in-mem implementation of CqlRunner.

commit 3dc69aaa42e76e373c4d23c1e8d98af5a804970f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 20:25:23 2020 +0200

    cassandra: Make origin_visit_get_latest filter using any status of a visit, instead of just the last.
    
    This fixes a mismatch in behavior with the pg and the in-mem storages

commit 006eeecaba7523f407fa0e253d6c832ea246d959
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 19:08:17 2020 +0200

    cassandra: Fix wrong algo reported in HashCollision, because of variable shadowing.

commit da287313765da62d34dc0dca7c22dbd61f40504f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 14 15:15:54 2020 +0200

    cassandra: Fix content_missing_per_sha1 when its parameter has length != 1.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/795/ for more details.
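The `content_missing_per_sha1` commit states only that the function misbehaved when its parameter did not contain exactly one element. One way to avoid length-dependent bugs is to handle each sha1 independently and then test with empty, single-element and multi-element inputs. The signature below, including the `known` set, is hypothetical and is not the storage API.

```python
from typing import Iterable, Iterator, Set


def content_missing_per_sha1(sha1s: Iterable[bytes], known: Set[bytes]) -> Iterator[bytes]:
    """Yield each requested sha1 absent from `known`, for any input length."""
    for sha1 in sha1s:
        if sha1 not in known:
            yield sha1


known = {b"\x01", b"\x02"}
print(list(content_missing_per_sha1([], known)))                        # []
print(list(content_missing_per_sha1([b"\x03"], known)))                 # [b'\x03']
print(list(content_missing_per_sha1([b"\x01", b"\x03", b"\x04"], known)))  # [b'\x03', b'\x04']
```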