
in_memory: make object_find_by_sha1_git merge results from the CassandraStorage.
Closed, Public

Authored by vlorentz on Aug 12 2020, 6:01 PM.

Details

Summary

For now this has no effect. However, the CassandraStorage will soon be in charge
of some object types, so object_find_by_sha1_git needs to merge the objects found
through the CassandraStorage with those found directly in the InMemoryStorage.
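A minimal sketch of the merge, assuming object_find_by_sha1_git takes a list of ids and returns a dict that maps each id to a list of {"sha1_git", "type"} dicts. FakeCassandraStorage and FakeInMemoryStorage are hypothetical stand-ins, not the swh.storage classes: the parent plays the role of CassandraStorage, and the child plays InMemoryStorage, which still stores some object types itself.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List

# id -> list of {"sha1_git": ..., "type": ...} dicts (assumed result shape)
ObjectsById = Dict[bytes, List[dict]]


class FakeCassandraStorage:
    """Hypothetical stand-in for CassandraStorage; owns some object types."""

    def __init__(self) -> None:
        self._objects: DefaultDict[bytes, List[dict]] = defaultdict(list)

    def object_find_by_sha1_git(self, ids: List[bytes]) -> ObjectsById:
        # Copy each list so callers can extend it without mutating our storage.
        return {id_: list(self._objects.get(id_, [])) for id_ in ids}


class FakeInMemoryStorage(FakeCassandraStorage):
    """Hypothetical stand-in for InMemoryStorage; still holds other types itself."""

    def __init__(self) -> None:
        super().__init__()
        self._local_objects: DefaultDict[bytes, List[dict]] = defaultdict(list)

    def object_find_by_sha1_git(self, ids: List[bytes]) -> ObjectsById:
        # Objects found by the parent (Cassandra-style) implementation...
        results = super().object_find_by_sha1_git(ids)
        # ...merged with the objects this class still stores directly.
        for id_ in ids:
            results.setdefault(id_, []).extend(self._local_objects.get(id_, []))
        return results


rev_id, rel_id, unknown_id = b"\x01" * 20, b"\x02" * 20, b"\x03" * 20
storage = FakeInMemoryStorage()
storage._objects[rev_id].append({"sha1_git": rev_id, "type": "revision"})
storage._local_objects[rel_id].append({"sha1_git": rel_id, "type": "release"})
found = storage.object_find_by_sha1_git([rev_id, rel_id, unknown_id])
```

Calling super() first means object types that have already moved to the Cassandra-style code path need no special handling in the subclass; while nothing has moved yet, the parent simply contributes empty lists, which matches "for now this has no effect".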

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D3769 (id=13254)

Could not rebase; Attempt merge onto 6675286c4f...

Updating 6675286c..5f37ca48
Fast-forward
 swh/storage/cassandra/cql.py        |  17 ++--
 swh/storage/cassandra/model.py      |  35 ++++++-
 swh/storage/in_memory.py            | 198 ++++++++++++++++++++++++++----------
 swh/storage/tests/test_in_memory.py |  56 +++++++++-
 swh/storage/writer.py               |   4 +-
 5 files changed, 245 insertions(+), 65 deletions(-)
Changes applied before test
commit 5f37ca48fe645bcb5512a374baf2fb170621cd0e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:44:50 2020 +0200

    in_memory: make object_find_by_sha1_git merge results from the CassandraStorage.
    
    For now this has no effect. However, in the near future, the CassandraStorage
    will be in charge of some object types, so we need to merge objects
    found in the CassandraStorage and those found directly in the InMemoryStorage.

commit 385364bdb4fbc6af87c433e10fd9a68c247fb2ab
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:41:42 2020 +0200

    in_memory: Add InMemoryCqlRunner, a class that emulates cassandra.cql.CqlRunner without Cassandra.
    
    For now it's only used for object counters; but future commits will
    progressively move the in-memory's storage features to it.

commit 6a3ef9a907c76e8a99c06690af3dffb7d93318a8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:34:30 2020 +0200

    Make InMemoryStorage inherit from CassandraStorage.
    
    This has no effect for now, other than deduplicating a method
    and causing a name clash.

commit d9bfed69178e07f882bade35b50fa99a2130211c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:24:33 2020 +0200

    in_memory: Add class Table, which emulates a Cassandra table.
    
    It will be used to implement the in-memory storage as a backend for the
    cassandra storage.

commit fc80a7f06ae8e1fccbe3ce2ee7576d2e9c5ff93a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:16:44 2020 +0200

    cassandra.cql: Fix return type of stat_counters.

commit 267f48024bfd257e91aa95f6e1ae99320587add7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:16:13 2020 +0200

    cassandra.model: Add PARTITION_KEY and CLUSTERING_KEY to the model classes.
    
    They will be used by the in-mem implementation of CqlRunner.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/739/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/739/console
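The Table commit and the PARTITION_KEY/CLUSTERING_KEY commit above suggest the general shape of an in-memory emulation of a Cassandra table: rows grouped by partition key, then ordered by clustering key within each partition. The sketch below only illustrates that idea; DirectoryEntryRow, its columns, and this Table's methods are hypothetical and are not the code in swh/storage/in_memory.py or swh/storage/cassandra/model.py.

```python
import dataclasses
from typing import Any, ClassVar, Dict, Iterator, Tuple


@dataclasses.dataclass
class DirectoryEntryRow:
    """Hypothetical model class declaring its primary key like a CQL table."""

    PARTITION_KEY: ClassVar[Tuple[str, ...]] = ("directory_id",)
    CLUSTERING_KEY: ClassVar[Tuple[str, ...]] = ("name",)

    directory_id: bytes
    name: bytes
    target: bytes


class Table:
    """Hypothetical in-memory emulation of a single Cassandra table."""

    def __init__(self, row_class: type) -> None:
        self.row_class = row_class
        # partition key tuple -> {clustering key tuple -> row}
        self.data: Dict[tuple, Dict[tuple, Any]] = {}

    @staticmethod
    def _key(row: Any, columns: Tuple[str, ...]) -> tuple:
        return tuple(getattr(row, column) for column in columns)

    def insert(self, row: Any) -> None:
        partition_key = self._key(row, self.row_class.PARTITION_KEY)
        clustering_key = self._key(row, self.row_class.CLUSTERING_KEY)
        # Like a CQL INSERT, writing the same primary key again overwrites the row.
        self.data.setdefault(partition_key, {})[clustering_key] = row

    def get_partition(self, *partition_key: Any) -> Iterator[Any]:
        partition = self.data.get(tuple(partition_key), {})
        # Rows come back sorted by clustering key (default ascending order).
        for clustering_key in sorted(partition):
            yield partition[clustering_key]


table = Table(DirectoryEntryRow)
table.insert(DirectoryEntryRow(b"dir1", b"b.txt", b"target2"))
table.insert(DirectoryEntryRow(b"dir1", b"a.txt", b"target1"))
table.insert(DirectoryEntryRow(b"dir1", b"a.txt", b"target3"))  # same key: overwrite
table.insert(DirectoryEntryRow(b"dir2", b"c.txt", b"target4"))
```

Declaring the keys on the model class lets one generic Table serve every row type, which is presumably why the keys were added to the model classes rather than hard-coded in the runner.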

Build is green

Patch application report for D3769 (id=13269)

Could not rebase; Attempt merge onto 6675286c4f...

Updating 6675286c..daff5995
Fast-forward
 swh/storage/cassandra/cql.py        |  17 ++--
 swh/storage/cassandra/model.py      |  35 ++++++-
 swh/storage/in_memory.py            | 198 ++++++++++++++++++++++++++----------
 swh/storage/replay.py               |   2 +-
 swh/storage/tests/test_in_memory.py |  56 +++++++++-
 swh/storage/writer.py               |   4 +-
 6 files changed, 246 insertions(+), 66 deletions(-)
Changes applied before test
commit daff5995b1ad39ad4fc2842aa732f81543d57e10
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:44:50 2020 +0200

    in_memory: make object_find_by_sha1_git merge results from the CassandraStorage.
    
    For now this has no effect. However, in the near future, the CassandraStorage
    will be in charge of some object types, so we need to merge objects
    found in the CassandraStorage and those found directly in the InMemoryStorage.

commit 0fd29e0a6e1a4c61513a21da067d1866e2f9d0ba
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:41:42 2020 +0200

    in_memory: Add InMemoryCqlRunner, a class that emulates cassandra.cql.CqlRunner without Cassandra.
    
    For now it's only used for object counters; but future commits will
    progressively move the in-memory's storage features to it.

commit 6a3ef9a907c76e8a99c06690af3dffb7d93318a8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:34:30 2020 +0200

    Make InMemoryStorage inherit from CassandraStorage.
    
    This has no effect for now, other than deduplicating a method
    and causing a name clash.

commit d9bfed69178e07f882bade35b50fa99a2130211c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:24:33 2020 +0200

    in_memory: Add class Table, which emulates a Cassandra table.
    
    It will be used to implement the in-memory storage as a backend for the
    cassandra storage.

commit fc80a7f06ae8e1fccbe3ce2ee7576d2e9c5ff93a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:16:44 2020 +0200

    cassandra.cql: Fix return type of stat_counters.

commit 267f48024bfd257e91aa95f6e1ae99320587add7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:16:13 2020 +0200

    cassandra.model: Add PARTITION_KEY and CLUSTERING_KEY to the model classes.
    
    They will be used by the in-mem implementation of CqlRunner.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/754/ for more details.
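InMemoryCqlRunner is described as emulating cassandra.cql.CqlRunner without Cassandra, and for now it only serves object counters. As a rough, counters-only illustration, assuming a runner that exposes an increment method and a stat_counters method yielding (object_type, count) rows: InMemoryCounterRunner, increment_counter, and ObjectCountRow are hypothetical names, and the method shapes are assumptions rather than the real CqlRunner signatures.

```python
from collections import Counter
from typing import Iterator, NamedTuple


class ObjectCountRow(NamedTuple):
    """Hypothetical row shape returned by a counters query."""

    object_type: str
    count: int


class InMemoryCounterRunner:
    """Hypothetical counters-only emulation of a CQL runner, without Cassandra."""

    def __init__(self) -> None:
        self._counters: Counter = Counter()

    def increment_counter(self, object_type: str, nb: int) -> None:
        # A Cassandra counter column would be updated with "count = count + nb".
        self._counters[object_type] += nb

    def stat_counters(self) -> Iterator[ObjectCountRow]:
        # Sorted for deterministic output; a real table scan gives no such order.
        for object_type, count in sorted(self._counters.items()):
            yield ObjectCountRow(object_type, count)


runner = InMemoryCounterRunner()
runner.increment_counter("revision", 2)
runner.increment_counter("content", 5)
runner.increment_counter("revision", 1)
stats = {row.object_type: row.count for row in runner.stat_counters()}
```

Because the storage only talks to the runner, swapping a real CqlRunner for an in-memory one lets the same storage logic run without a Cassandra cluster, which is the direction the commit message describes.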

Build is green

Patch application report for D3769 (id=13277)

Could not rebase; Attempt merge onto 6675286c4f...

Updating 6675286c..e6957ca4
Fast-forward
 swh/storage/cassandra/cql.py        |  17 ++-
 swh/storage/cassandra/model.py      |  35 +++++-
 swh/storage/cassandra/storage.py    |  45 ++++----
 swh/storage/in_memory.py            | 208 +++++++++++++++++++++++++++---------
 swh/storage/replay.py               |   2 +-
 swh/storage/tests/test_in_memory.py |  62 ++++++++++-
 swh/storage/tests/test_storage.py   |  56 +++++++++-
 swh/storage/writer.py               |   4 +-
 8 files changed, 340 insertions(+), 89 deletions(-)
Changes applied before test
commit e6957ca4a912f8689fd8bfd919efeb955f21589e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:44:50 2020 +0200

    in_memory: make object_find_by_sha1_git merge results from the CassandraStorage.
    
    For now this has no effect. However, in the near future, the CassandraStorage
    will be in charge of some object types, so we need to merge objects
    found in the CassandraStorage and those found directly in the InMemoryStorage.

commit 892880c4d86ca433594d602ab5e5b25eb6b50229
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:41:42 2020 +0200

    in_memory: Add InMemoryCqlRunner, a class that emulates cassandra.cql.CqlRunner without Cassandra.
    
    For now it's only used for object counters; but future commits will
    progressively move the in-memory's storage features to it.

commit 69cb557387a8a5e490501c0f2bd9aec47762ee36
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:34:30 2020 +0200

    Make InMemoryStorage inherit from CassandraStorage.
    
    This has no effect for now, other than deduplicating a method
    and causing a name clash.

commit 30d65e6bd9e290e2eb0b1e029a93be2428fb33fa
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:24:33 2020 +0200

    in_memory: Add class Table, which emulates a Cassandra table.
    
    It will be used to implement the in-memory storage as a backend for the
    cassandra storage.

commit a0037c5d12a5694bfab749896db8546b7176b633
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:16:44 2020 +0200

    cassandra.cql: Fix return type of stat_counters.

commit f53cc6fc8606e9f97af0c3eea0ce0d7ec46d486a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:16:13 2020 +0200

    cassandra.model: Add PARTITION_KEY and CLUSTERING_KEY to the model classes.
    
    They will be used by the in-mem implementation of CqlRunner.

commit dd320a63b32b0e79c6f90a83236d607093219359
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 20:25:23 2020 +0200

    cassandra: Make origin_visit_get_latest filter using any status of a visit, instead of just the last.
    
    This fixes a mismatch in behavior with the pg and the in-mem storages

commit 1d45c5a0debc2e45ceee5effe565aab36905eb1c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 19:08:17 2020 +0200

    cassandra: Fix wrong algo reported in HashCollision, because of variable shadowing.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/762/ for more details.
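The two commits added in this iteration fix subtle behaviors. The sketch below is illustrative only; the data shapes and every function name are hypothetical, not taken from swh.storage. The first pair of functions shows why filtering visits on any of their statuses differs from filtering on only the last status. The second pair shows the general class of bug where a loop variable shadows an outer variable, so an error ends up naming the wrong hash algorithm.

```python
from typing import Dict, List, Optional

# Hypothetical data: visit id -> chronological list of statuses.
VisitStatuses = Dict[int, List[str]]


def latest_visit_last_status_only(visits: VisitStatuses, allowed: List[str]) -> Optional[int]:
    """Only the most recent status of each visit is checked."""
    matching = [v for v, st in visits.items() if st and st[-1] in allowed]
    return max(matching, default=None)


def latest_visit_any_status(visits: VisitStatuses, allowed: List[str]) -> Optional[int]:
    """A visit matches if any of its statuses is allowed."""
    matching = [v for v, st in visits.items() if any(s in allowed for s in st)]
    return max(matching, default=None)


def reported_algo_shadowed(algo: str, new: Dict[str, bytes], old: Dict[str, bytes]) -> Optional[str]:
    """Algorithm reported when `new` and `old` collide on `algo` (buggy)."""
    if new[algo] != old[algo]:
        return None  # no collision on this algorithm
    differing = False
    for algo in new:  # BUG: the loop rebinds the `algo` parameter
        if new[algo] != old[algo]:
            differing = True
    return algo if differing else None  # `algo` is now the last key of `new`


def reported_algo_fixed(algo: str, new: Dict[str, bytes], old: Dict[str, bytes]) -> Optional[str]:
    """Same check, using a distinct loop variable."""
    if new[algo] != old[algo]:
        return None
    differing = any(new[other] != old[other] for other in new)
    return algo if differing else None


visits = {1: ["created", "partial"], 2: ["created", "full"]}
new_hashes = {"sha1": b"same", "sha256": b"aaa"}
old_hashes = {"sha1": b"same", "sha256": b"bbb"}
```

With these inputs, asking for visits that were ever "created" finds visit 2 only under any-status filtering, and the shadowed version reports "sha256" instead of the colliding "sha1".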

anlambert added inline comments.
swh/storage/in_memory.py
233–260

Is it really part of that diff?

swh/storage/in_memory.py
233–260

Yes, because super().object_find_by_sha1_git(ids) calls CassandraStorage's object_find_by_sha1_git, which calls these methods.

anlambert added inline comments.
swh/storage/in_memory.py
233–260

ack

This revision is now accepted and ready to land.
Aug 14 2020, 12:26 PM

Build is green

Patch application report for D3769 (id=13341)

Could not rebase; Attempt merge onto 6675286c4f...

Updating 6675286c..397a645e
Fast-forward
 swh/storage/cassandra/cql.py        |  17 ++-
 swh/storage/cassandra/model.py      |  35 +++++-
 swh/storage/cassandra/storage.py    |  47 ++++----
 swh/storage/in_memory.py            | 208 +++++++++++++++++++++++++++---------
 swh/storage/replay.py               |   2 +-
 swh/storage/tests/test_in_memory.py |  62 ++++++++++-
 swh/storage/tests/test_storage.py   |  66 +++++++++++-
 swh/storage/writer.py               |   4 +-
 8 files changed, 348 insertions(+), 93 deletions(-)
Changes applied before test
commit 397a645ebf1ed8d7537dcde61ef7f01ffc4127ac
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:44:50 2020 +0200

    in_memory: make object_find_by_sha1_git merge results from the CassandraStorage.
    
    For now this has no effect. However, in the near future, the CassandraStorage
    will be in charge of some object types, so we need to merge objects
    found in the CassandraStorage and those found directly in the InMemoryStorage.

commit a96c253ab428e043de3a9c4ccc3ff04c179c3fa2
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:41:42 2020 +0200

    in_memory: Add InMemoryCqlRunner, a class that emulates cassandra.cql.CqlRunner without Cassandra.
    
    For now it's only used for object counters; but future commits will
    progressively move the in-memory's storage features to it.

commit bc47283ddc8ca5397d8e25f25c670a9c4f21c435
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:34:30 2020 +0200

    Make InMemoryStorage inherit from CassandraStorage.
    
    This has no effect for now, other than deduplicating a method
    and causing a name clash.

commit 20971864bb1638b22bef00c79c7f0cb1be8afeed
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:24:33 2020 +0200

    in_memory: Add class Table, which emulates a Cassandra table.
    
    It will be used to implement the in-memory storage as a backend for the
    cassandra storage.

commit ef0600539bb7716380791e9c49619387c30407f4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:16:44 2020 +0200

    cassandra.cql: Fix return type of stat_counters.

commit 1266b6a7fe5006746e579e1ec28b4fb600b1188e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 13:16:13 2020 +0200

    cassandra.model: Add PARTITION_KEY and CLUSTERING_KEY to the model classes.
    
    They will be used by the in-mem implementation of CqlRunner.

commit 3dc69aaa42e76e373c4d23c1e8d98af5a804970f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 20:25:23 2020 +0200

    cassandra: Make origin_visit_get_latest filter using any status of a visit, instead of just the last.
    
    This fixes a mismatch in behavior with the pg and the in-mem storages

commit 006eeecaba7523f407fa0e253d6c832ea246d959
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Aug 12 19:08:17 2020 +0200

    cassandra: Fix wrong algo reported in HashCollision, because of variable shadowing.

commit da287313765da62d34dc0dca7c22dbd61f40504f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Aug 14 15:15:54 2020 +0200

    cassandra: Fix content_missing_per_sha1 when its parameter has length != 1.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/802/ for more details.
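The last commit fixes content_missing_per_sha1 when its parameter does not have exactly one element. The actual diff is not shown here, so the following is only a sketch of the intended behavior, assuming the function takes a list of sha1s and yields those that are not stored. content_missing_per_sha1_sketch and its known_sha1s argument are hypothetical; the set stands in for the real storage lookup.

```python
from typing import Iterator, List, Set


def content_missing_per_sha1_sketch(known_sha1s: Set[bytes], contents: List[bytes]) -> Iterator[bytes]:
    """Yield every sha1 in `contents` that is not stored, in input order.

    Hypothetical stand-in: `known_sha1s` replaces the real storage lookup.
    """
    for sha1 in contents:
        if sha1 not in known_sha1s:
            yield sha1


known = {b"a" * 20, b"c" * 20}
```

Tests of this kind should cover the empty list and lists with several elements, not only the single-element case, since that is exactly the class of input the commit message says was broken.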