Add Cassandra backend.
ClosedPublic

Authored by vlorentz on Jan 21 2020, 2:59 PM.

Details

Summary

Related to T2186

Diff Detail

Repository
rDSTO Storage manager
Branch
cassandra-backend3
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 10317
Build 15331: tox-on-jenkins (Jenkins)
Build 15330: arc lint + arc unit

Event Timeline

vlorentz edited the summary of this revision.

timeout in wait_for_peer.

implement endpoints missing since rebase

"The more it fails, the more likely it is to work!"

  • disable debug logs
  • allocate more RAM to make tests less flaky
douardda added a subscriber: douardda.

Looks good to me, but it would really be nice to have a bit more documentation/explanation of how things work and are organized in Cassandra, both in the code itself and as documentation material in doc/.

I believe the overrides of the postgres-specific tests are no longer needed.

Also I've a few comments.

swh/storage/cassandra/cql.py
35

What's this for?

93

No docstring at all?

124

Does it make sense not to wait a bit between read/write attempts?

This revision is now accepted and ready to land. Jan 31 2020, 2:09 PM
ardumont added a subscriber: ardumont.

Looks good.

Small remarks that can be taken care of later (after the first landing ;)

I'm mostly unsure about:

  • migration: I guess we will have to use a Python script to migrate when adapting the model.
  • date precision. It seems we are losing some around the end of the diff.
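One possible source of such a loss, offered only as an assumption since the diff itself is not shown here: Cassandra's `timestamp` type stores milliseconds, while Python's `datetime` carries microseconds. The sketch below shows what a round-trip through millisecond precision drops; `to_millisecond_precision` is a hypothetical helper, not a function from the diff.

```python
from datetime import datetime, timezone


def to_millisecond_precision(dt):
    """Drop the sub-millisecond part, as a millisecond-based store would."""
    return dt.replace(microsecond=(dt.microsecond // 1000) * 1000)


# A timestamp with full microsecond precision (illustrative value).
original = datetime(2020, 1, 21, 14, 59, 0, 123456, tzinfo=timezone.utc)
stored = to_millisecond_precision(original)

# What is lost when the value is written and read back.
lost = original - stored
```

If this is indeed the cause, any comparison between a stored date and its original would need to tolerate a difference below one millisecond.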
swh/storage/cassandra/cql.py
112

We might be able to move the retry behavior into the retry proxy storage.
Or, as a simple first step, decorate this method with retry behavior (storage already has the tenacity dependency).
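A minimal sketch of the "decorate this method" option, written with the standard library only so that it runs as-is; in the actual code, tenacity would presumably play this role. The names `retry_on`, `TransientError`, and `flaky_execute` are hypothetical and do not come from the diff.

```python
import functools
import time


class TransientError(Exception):
    """Hypothetical stand-in for a driver error worth retrying."""


def retry_on(exc_type, attempts=3, delay=0.0):
    """Retry the wrapped callable when it raises exc_type."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)
        return wrapper
    return decorator


calls = {"count": 0}


@retry_on(TransientError, attempts=3)
def flaky_execute(statement):
    """Fail twice, then succeed, to exercise the retry path."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("simulated timeout")
    return f"executed: {statement}"


result = flaky_execute("SELECT 1")
```

Keeping the retry in a decorator leaves the method body free of loop logic, which is what makes it easy to move the behavior elsewhere later.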

140

statement

swh/storage/cassandra/storage.py
47

why not call it self.backend?
(we should also do the same on storage.storage)

224

<strike>one</strike> at least one

468

other endpoints no longer display the docstring, so might as well remove it here too (and for other occurrences)
(also, in master you did the necessary work for docstrings to be centralized in the storage interface ;)

vlorentz marked 3 inline comments as done.

apply comments.

swh/storage/cassandra/cql.py
35

boilerplate to make python-cassandra happy. (this used to be the default config, but now it raises a warning if I don't provide it)

Basically it tells the driver to hit the right server directly when sending a query (TokenAwarePolicy), and if there's more than one, to pick one at random that's in the same datacenter as the client (DCAwareRoundRobinPolicy)
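To make the selection order concrete, here is a toy model of the behavior described above. It is a simplified illustration under stated assumptions, not the driver's implementation: `Node` and `query_plan` are hypothetical names, and the real policies handle more cases (remote datacenters, host state, and so on).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    datacenter: str


def query_plan(replicas, all_nodes, local_dc, rng=random):
    """Order candidates: local replicas first, then other local nodes."""
    # Token awareness: prefer nodes that own the data for this query.
    local_replicas = [n for n in replicas if n.datacenter == local_dc]
    rng.shuffle(local_replicas)  # several local replicas: random order
    # Datacenter awareness: fall back to the remaining local nodes only.
    others = [
        n for n in all_nodes
        if n.datacenter == local_dc and n not in local_replicas
    ]
    return local_replicas + others


nodes = [Node("a", "dc1"), Node("b", "dc1"), Node("c", "dc2"), Node("d", "dc1")]
replicas = [nodes[0], nodes[2]]  # "a" (local) and "c" (remote) own the data

plan = query_plan(replicas, nodes, local_dc="dc1", rng=random.Random(0))
```

In this model the remote replica "c" never appears in the plan: the client in dc1 goes to "a" first and only then to the other dc1 nodes.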

112

It's lower-level retries (it may be called multiple times by the same endpoint), so not exactly the same.
I'll look into tenacity.

124

it already waits for 100ms because of the timeout
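A small sketch of why a blocking call with a timeout already paces a retry loop. A standard-library queue stands in for the Cassandra call here, which is purely an assumption for illustration; `attempt_with_timeout` is a hypothetical name.

```python
import queue
import time

TIMEOUT = 0.1  # seconds; mirrors the 100ms mentioned above


def attempt_with_timeout(source, attempts):
    """Try to read from source; each failed attempt blocks for TIMEOUT."""
    for _ in range(attempts):
        try:
            return source.get(timeout=TIMEOUT)
        except queue.Empty:
            continue  # no explicit sleep: the timeout already paced us
    return None


empty = queue.Queue()  # never receives anything, so every attempt times out
start = time.monotonic()
outcome = attempt_with_timeout(empty, attempts=3)
elapsed = time.monotonic() - start
```

Note that in this model the pause exists only when an attempt fails by timing out; an attempt that fails immediately would be retried without any delay.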

swh/storage/cassandra/storage.py
47

Because we already call CassandraStorage/Storage/InMemoryStorage backends. Calling this a backend too would be confusing.

run all tests in the same tox environment, it wastes less time installing deps on jenkins

This revision was automatically updated to reflect the committed changes.