
Add Cassandra backend.

Authored by vlorentz on Jan 21 2020, 2:59 PM.



Related to T2186

Diff Detail

rDSTO Storage manager
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vlorentz edited the summary of this revision.

timeout in wait_for_peer.

implement endpoints missing since rebase

"The more it fails, the more likely it is to work!"

  • disable debug logs
  • allocate more RAM to make tests less flaky
douardda added a subscriber: douardda.

Looks good to me, but it would really be nice to have a bit more documentation/explanation of how things work and are organized in Cassandra, be it in the code itself or as documentation material in doc/

I believe the overrides of postgres-specific tests are no longer needed.

Also, I have a few comments.


What's this for?


No docstring at all?


Does it make sense not to wait a bit between read/write attempts?

This revision is now accepted and ready to land. Jan 31 2020, 2:09 PM
ardumont added a subscriber: ardumont.

Looks good.

Small remarks that can be taken care of later (after the first landing ;)

I'm mostly unsure about:

  • migration. -> I guess we will have to use a python script to migrate when adapting the model.
  • date precision. It seems we are losing some around the end of the diff.
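
One possible source of the precision loss mentioned above is that Cassandra's `timestamp` type stores milliseconds, while Python `datetime` objects carry microseconds. Whether this is what happens in the diff is an assumption. The helper name `truncate_to_milliseconds` is hypothetical and only illustrates the effect:

```python
from datetime import datetime, timedelta, timezone


def truncate_to_milliseconds(dt):
    """Drop the sub-millisecond part of a datetime (hypothetical helper).

    This mimics what a millisecond-precision column would keep; it is an
    illustration, not code taken from the diff.
    """
    return dt.replace(microsecond=(dt.microsecond // 1000) * 1000)


# A date with microsecond precision, as Python produces by default.
original = datetime(2020, 1, 31, 14, 9, 0, 123456, tzinfo=timezone.utc)

# What would come back after a round trip through a millisecond column.
stored = truncate_to_milliseconds(original)

# The lost part is strictly below one millisecond.
lost = original - stored
```

If this is indeed the cause, tests comparing dates for equality would need to truncate the expected value the same way before comparing.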

We might be able to move the retry behavior into the retry proxy storage.
Or maybe, as a simple first step, decorate this method with retry behavior (storage already has the tenacity dependency).
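
To make the "decorate this method" suggestion concrete, here is a minimal standard-library sketch of a retry decorator. It is a stand-in for what tenacity would provide, not tenacity's API, and the names `retry`, `FlakyBackend`, and `write` are hypothetical, not taken from the diff:

```python
import functools
import time


def retry(max_attempts=3, wait_seconds=0.1, exceptions=(Exception,)):
    """Retry the decorated callable, re-raising after the last attempt."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # attempts exhausted: propagate the error
                    time.sleep(wait_seconds)  # fixed pause between attempts

        return wrapper

    return decorator


class FlakyBackend:
    """Hypothetical backend that times out a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    @retry(max_attempts=3, wait_seconds=0, exceptions=(TimeoutError,))
    def write(self, value):
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutError("simulated timeout")
        return value
```

With tenacity, roughly the same behavior would presumably be expressed with its `retry` decorator combined with `stop_after_attempt` and `wait_fixed`.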




why not call it self.backend?
(we should also do the same on


<strike>one</strike> at least one


other endpoints no longer display the docstring, so we might as well remove it here too (and for other occurrences)
(also, in master you did the necessary work for docstrings to be centralized in the storage interface ;)

vlorentz marked 3 inline comments as done.

apply comments.


boilerplate to make python-cassandra happy. (this used to be the default config, but now it raises a warning if I don't provide it)

Basically it tells it to hit the right server directly when sending a query (TokenAwarePolicy), and if there's more than one, then pick one at random that's in the same datacenter as the client (DCAwareRoundRobinPolicy)
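
For readers unfamiliar with the driver, the policy setup described above might look roughly like the sketch below. The function name `make_cluster` and its parameters are hypothetical; the diff itself may structure this differently:

```python
def make_cluster(hosts, port=9042):
    """Build (without connecting) a Cluster with an explicit default profile.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so this sketch loads even where the driver is absent.
    from cassandra.cluster import EXEC_PROFILE_DEFAULT, Cluster, ExecutionProfile
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    profile = ExecutionProfile(
        # TokenAwarePolicy sends each query to a node holding the data;
        # the wrapped DCAwareRoundRobinPolicy chooses among candidates,
        # preferring hosts in the client's own datacenter.
        load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy()),
    )
    return Cluster(
        contact_points=hosts,
        port=port,
        # Setting the default profile explicitly avoids the warning
        # mentioned above.
        execution_profiles={EXEC_PROFILE_DEFAULT: profile},
    )
```

Calling `make_cluster(["127.0.0.1"]).connect()` would then open a session using that profile, assuming a node is reachable at that address.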


It's lower-level retries (it may be called multiple times by the same endpoint), so not exactly the same.
I'll look into tenacity.


it already waits for 100ms because of the timeout


Because we already call CassandraStorage/Storage/InMemoryStorage backends. Calling this a backend too would be confusing.

run all tests in the same tox environment; it wastes less time installing deps on jenkins

This revision was automatically updated to reflect the committed changes.