
Add Cassandra backend.
ClosedPublic

Authored by vlorentz on Jan 21 2020, 2:59 PM.

Details

Summary

Related to T2186

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden.
vlorentz updated this revision to Diff 9249. Jan 27 2020, 7:20 PM
vlorentz edited the summary of this revision.

timeout in wait_for_peer.

vlorentz updated this revision to Diff 9251. Jan 27 2020, 7:31 PM

retrigger tests

vlorentz updated this revision to Diff 9257. Jan 28 2020, 1:42 PM

more logs

vlorentz updated this revision to Diff 9258. Jan 28 2020, 1:52 PM

fix command

vlorentz updated this revision to Diff 9284. Wed, Jan 29, 1:37 PM

implement endpoints missing since rebase

vlorentz updated this revision to Diff 9291. Wed, Jan 29, 3:11 PM

fix signature

vlorentz updated this revision to Diff 9299. Wed, Jan 29, 4:17 PM

Fix rebase. Depends on D2601.

vlorentz added a comment. Edited Wed, Jan 29, 5:04 PM

"The more it fails, the better the chances that it will work!"

vlorentz updated this revision to Diff 9310. Thu, Jan 30, 11:29 AM
  • disable debug logs
  • allocate more RAM to make tests less flaky
douardda accepted this revision. Fri, Jan 31, 2:09 PM
douardda added a subscriber: douardda.

Looks good to me, but it would really be nice to have a bit more documentation/explanation of how things work and are organized in Cassandra, both in the code itself and as documentation material in doc/.

I believe the overrides of postgres-specific tests are no longer needed.

Also I've a few comments.

swh/storage/cassandra/cql.py
36

What's this for?

94

No docstring at all?

125

Does it make sense not to wait a bit between read/write attempts?

This revision is now accepted and ready to land. Fri, Jan 31, 2:09 PM
ardumont accepted this revision. Fri, Jan 31, 2:09 PM
ardumont added a subscriber: ardumont.

Looks good.

Small remarks that can be taken care of later (after the first landing ;)

I'm mostly unsure about:

  • migration: I guess we will have to use a Python script to migrate when adapting the model.
  • date precision: it seems we are losing some around the end of the diff.
swh/storage/cassandra/cql.py
113

We might be able to move the retry behavior into the retry proxy storage.
Or maybe, as a simple first step, decorate this method with retry behavior (storage already has the tenacity dependency).

141

statement

swh/storage/cassandra/storage.py
48

why not call it self.backend?
(we should also do the same on storage.storage)

225

"one" -> "at least one"

469

Other endpoints no longer display the docstring, so we might as well remove it here too (and for other occurrences).
(Also, in master you already did the necessary work for docstrings to be centralized in the storage interface ;)

vlorentz updated this revision to Diff 9341. Fri, Jan 31, 2:26 PM
vlorentz marked 3 inline comments as done.

apply comments.

swh/storage/cassandra/cql.py
36

Boilerplate to make python-cassandra happy. (This used to be the default config, but now it raises a warning if I don't provide it.)

Basically, it tells the driver to send each query directly to the right server (TokenAwarePolicy) and, if there is more than one, to pick one at random that's in the same datacenter as the client (DCAwareRoundRobinPolicy).
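
For readers unfamiliar with the driver, the configuration described above might look like the following. This is a minimal sketch assuming the DataStax `cassandra-driver` package; the `connect` helper, the host list, and the port are illustrative placeholders and are not taken from the diff.

```python
from cassandra.cluster import EXEC_PROFILE_DEFAULT, Cluster, ExecutionProfile
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Route each query to a replica that owns the partition (TokenAwarePolicy);
# among the candidates, prefer hosts in the client's datacenter
# (DCAwareRoundRobinPolicy). Passing the profile explicitly avoids relying
# on the driver's implicit default.
execution_profiles = {
    EXEC_PROFILE_DEFAULT: ExecutionProfile(
        load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy())
    ),
}


def connect(hosts, port=9042):
    """Hypothetical helper: open a session using the profile above."""
    cluster = Cluster(hosts, port=port, execution_profiles=execution_profiles)
    return cluster.connect()
```

Nothing here contacts a server until `connect()` is called.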

113

It's lower-level retries (it may be called multiple times by the same endpoint), so not exactly the same.
I'll look into tenacity.

125

it already waits for 100ms because of the timeout
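
The retry behavior discussed for lines 113 and 125 can be sketched as a small decorator. This is a standard-library-only illustration, not the code in the diff: `TransientError`, `retry`, and `flaky_query` are hypothetical names, and the 100 ms default wait only mirrors the timeout mentioned above. The tenacity library provides the same pattern ready-made.

```python
import functools
import time


class TransientError(Exception):
    """Hypothetical stand-in for a driver timeout or coordination failure."""


def retry(attempts=3, wait=0.1, exceptions=(TransientError,), sleep=time.sleep):
    """Retry the wrapped callable up to `attempts` times.

    Pauses `wait` seconds between tries and re-raises the last error
    once the attempts are exhausted.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    sleep(wait)
        return wrapper
    return decorator


calls = []


@retry(attempts=3, wait=0.0)
def flaky_query():
    """Fail twice, then succeed, to exercise the retry loop."""
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("simulated timeout")
    return "ok"


result = flaky_query()
```

Because the decorator wraps a single low-level call, one endpoint may trigger it several times, which is the distinction from an endpoint-level retry proxy drawn above.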

swh/storage/cassandra/storage.py
48

Because we already call CassandraStorage/Storage/InMemoryStorage backends. Calling this a backend too would be confusing.

vlorentz updated this revision to Diff 9342. Fri, Jan 31, 2:28 PM

add missing comment

vlorentz updated this revision to Diff 9343. Fri, Jan 31, 2:55 PM

run all tests in the same tox environment; it wastes less time installing deps on Jenkins

This revision was automatically updated to reflect the committed changes.