
Define the requirements for an on-premise Cassandra cluster

Event Timeline

vlorentz triaged this task as Normal priority. Mar 5 2021, 12:35 PM
vlorentz created this task.

Summary of a discussion on 2021-01-05, on using "HDD+fully loaded in RAM" vs "SSD":

The expected size of the database on disk, with compression and without replication, is 5TB.

Very roughly, this means that if we want it to fit in RAM, the RAM usage would be around 10TB, so 30TB post-replication. A computation from @olasd puts this at 528k€ worth of 128GB sticks, or 250k€ worth of 64GB sticks. This means 8-15 or 15-30 servers respectively, as a server can hold 16 or 32 sticks.
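As a back-of-envelope check, the sketch below reproduces these figures. The per-stick prices (~2200€ for 128GB, ~520€ for 64GB) are illustrative assumptions chosen to be roughly consistent with the totals above, not actual quotes.

```python
import math

# Back-of-envelope cost of the "everything in RAM" option.
# Per-stick prices are assumptions, not quotes.
ram_needed_gb = 30 * 1024          # ~10TB in RAM, x3 replication

for stick_gb, price_eur in [(128, 2200), (64, 520)]:
    sticks = math.ceil(ram_needed_gb / stick_gb)
    cost_keur = sticks * price_eur / 1000
    # a server chassis holds 16 or 32 DIMM slots
    servers_min = math.ceil(sticks / 32)
    servers_max = math.ceil(sticks / 16)
    print(f"{stick_gb}GB sticks: {sticks} sticks, ~{cost_keur:.0f}k€, "
          f"{servers_min}-{servers_max} servers")
```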

And that does not even account for the extra RAM needed to allow for growth and migrations. A priori that's too expensive, so that option is out for now.

This means we need SSDs to store the data, as the read workload is almost entirely random.
So, at least 15TB of SSD post-replication; doubling that to allow for some growth plus the extra space needed while migrating data gives a minimum of 30TB of SSD.

In terms of RAM, we currently have a 1/20 RAM-to-disk ratio for the PostgreSQL storage. If we want to keep the same ratio, that's 1.5TB of RAM for the cache. We also need at least 32+8GB per server for Cassandra itself, which is negligible. So that's 1.5TB of RAM total, which is more reasonable; assuming 64GB sticks (because they are cheaper), that's 24 sticks, so we only need two servers to hold that much RAM.

But that's not enough for reasonable replication (1 main and 2 copies), so we need at least 3 servers at any time.

So, we need 30TB of SSD and 1.5TB of RAM spread across 3 servers, which means 10TB of SSD and 0.5TB of RAM per server.
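A minimal sizing sketch tying these numbers together, using only the figures from the discussion (5TB on disk, replication factor 3, a 2x headroom for growth and migrations, the 1/20 RAM-to-disk ratio):

```python
# Minimal sizing sketch; all inputs come from the discussion above,
# nothing here is measured.
db_size_tb = 5                      # compressed, unreplicated, on disk
replication_factor = 3              # 1 primary + 2 copies
headroom = 2                        # growth + temporary space during migrations
servers = 3                         # minimum for 3-way replication

ssd_total_tb = db_size_tb * replication_factor * headroom   # 30TB
ram_cache_tb = ssd_total_tb / 20                            # 1.5TB, 1/20 ratio
ram_sticks_64gb = int(ram_cache_tb * 1024 / 64)             # 24 sticks

print(f"per server: {ssd_total_tb / servers:.0f}TB SSD, "
      f"{ram_cache_tb / servers:.1f}TB RAM "
      f"({ram_sticks_64gb // servers}x 64GB sticks)")
# -> per server: 10TB SSD, 0.5TB RAM (8x 64GB sticks)
```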

Now, we also need a hot spare, so at the very least 4 servers with these specs.

Also, it was implicit in my previous comment, but replication would be done entirely at the Cassandra level, so no RAID. The Cassandra documentation consistently discourages RAID (other than RAID0), as a Cassandra server has no issue using multiple data directories, each mounted from a different disk.

So in summary, the minimal requirements, allowing for replication + migrations + a little growth + hot spare:

  1. 4 servers
  2. 0.5TB of RAM per server, and it should have ECC
  3. 10TB of SSD per server, JBOD. They should probably be NVMe (IIRC, NVMe SSDs are the same price as SAS SSDs)
  4. gigabit router/switch between the servers
  5. we won't need to add more hardware inside these servers after they are productionized; instead, we will add more servers with similar specs as we grow

Benefits from increasing each of these specs:

  1. Spreading the same specs across more servers means it's less expensive to add one more, but it also needs more rack space
  2. More RAM -> more cache + more room for the GC -> faster
  3. Bigger disks -> more room for growth. More disks -> more room for growth + possibly faster as it spreads the load
  4. I don't know whether a gigabit router/NIC would be a bottleneck; see the rough estimate below.
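On that last point, the main case where the link speed clearly matters is streaming a full node's worth of data, e.g. when bootstrapping a replacement node or rebuilding after a disk loss. A rough estimate, where the 10TB per-node figure and the ~70% usable link efficiency are assumptions, not measurements:

```python
# Rough estimate of the time to stream one node's full data set over the
# network (bootstrap/rebuild scenario). Efficiency is an assumed value.
node_data_tb = 10
link_gbit_s = 1
efficiency = 0.7                            # protocol/compaction overhead

throughput_mb_s = link_gbit_s * 1000 / 8 * efficiency    # ~87 MB/s
hours = node_data_tb * 1e6 / throughput_mb_s / 3600
print(f"~{hours:.0f} hours to stream one node at {link_gbit_s} Gbit/s")
# -> ~32 hours; a 10 Gbit/s link would bring this down to a few hours.
```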

Did you consider PMem (and other configurations for Intel Optane memory) in your discussion? It offers a very interesting price/performance ratio.
There are machines on Grid5000 available to test this technology if needed.

@rdicosmo I have not, good idea. While they are probably too expensive to use as the main storage instead of SSDs (either via a regular FS or by using a PMem-aware Cassandra fork), we could use them in addition to the above requirements.

For example, just for the FS journal, which is something we already do for the current objstorage, IIRC.

Cassandra also has its own journal (commitlog_directory). The documentation even says HDDs are fine for this directory, but PMem would probably improve the write latency, I guess? (And we sure do a lot of small commits.)

So in short, it's not clear to me what the gains are (and I don't have time to check them); but we could add a "soft requirement" that the servers should have a couple of Optane slots, so that we have room to upgrade in a couple of years if needed.

Let's organise a call next week to explore the options, including the new testing opportunities that emerged recently.