
Estimate for Kafka cluster specifications
Closed, Migrated, Edits Locked

Description

The heart of the journal infrastructure will be an Apache Kafka cluster.

We want 2 replicas for each (partition of each) topic, so we can have a node ("broker") shut down for maintenance without impacting production. More replication is not necessary at least for the first iteration of the production cluster.

We don't have an exact picture of the disk storage requirements yet: we know that we will serialize the data for all nodes in the metadata graph in their canonical form (a hash mapping) using a fairly compact representation (msgpack). This means that the per-row space overhead will be lower than with PostgreSQL, but that the duplication will be higher, notably for directory metadata which will duplicate directory entries.
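
For illustration, here's a minimal sketch of the kind of per-object payload this implies, assuming msgpack encoding of the canonical hash mapping (field names and hash values are placeholders, not the exact swh.journal schema); each directory embeds its full entry list, which is where the duplication comes from:

  import msgpack

  # Illustrative directory object in its canonical "hash mapping" form.
  # Field names and hash values are placeholders, not the real schema.
  directory = {
      "id": b"\x00" * 20,  # 20-byte placeholder hash
      "entries": [
          {
              "name": b"README.md",
              "type": "file",
              "target": b"\x01" * 20,  # placeholder hash of the pointed-to object
              "perms": 0o100644,
          },
      ],
  }

  # Compact binary representation: this would be the value of one Kafka message.
  payload = msgpack.packb(directory, use_bin_type=True)
  print(len(payload), "bytes")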

In sharp contrast to PostgreSQL, Kafka is really designed for sequential reads: the only per-key structure it maintains is the small index used by log compaction to evict superseded duplicate entries from the journal (this compaction mechanism is how updates and deletes are implemented).
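
For concreteness, a hedged sketch of how such a topic could be declared, with replication factor 2 (as discussed above) and key-based compaction; this uses confluent-kafka's admin API, and the broker address and partition count are made up:

  from confluent_kafka.admin import AdminClient, NewTopic

  # Placeholder broker address.
  admin = AdminClient({"bootstrap.servers": "broker1:9092"})

  topic = NewTopic(
      "swh.journal.objects.content",
      num_partitions=64,                     # illustrative partition count
      replication_factor=2,                  # survive one broker being down
      config={"cleanup.policy": "compact"},  # keep only the latest value per key
  )

  # create_topics() returns futures; error handling is omitted in this sketch.
  admin.create_topics([topic])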

I think we should allocate 5-6TB of (raw, unreplicated) disk space, and go from there.

Kafka is now tuned to work well in a JBOD setup: the patches referenced on https://www.slideshare.net/DongLin1/kafka-jbod-linke have all been merged and released by now. There's no concrete need for an underlying RAID0, as Kafka can reliably handle writing to several disks itself (without taking down the entire broker when a single disk goes away).
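
Concretely, the JBOD setup boils down to listing one log directory per physical disk in the broker configuration (paths below are placeholders):

  # server.properties: one log directory per disk, no RAID0 underneath;
  # Kafka spreads partitions across them and tolerates losing a single disk.
  log.dirs=/srv/kafka/data-1,/srv/kafka/data-2,/srv/kafka/data-3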

Memory requirements for the Kafka broker itself are quite low (the current prototype deployment runs fine on a VM with 2 GB of RAM). The main memory requirement comes from the buffer cache, which helps avoid hitting disk for reads from all clients.
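
As a rough sketch, that means capping the broker JVM heap at a couple of gigabytes and leaving the rest of the node's RAM to the kernel page cache; the exact value is an assumption:

  # Environment picked up by kafka-server-start.sh: small fixed heap for the
  # broker, the remaining RAM is left to the OS buffer/page cache.
  export KAFKA_HEAP_OPTS="-Xms2g -Xmx2g"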

We'll have two sets of clients: clients that re-read the full log from the beginning each time, and clients that keep up to date with the journal and only read new messages. There's no point in catering to the former type of client when sizing the buffer cache, as they'll be reading far behind the head of the log and won't be served from cache anyway.
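
A hedged sketch of the two client styles with confluent-kafka (broker address, group ids and the topic choice are made up for the example): the "one-shot" reader uses a fresh group and starts from the earliest offset, while the "synchronous" client keeps a stable group id and resumes from its committed offsets, effectively only seeing new messages:

  from confluent_kafka import Consumer

  BASE = {"bootstrap.servers": "broker1:9092"}  # placeholder broker

  # One-shot client: fresh group id, reads the whole log from the beginning.
  full_reader = Consumer({**BASE,
                          "group.id": "one-shot-export",
                          "auto.offset.reset": "earliest"})

  # Synchronous client: stable group id, resumes from committed offsets and
  # only processes messages appended since its last run.
  tail_reader = Consumer({**BASE,
                          "group.id": "swh-indexer",
                          "auto.offset.reset": "latest"})

  for consumer in (full_reader, tail_reader):
      consumer.subscribe(["swh.journal.objects.revision"])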

We expect to have up to 10 "synchronous" clients in this initial deployment, representing internal clients for the swh infra (archiver, indexer) and external mirrors of the archive. The number of "one-shot" clients will depend on the needs of the day but should remain fairly low (it involves reading the full metadata of the archive, after all). By Kafka standards, this is a very tiny deployment.

To sum up the resource requirements:

  • 6 TB of raw hard disk space; it can be split across devices and nodes, and should ideally be fairly contiguous to get good sequential access performance, but that's not massively critical.
  • 4-6 GB of RAM per node, split between the broker itself (1-2 GB) and the buffer cache (3-4 GB)

Event Timeline

olasd triaged this task as Normal priority. Apr 10 2018, 2:01 PM
olasd created this task.
ftigeot added a subscriber: ftigeot.

Adding a relation to T792 since there is no choice but to use the same underlying hardware for both Kafka and Elasticsearch.

We have 3x 1U servers which will also be used for an Elasticsearch cluster.
Sharing hardware with Elasticsearch is generally a bad idea, especially for storage.
I propose the following setup:

  • One separate Kafka instance per server
  • One dedicated 2TB Kafka HDD per server
  • 2GB of JVM memory per Kafka instance

The "buffer cache" is managed by the operating system and, as far as I know, there isn't a way to dedicate some of it to a particular application.
This will be one more shared resource.

After running the backfiller and the journal writer for a while, the following topic sizes (before replication) have been estimated for the current contents of the archive:

Topic                              Size
swh.journal.objects.content        636.6 GB
swh.journal.objects.directory      18.6 TB
swh.journal.objects.revision       942.5 GB
swh.journal.objects.release        6.7 GB
swh.journal.objects.snapshot       80.6 GB
swh.journal.objects.origin         16.1 GB
swh.journal.objects.origin_visit   150.0 GB
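
Summing these up as a quick back-of-the-envelope check (Python, decimal units):

  sizes_gb = {
      "content": 636.6, "directory": 18_600, "revision": 942.5,
      "release": 6.7, "snapshot": 80.6, "origin": 16.1, "origin_visit": 150.0,
  }
  total_tb = sum(sizes_gb.values()) / 1000
  print(f"{total_tb:.1f} TB unreplicated, {2 * total_tb:.1f} TB with 2 replicas")
  # -> roughly 20.4 TB unreplicated, 40.9 TB with 2 replicas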

This means that the current infra is (ridiculously) undersized.

Follow-up in T1829.