**Table of Contents**  *generated with [DocToc](http://doctoc.herokuapp.com/)*

- [Kafka Puppet Module](#kafka-puppet-module)
- [Requirements](#requirements)
- [Usage](#usage)
  - [Kafka (Clients)](#kafka-clients)
  - [Kafka Broker Server](#kafka-broker-server)
  - [Custom Zookeeper Chroot](#custom-zookeeper-chroot)
  - [Kafka Mirror](#kafka-mirror)
  - [jmxtrans monitoring](#jmxtrans-monitoring)

# Kafka Puppet Module

A Puppet module for installing and managing [Apache Kafka](http://kafka.apache.org/) brokers.

This module is currently maintained by The Wikimedia Foundation in Gerrit at [operations/puppet/kafka](https://gerrit.wikimedia.org/r/#/admin/projects/operations/puppet/kafka) and mirrored here on [GitHub](https://github.com/wikimedia/puppet-kafka).  It was originally developed for Kafka 0.7.2 at https://github.com/wikimedia/puppet-kafka-0.7.2.

# Requirements

- Java
- A Kafka 0.8 package.  You can build a .deb package using the [operations/debs/kafka debian branch](https://github.com/wikimedia/operations-debs-kafka/tree/debian), or just install this [prebuilt .deb](http://apt.wikimedia.org/wikimedia/pool/main/k/kafka/).
- A running Zookeeper cluster.  You can set one up using WMF's [puppet-zookeeper module](https://github.com/wikimedia/puppet-zookeeper).

# Usage

## Kafka (Clients)

```puppet
# Install the kafka package.
class { 'kafka': }
```

This will install the Kafka package, which includes ```/usr/sbin/kafka```, useful for running client commands (console-consumer, console-producer, etc.).

## Kafka Broker Server

```puppet
# Include Kafka Broker Server.
class { 'kafka::server':
    log_dirs         => ['/var/spool/kafka/a', '/var/spool/kafka/b'],
    brokers          => {
        'kafka-node01.example.com' => { 'id' => 1, 'port' => 12345 },
        'kafka-node02.example.com' => { 'id' => 2 },
    },
    zookeeper_hosts  => ['zk-node01:2181', 'zk-node02:2181', 'zk-node03:2181'],
    zookeeper_chroot => '/kafka/cluster_name',
}
```

```log_dirs``` defaults to a single ```['/var/spool/kafka']```, but you may specify multiple Kafka log data directories here.  This is useful for spreading your topic partitions across multiple disks.

The ```brokers``` parameter is a Hash keyed by ```$::fqdn```.  Each value is another Hash that contains config settings for that Kafka host.  ```id``` is required and must be unique for each Kafka Broker Server host.  ```port``` is optional, and defaults to 9092.  Each Kafka Broker Server's ```broker_id``` and ```port``` properties in server.properties will be set by looking up the node's ```$::fqdn``` in the ```brokers``` Hash passed into the ```kafka::server``` class.

```zookeeper_hosts``` is an array of Zookeeper host:port pairs.  ```zookeeper_chroot``` is optional, and allows you to specify a Znode under which Kafka will store its metadata in Zookeeper.  This is useful if you want to use a single Zookeeper cluster to manage multiple Kafka clusters.  See below for information on how to create this Znode in Zookeeper.
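
As a concrete illustration of that lookup, with the parameters above the server.properties rendered on ```kafka-node01.example.com``` should look roughly like the following.  This is a sketch using the standard Kafka 0.8 property names; the module's template will likely set additional properties beyond these:

```
# Hypothetical server.properties fragment rendered for kafka-node01.example.com
broker.id=1
port=12345
log.dirs=/var/spool/kafka/a,/var/spool/kafka/b
# The chroot is appended to the Zookeeper connect string.
zookeeper.connect=zk-node01:2181,zk-node02:2181,zk-node03:2181/kafka/cluster_name
```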
## Custom Zookeeper Chroot

If Kafka will share a Zookeeper cluster with other users, you might want to create a Znode in Zookeeper in which to store your Kafka cluster's data.  You can set the ```zookeeper_chroot``` parameter on the ```kafka::server``` class to do this.

First, you'll need to create the Znode manually.  You can use the ```zkCli.sh``` that ships with Zookeeper, or you can use Kafka's built-in ```zookeeper-shell```:

```
$ kafka zookeeper-shell :2182
Connecting to kraken-zookeeper
Welcome to ZooKeeper!
JLine support is enabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null

[zk: kraken-zookeeper(CONNECTED) 0] create /my_kafka kafka
Created /my_kafka
```

You can use whatever chroot Znode path you like.  The second argument (```data```) is arbitrary.  I used 'kafka' here.

Then:

```puppet
class { 'kafka::server':
    brokers          => {
        'kafka-node01.example.com' => { 'id' => 1, 'port' => 12345 },
        'kafka-node02.example.com' => { 'id' => 2 },
    },
    zookeeper_hosts  => ['zk-node01:2181', 'zk-node02:2181', 'zk-node03:2181'],
    # Set zookeeper_chroot on the kafka::server class.
    zookeeper_chroot => '/kafka/clusterA',
}
```

## Kafka Mirror

Kafka MirrorMaker is usually used for inter-datacenter Kafka cluster replication and aggregation.  You can consume from any number of source Kafka clusters, and produce to a single destination Kafka cluster.

```puppet
# Configure kafka-mirror to produce to Kafka Brokers which are
# part of our kafka aggregator cluster.
class { 'kafka::mirror':
    destination_brokers => {
        'kafka-aggregator01.example.com' => { 'id' => 11 },
        'kafka-aggregator02.example.com' => { 'id' => 12 },
    },
    topic_whitelist     => 'webrequest.*',
}

# Configure kafka-mirror to consume from both clusterA and clusterB.
kafka::mirror::consumer { 'clusterA':
    zookeeper_hosts  => ['zk-node01:2181', 'zk-node02:2181', 'zk-node03:2181'],
    zookeeper_chroot => '/kafka/clusterA',
}
kafka::mirror::consumer { 'clusterB':
    zookeeper_hosts  => ['zk-node01:2181', 'zk-node02:2181', 'zk-node03:2181'],
    zookeeper_chroot => '/kafka/clusterB',
}
```

## jmxtrans monitoring

This module contains a class called ```kafka::server::jmxtrans```.  It contains a useful jmxtrans JSON config object that can be used to tell jmxtrans to send to any output writer (Ganglia, Graphite, etc.).  To use this, you will need the [puppet-jmxtrans](https://github.com/wikimedia/puppet-jmxtrans) module.

```puppet
# Include this class on each of your Kafka Broker Servers.
class { '::kafka::server::jmxtrans':
    ganglia => 'ganglia.example.com:8649',
}
```

This will install jmxtrans and render JSON config files for sending JVM and Kafka Broker stats to Ganglia.  See [kafka-jmxtrans.json.md](kafka-jmxtrans.json.md) for a fully rendered jmxtrans Kafka JSON config file.
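
Putting the pieces together, a Broker node that also reports metrics to Ganglia might look like the following sketch.  It uses only the classes and parameters documented above; the hostnames and chroot are placeholders:

```puppet
node 'kafka-node01.example.com' {
    # Run a Kafka Broker as part of a two-node cluster.
    class { '::kafka::server':
        brokers          => {
            'kafka-node01.example.com' => { 'id' => 1 },
            'kafka-node02.example.com' => { 'id' => 2 },
        },
        zookeeper_hosts  => ['zk-node01:2181', 'zk-node02:2181', 'zk-node03:2181'],
        zookeeper_chroot => '/kafka/clusterA',
    }

    # Ship JVM and Kafka Broker stats to Ganglia via jmxtrans.
    class { '::kafka::server::jmxtrans':
        ganglia => 'ganglia.example.com:8649',
    }
}
```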