README.md
No OneTemporary
Actions

Size

20 KB

Subscribers

None

README.md
View Options

	# cassandra

	#### Table of Contents

	1. [Overview](#overview)
	2. [Setup - The basics of getting started with cassandra](#setup)
	* [What cassandra affects](#what-cassandra-affects)
	* [Beginning with cassandra](#beginning-with-cassandra)
	* [Upgrading](#upgrading)
	3. [Usage - Configuration options and additional functionality](#usage)
	4. [Reference - An under-the-hood peek at what the module is doing and how](#reference)
	5. [Limitations - OS compatibility, etc.](#limitations)
	6. [Contributers](#contributers)
	7. [External Links](#external-links)

	## Overview

	This module installs and configures Apache Cassandra. The installation steps
	were taken from the installation documentation prepared by DataStax [1] and
	the configuration parameters are the same as those for the Puppet module
	developed by msimonin [2].

	## Setup

	### What cassandra affects

	* Installs the Cassandra package (default dsc21).
	* Configures settings in ${config_path}/cassandra.yaml.
	* Optionally insures that the Cassandra service is enabled and running.
	* Optionally installs the Cassandra support tools (e.g. cassandra21-tools).
	* Optionally configures a Yum repository to install the Cassandra packages
	from (on RedHat).
	* Optionally configures an Apt repository to install the Cassandra packages
	from (on Ubuntu).
	* Optionally installs a JRE/JDK package (e.g. java-1.7.0-openjdk).
	* Optionally installs the DataStax agent.

	### Beginning with cassandra

	This most basic example would attempt to install the default Cassandra package
	(assuming there is an available repository). See the Usage(#usage) section
	for more realistic scenarios.

	```puppet
	node 'example' {
	include '::cassandra'
	}
	```

	To install the DataStax agent, include the specific class.

	```puppet
	node 'example' {
	include '::cassandra'
	include '::cassandra::datastax_agent'
	}
	```

	To install with a reasonably sensible Java environment include the java
	subclass.

	```puppet
	node 'example' {
	include '::cassandra'
	include '::cassandra::java'
	}
	```

	To install the main cassandra package (which is mandatory) and all the
	optional packages, do the following:

	```puppet
	node 'example' {
	include '::cassandra'
	include '::cassandra::datastax_agent'
	include '::cassandra::java'
	}
	```

	By saying the cassandra class/package is mandatory, what is meant is that all
	the sub classes have a dependancy on the main class. So for example one
	could not specify the cassandra::java class for a node with the cassandra
	class also being included.

	### Upgrading

	Changes in 0.4.0

	* cassandra::datastax_agent_package_ensure has now been replaced with
	cassandra::datastax_agent::package_ensure.
	* cassandra::datastax_agent_service_enable has now been replaced with
	cassandra::datastax_agent::service_enable.
	* cassandra::datastax_agent_service_ensure has now been replaced with
	cassandra::datastax_agent::service_ensure.
	* cassandra::datastax_agent_package_name has now been replaced with
	cassandra::datastax_agent::package_name.
	* cassandra::datastax_agent_service_name has now been replaced with
	cassandra::datastax_agent::service_name.

	* cassandra::java_package_ensure has now been replaced with
	cassandra::java::ensure.
	* cassandra::java_package_name has now been replaced with
	cassandra::java::package_name.

	Changes in 0.3.0

	* cassandra_opt_package_ensure changed from 'present' to undef.

	* The manage_service option has been replaced with service_enable and
	service_ensure.

	## Usage

	To install Cassandra in a two node cluster called 'Foobar Cluster' where
	node1 (192.168.42.1) is the seed and node2 192.168.42.2 is also to be a
	member, do something similar to this:

	```puppet
	include cassandra::java

	node 'node1' {
	class { 'cassandra':
	cluster_name => 'Foobar Cluster',
	listen_address => "${::ipaddress}",
	seeds => "${::ipaddress}",
	cassandra_opt_package_ensure => 'present',
	manage_dsc_repo => true
	}
	}

	node 'node2' {
	class { 'cassandra':
	cluster_name => 'Foobar Cluster',
	listen_address => "${::ipaddress}",
	seeds => '192.168.42.1',
	cassandra_opt_package_ensure => 'present',
	manage_dsc_repo => true
	}
	}
	```

	This would also ensure that the JDK is installed and the optional Cassandra
	tools.

	### Class: cassandra

	#### Parameters

	#####`authenticator`
	Authentication backend, implementing IAuthenticator; used to identify users
	Out of the box, Cassandra provides
	org.apache.cassandra.auth.{AllowAllAuthenticator, PasswordAuthenticator}.

	* AllowAllAuthenticator performs no checks - set it to disable authentication.
	* PasswordAuthenticator relies on username/password pairs to authenticate
	users. It keeps usernames and hashed passwords in system_auth.credentials
	table. Please increase system_auth keyspace replication factor if you use this
	authenticator.

	Default: AllowAllAuthenticator

	#####`authorizer`
	Authorization backend, implementing IAuthorizer; used to limit access/provide
	permissions Out of the box, Cassandra provides
	org.apache.cassandra.auth.{AllowAllAuthorizer, CassandraAuthorizer}.

	* AllowAllAuthorizer allows any action to any user - set it to disable
	authorization.
	* CassandraAuthorizer stores permissions in system_auth.permissions table.
	Please increase system_auth keyspace replication factor if you use this
	authorizer.

	Default: AllowAllAuthorizer

	#####`auto_snapshot`
	Whether or not a snapshot is taken of the data before keyspace truncation
	or dropping of column families. The STRONGLY advised default of true
	should be used to provide data safety. If you set this flag to false, you will
	lose data on truncation or drop (default true).

	#####`cassandra_opt_package_ensure`
	The status of the package specified in cassandra_opt_package_name. Can be
	present, latest or a specific version number. If
	cassandra_opt_package_name is undef, this option has no effect (default
	present).

	#####`cassandra_opt_package_name`
	If left at the default, this will change to 'cassandra21-tools' on RedHat
	or 'cassandra-tools' on Ubuntu. Alternatively this use can specify the
	package name
	(default undef).

	#####`cassandra_package_ensure`
	The status of the package specified in cassandra_package_name. Can be
	present, latest or a specific version number (default present).

	#####`cassandra_package_name`
	The name of the Cassandra package. Must be installable from a repository
	(default dsc21).

	#####`cassandra_yaml_tmpl`
	The path to the Puppet template for the Cassandra configuration file. This
	allows the user to supply their own customized template. A Cassandra 1.X
	compatible template called cassandra1.yaml.erb has been provided by @Spredzy
	(default cassandra/cassandra.yaml.erb).

	#####`client_encryption_enabled`
	Enable or disable client/server encryption (default false).

	#####`client_encryption_keystore`
	Keystore for client_encryption (default conf/.keystore).

	#####`client_encryption_keystore_password`
	Keystore password for client encryption (default cassandra).

	#####`cluster_name`
	The name of the cluster. This is mainly used to prevent machines in one logical
	cluster from joining another (default Test Cluster).

	#####`commitlog_directory`
	Commit log. when running on magnetic HDD, this should be a separate spindle
	than the data directories (default /var/lib/cassandra/commitlog).

	#####`concurrent_counter_writes`
	For workloads with more data than can fit in memory, Cassandra's bottleneck
	will be reads that need to fetch data from disk. "concurrent_reads"
	should be set to (16 * number_of_drives) in order to allow the operations to
	enqueue low enough in the stack that the OS and drives can reorder them. Same
	applies to "concurrent_counter_writes", since counter writes read the current
	values before incrementing and writing them back.

	On the other hand, since writes are almost never IO bound, the ideal
	number of "concurrent_writes" is dependent on the number of cores in
	your system; (8 * number_of_cores) is a good rule of thumb (default 32).

	#####`concurrent_reads`
	For workloads with more data than can fit in memory, Cassandra's bottleneck
	will be reads that need to fetch data from disk. "concurrent_reads"
	should be set to (16 * number_of_drives) in order to allow the operations to
	enqueue low enough in the stack that the OS and drives can reorder them. Same
	applies to "concurrent_counter_writes", since counter writes read the current
	values before incrementing and writing them back.

	On the other hand, since writes are almost never IO bound, the ideal
	number of "concurrent_writes" is dependent on the number of cores in
	your system; (8 * number_of_cores) is a good rule of thumb (default 32).

	#####`concurrent_writes`
	For workloads with more data than can fit in memory, Cassandra's bottleneck
	will be reads that need to fetch data from disk. "concurrent_reads"
	should be set to (16 * number_of_drives) in order to allow the operations to
	enqueue low enough in the stack that the OS and drives can reorder them. Same
	applies to "concurrent_counter_writes", since counter writes read the current
	values before incrementing and writing them back.

	On the other hand, since writes are almost never IO bound, the ideal
	number of "concurrent_writes" is dependent on the number of cores in
	your system; (8 * number_of_cores) is a good rule of thumb (default 32).

	#####`config_path`
	The path to the cassandra configuration file. If this is undef, it will be
	changed to /etc/cassandra/default.conf on the RedHat family of operating
	systems or /etc/cassandra on Ubuntu. Otherwise the user can specify the
	path name
	(default undef).

	#####`data_file_directories`
	Directories where Cassandra should store data on disk. Cassandra
	will spread data evenly across them, subject to the granularity of
	the configured compaction strategy (default ['/var/lib/cassandra/data']).

	#####`disk_failure_policy`
	Policy for data disk failures:

	* die: shut down gossip and Thrift and kill the JVM for any fs errors or
	single-sstable errors, so the node can be replaced.
	* stop_paranoid: shut down gossip and Thrift even for single-sstable errors.
	* stop: shut down gossip and Thrift, leaving the node effectively dead, but
	can still be inspected via JMX.
	* best_effort: stop using the failed disk and respond to requests based on
	remaining available sstables. This means you WILL see obsolete
	data at CL.ONE!
	* ignore: ignore fatal errors and let requests fail, as in pre-1.2 Cassandra.

	Default: stop

	#####`endpoint_snitch`
	Set this to a class that implements IEndpointSnitch. The snitch has two
	functions:

	* it teaches Cassandra enough about your network topology to route
	requests efficiently.
	* it allows Cassandra to spread replicas around your cluster to avoid
	correlated failures. It does this by grouping machines into
	"datacenters" and "racks." Cassandra will do its best not to have
	more than one replica on the same "rack" (which may not actually
	be a physical location).

	IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER,
	YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS
	ARE PLACED.

	Out of the box, Cassandra provides:

	* SimpleSnitch: Treats Strategy order as proximity. This can improve cache
	locality when disabling read repair. Only appropriate for
	single-datacenter deployments.
	* GossipingPropertyFileSnitch: This should be your go-to snitch for production
	use. The rack and datacenter for the local node are defined in
	cassandra-rackdc.properties and propagated to other nodes via
	gossip. If cassandra-topology.properties exists, it is used as a
	fallback, allowing migration from the PropertyFileSnitch.
	* PropertyFileSnitch: Proximity is determined by rack and data center, which are
	explicitly configured in cassandra-topology.properties.
	* Ec2Snitch: Appropriate for EC2 deployments in a single Region. Loads Region
	and Availability Zone information from the EC2 API. The Region is
	treated as the datacenter, and the Availability Zone as the rack.
	Only private IPs are used, so this will not work across multiple Regions.
	* Ec2MultiRegionSnitch: Uses public IPs as broadcast_address to allow
	cross-region connectivity. (Thus, you should set seed addresses to the public
	IP as well.) You will need to open the storage_port or
	ssl_storage_port on the public IP firewall. (For intra-Region
	traffic, Cassandra will switch to the private IP after
	establishing a connection.)
	* RackInferringSnitch: Proximity is determined by rack and data center, which
	are assumed to correspond to the 3rd and 2nd octet of each node's IP
	address, respectively. Unless this happens to match your
	deployment conventions, this is best used as an example of
	writing a custom Snitch class and is provided in that spirit.

	You can use a custom Snitch by setting this to the full class name
	of the snitch, which will be assumed to be on your classpath.

	Default: SimpleSnitch

	#####`hinted_handoff_enabled`
	See http://wiki.apache.org/cassandra/HintedHandoff May either be "true" or
	"false" to enable globally, or contain a list of data centers to enable
	per-datacenter (e.g. 'DC1,DC2'). Defaults to 'true'.

	#####`incremental_backups`
	Set to true to have Cassandra create a hard link to each sstable
	flushed or streamed locally in a backups/ subdirectory of the
	keyspace data. Removing these links is the operator's
	responsibility (default false).

	#####`internode_compression`
	Controls whether traffic between nodes is compressed. Can be:

	* all - all traffic is compressed
	* dc - traffic between different datacenters is compressed
	* none - nothing is compressed.

	Default all

	#####`listen_address`
	Address or interface to bind to and tell other Cassandra nodes to connect to
	(default localhost).

	#####`manage_dsc_repo`
	If set to true then a repository will be setup so that packages can be
	downloaded from the DataStax community edition (default false).

	#####`native_transport_port`
	Port for the CQL native transport to listen for clients on
	For security reasons, you should not expose this port to the internet.
	Firewall it if needed (default 9042).

	#####`num_tokens`
	This defines the number of tokens randomly assigned to this node on the ring
	The more tokens, relative to other nodes, the larger the proportion of data
	that this node will store. You probably want all nodes to have the same number
	of tokens assuming they have equal hardware capability.

	#####`partitioner`
	The partitioner is responsible for distributing groups of rows (by
	partition key) across nodes in the cluster. You should leave this
	alone for new clusters. The partitioner can NOT be changed without
	reloading all data, so when upgrading you should set this to the
	same partitioner you were already using.

	Besides Murmur3Partitioner, partitioners included for backwards
	compatibility include RandomPartitioner, ByteOrderedPartitioner, and
	OrderPreservingPartitioner (default
	org.apache.cassandra.dht.Murmur3Partitioner)

	#####`rpc_address`
	The address to bind the Thrift RPC service and native transport server to
	(default localhost).

	#####`rpc_port`
	Port for Thrift to listen for clients on (default 9160).

	#####`rpc_server_type`
	Cassandra provides two out-of-the-box options for the RPC Server:

	* One thread per thrift connection. For a very large number of clients,
	memory will be your limiting factor. On a 64 bit JVM, 180KB is the minimum
	stack size per thread, and that will correspond to your use of virtual memory
	(but physical memory may be limited depending on use of stack space).
	* Stands for "half synchronous, half asynchronous." All thrift clients
	are handled asynchronously using a small number of threads that does
	not vary with the amount of thrift clients (and thus scales well to many
	clients). The rpc requests are still synchronous (one thread per active
	request). If hsha is selected then it is essential that rpc_max_threads
	is changed from the default value of unlimited.

	The default is sync because on Windows hsha is about 30% slower. On Linux,
	sync/hsha performance is about the same, with hsha of course using less memory.

	Alternatively, you can provide your own RPC server by providing the
	fully-qualified class name of an o.a.c.t.TServerFactory that can create an
	instance of it.

	#####`saved_caches_directory`
	Default: /var/lib/cassandra/saved_caches

	#####`seeds`
	Addresses of hosts that are deemed contact points. Cassandra nodes use this
	list of hosts to find each other and learn the topology of the ring. You must
	change this if you are running multiple nodes! Seeds is actually a
	comma-delimited list of addresses (default 127.0.0.1).

	#####`server_encryption_internode`
	Enable or disable inter-node encryption (default none).

	#####`server_encryption_keystore`
	Default: conf/.keystore

	#####`server_encryption_keystore_password`
	Default: cassandra

	#####`server_encryption_truststore`
	Default: conf/.truststore

	#####`server_encryption_truststore_password`
	Default: cassandra

	#####`service_enable`
	Enable the Cassandra service to start at boot time. Valid values are true
	or false
	(default: true)

	#####`service_ensure`
	Ensure the Cassandra service is running. Valid values are running or stopped
	(default: running)

	#####`service_name`
	The name of the service that runs the Cassandra software (default
	cassandra).

	#####`snapshot_before_compaction`
	Whether or not to take a snapshot before each compaction. Be
	careful using this option, since Cassandra won't clean up the
	snapshots for you. Mostly useful if you're paranoid when there
	is a data format change (default false).

	#####`start_native_transport`
	Whether to start the native transport server. Please note that the address on
	which the native transport is bound is the same as the rpc_address. The port
	however is different and specified below (default true).

	#####`start_rpc`
	Whether to start the thrift rpc server (default true).

	#####`storage_port`
	TCP port, for commands and data for security reasons, you should not expose this
	port to the internet. Firewall it if needed (default 7000).

	### Class: cassandra::datastax_agent

	####`package_ensure`
	Is passed to the package reference. Valid values are present or a version
	number
	(default present).

	####`package_name`
	Is passed to the package reference (default datastax-agent).

	####`service_ensure`
	Is passed to the service reference (default running).

	####`service_enable`
	Is passed to the service reference (default true).

	####`service_name`
	Is passed to the service reference (default datastax-agent).

	####`stomp_interface`
	If the value is changed from the default of undef then this is what is
	set as the stomp_interface setting in /var/lib/datastax-agent/conf/address.yaml
	which connects the agent to an OpsCenter instance
	(default undef).

	### Class: cassandra::java

	####`ensure`
	Is passed to the package reference. Valid values are present or a version
	number
	(default present).

	####`package_name`
	If the default value of undef is left as it is, then a package called
	java-1.8.0-openjdk-headless or openjdk-7-jre-headless will be installed
	on a Red Hat family or Ubuntu system respectively. Alternatively, one
	can specify a package that is available in a package repository to the
	node
	(default undef).

	## Reference

	This module uses the package type to install the Cassandra package, the
	optional Cassandra tools, the DataStax agent and Java package.

	It optionally uses the service type to enable the cassandra service and/or the
	DataStax agent and ensure that they are running.

	It also uses the yumrepo type on the RedHat family of operating systems to
	(optionally) install the DataStax Repo for Apache Cassandra.

	On Ubuntu, the apt class is optionally utilised.

	## Limitations

	This module currently still has somewhat limited functionality. More
	parameters and configuration parameters will be added later.

	There is currently no method for this module to manipulate Java options.

	Currently the is no configuration or customisation of the DataStax Agent.

	Tested on the RedHat family versions 6 and 7, Ubuntu 12.04 and 14.04, Puppet
	(CE) 3.7.5 and DSC 2.1.5.

	## Contributers

	Contributions will be greatfully accepted. Please go to the project page,
	fork the project, make your changes locally and then raise a pull request.
	Details on how to do this are available at
	https://guides.github.com/activities/contributing-to-open-source.

	### Additional Contributers

	Yanis Guenane (GitHub [@spredzy](https://github.com/Spredzy)) provided the
	Cassandra 1.x compatible template
	(see [#11](https://github.com/locp/cassandra/pull/11)).

	## External Links

	[1] - Installing DataStax Community on RHEL-based systems, available at
	http://docs.datastax.com/en/cassandra/2.1/cassandra/install/installRHEL_t.html, accessed 25th May 2015.

	[2] - *msimonin/cassandra: Puppet module to install Apache Cassandra from
	the DataStax distribution. Forked from gini/cassandra*, available at
	https://forge.puppetlabs.com/msimonin/cassandra, accessed 17th March 2015.

File Metadata

Mime Type: text/plain
Expires: Jun 4 2025, 7:33 PM (9 w, 6 d ago)
Storage Engine: blob
Storage Format: Raw Data
Storage Handle: 3337091

README.mdNo OneTemporaryActions

README.mdView Options

File Metadata

Event Timeline

README.md
No OneTemporary
Actions

README.md
View Options