cassandra: Refactor the former installation scripts
Closed, Public

Authored by vsellier on Aug 12 2022, 11:01 AM.

Details

Summary
  • Move the previous yaml base configuration to a puppet template
  • Install cassandra from the archive instead of the debian packages. It gives more flexibility for the multi-instance configuration
  • Support multiple instances per server
  • Centralize the configuration in the cassandra.yaml file

There is still some work to do:

  • Manage more cassandra configuration properties
  • Add the tcp port monitoring
  • Wire the exporter metrics to one of the prometheus servers

Related to T4373

Diff Detail

Repository
rSPSITE puppet-swh-site
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Why do you include cassandra-env.sh? It should be shipped with Cassandra.

eg. it's in conf/ in https://dlcdn.apache.org/cassandra/4.0.5/apache-cassandra-4.0.5-bin.tar.gz

I specified why on the first line of the file, but I realize it's not very visible.

The JMX_PORT variable is hardcoded in the file shipped with cassandra.
I've updated lines 229-231 to check whether JMX_PORT is already specified as an environment variable.

It's necessary to be able to start several instances on the same server.
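For illustration, a minimal sketch of that kind of check, not the actual lines from the diff. It assumes the fallback is 7199 (the JMX port in the stock cassandra-env.sh); the helper name resolve_jmx_port is hypothetical.

```shell
#!/bin/sh
# Sketch only: keep a JMX_PORT provided by the environment, otherwise
# fall back to 7199 (assumed here to be the stock default).
# resolve_jmx_port is a hypothetical helper, not part of cassandra-env.sh.
resolve_jmx_port() {
    # ${VAR:-default} expands to the default when VAR is unset or empty.
    echo "${JMX_PORT:-7199}"
}

echo "JMX port: $(resolve_jmx_port)"
```

With this shape, each instance can be started with its own JMX_PORT in its environment, and an instance that sets nothing keeps the default.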

Setting -Dcassandra.jmx.local.port / -Dcassandra.jmx.remote.port / -Dcom.sun.management.jmxremote.rmi.port in the $JVM_EXTRA_OPTS env var should override what this file configures.

In our case, it should be -Dcassandra.jmx.local.port= and -Dcom.sun.management.jmxremote.authenticate=false.
I tried to avoid dealing with the JMX configuration in puppet. Also, the script will still try to specify the JMX configuration, local or remote (lines 240/243).
The parameters will be declared several times, and we will rely on how the JVM handles the order of the parameters.

I'm really not a fan of erb templates for yaml configuration files (specifically, seed_provider: <%= @config["seed_provider"].to_yaml().delete_prefix("---") %> is pretty jarring). I agree that inlining the full default config was not a good idea, though.

For the cassandra-env.sh, java seems to always parse the options left to right, and the rightmost one wins. So we should be safe to put the JMX definitions in JVM_EXTRA_OPTS. Probably.

(Of course, this was documented behavior for java 8, but it's not documented anymore)
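A rough sketch of how the override could be assembled, assuming cassandra-env.sh appends $JVM_EXTRA_OPTS after its own options (worth checking against the shipped file). The helper name jmx_extra_opts and the port 7299 are hypothetical, and the JVM_OPTS value is a stand-in for what the script sets.

```shell
#!/bin/sh
# Hypothetical helper: build the JMX overrides for one instance.
jmx_extra_opts() {
    port="$1"
    echo "-Dcassandra.jmx.local.port=${port} -Dcom.sun.management.jmxremote.authenticate=false"
}

# Stand-in for the options cassandra-env.sh would have set on its own.
JVM_OPTS="-Dcassandra.jmx.local.port=7199"

# Per-instance override, e.g. exported by the service unit (7299 is made up).
JVM_EXTRA_OPTS="$(jmx_extra_opts 7299)"

# Appending the extra options puts the override to the right of the default,
# which is the position the comment above expects to win.
JVM_OPTS="$JVM_OPTS $JVM_EXTRA_OPTS"
echo "$JVM_OPTS"
```

The resulting command line declares the local port twice, so the approach depends entirely on the rightmost-wins behaviour described above.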

  • Rebase
  • Override the jmx port value via the JVM_EXTRA_OPTS environment
  • Inline the cassandra.yaml properties in hiera

I think you've forgotten to remove the erb template :-)

site-modules/profile/manifests/cassandra/instance.pp, line 99

If you use inline_yaml (which is one of our functions), you get the "Managed by puppet" header. I'm not sure where to_yaml comes from.

This revision is now accepted and ready to land. Aug 17 2022, 12:48 PM

use inline_yaml instead of to_yaml
thanks for the hint, I completely forgot about it