Software Heritage

Puppetize elasticsearch nodes
Closed, Public

Authored by ardumont on Dec 2 2020, 4:53 PM.

Details

Summary

This allows declaring the elasticsearch nodes, which have been configured manually
so far. We reused the current production configuration in
/etc/elasticsearch/{elasticsearch.yml,jvm.options} as defaults.

This diff configures the following on both production and staging nodes:

  • /etc/elasticsearch/elasticsearch.yml (overriding the one from the debian package)
  • /etc/elasticsearch/jvm.options.d/jvm.options (adding Xms/Xmx heap size overrides)
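The jvm.options drop-in content is small: in the catalog diff below it is assembled from two concat fragments, `-Xms16g` and `-Xmx16g`. A Python sketch of the same rendering, assuming a single hypothetical `heap_size` parameter drives both values (this is not the puppet manifest itself):

```python
from typing import Sequence


def render_jvm_options(heap_size: str, extra_options: Sequence[str] = ()) -> str:
    """Render the content of /etc/elasticsearch/jvm.options.d/jvm.options.

    Hypothetical helper: both the initial (-Xms) and maximum (-Xmx) heap are
    derived from one heap_size, since elasticsearch recommends setting them
    to the same value.
    """
    options = [f"-Xms{heap_size}", f"-Xmx{heap_size}", *extra_options]
    # One option per line, each newline-terminated (as concat's
    # ensure_newline => true does for every fragment).
    return "".join(f"{option}\n" for option in options)


# Prints -Xms16g and -Xmx16g on two lines, matching the production catalog.
print(render_jvm_options("16g"), end="")
```

Deriving both flags from one value makes it impossible to accidentally configure a different initial and maximum heap.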

This also fixes a couple of current paper cuts:

  • uid/gid creation
  • fix the inter-dependency ordering between package, service and apt configuration
  • remove a deprecated xpack setting (deprecated since 7.8.0, which is the production version)
  • stop managing the openjdk-8 dependency, which is no longer required (elasticsearch complained about it) [1]

[1] We'll need to uninstall that JDK from the production esnodes.

[2] We'll need to apply this configuration in production one node at a time.
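The ordering paper cut comes down to declaring the right edges: the apt configuration before the package, the package (which creates /etc/elasticsearch and ships the default elasticsearch.yml) before the configuration files, and the service last. Puppet derives its apply order from that dependency graph; the Python sketch below models it with the standard library's `graphlib`. `Apt::Source[elasticsearch]` and `Package[elasticsearch]` are assumed resource titles; the others appear in the catalog diff.

```python
from graphlib import TopologicalSorter

# Each resource maps to the resources that must be applied before it, which is
# what puppet's require/before/subscribe/notify relationships encode.
# Apt::Source[elasticsearch] and Package[elasticsearch] are assumed titles.
DEPENDENCIES = {
    "Apt::Source[elasticsearch]": set(),
    "Package[elasticsearch]": {"Apt::Source[elasticsearch]"},
    "File[/etc/elasticsearch/elasticsearch.yml]": {"Package[elasticsearch]"},
    "Concat[es_jvm_options]": {"Package[elasticsearch]"},
    "Service[elasticsearch]": {
        "Package[elasticsearch]",
        "File[/etc/elasticsearch/elasticsearch.yml]",
        "Concat[es_jvm_options]",
    },
}

# static_order() raises graphlib.CycleError on a dependency cycle, much like
# puppet refuses to apply a catalog that contains one.
ORDER = list(TopologicalSorter(DEPENDENCIES).static_order())

for position, resource in enumerate(ORDER):
    print(position, resource)
```

The two configuration resources are independent of each other, so their relative order is unspecified; only the edges declared above are guaranteed.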

Related to T2817

Test Plan

vagrant up staging-esnode0 ~> happily configures and starts elasticsearch accordingly

bin/octocatalog-diff on a production esnode (the catalogs differ, but the resulting
configuration is the same as the current one):

bin/octocatalog-diff --octocatalog-diff-args --no-truncate-details --to staging_add_elasticsearch_node esnode1
Found host esnode1.internal.softwareheritage.org
WARN     -> Environment "arcpatch-D4460" contained non-word characters, correcting name to arcpatch_D4460
WARN     -> Environment "open-template1" contained non-word characters, correcting name to open_template1
WARN     -> Environment "update-writer-config" contained non-word characters, correcting name to update_writer_config
WARN     -> Environment "wip-pg-hba-rules-in-yaml" contained non-word characters, correcting name to wip_pg_hba_rules_in_yaml
Cloning into '/tmp/swh-ocd.idXBDTTy/environments/production/data/private'...
done.
Cloning into '/tmp/swh-ocd.idXBDTTy/environments/staging_add_elasticsearch_node/data/private'...
done.
*** Running octocatalog-diff on host esnode1.internal.softwareheritage.org
I, [2020-12-02T16:40:00.786190 #28052]  INFO -- : Catalogs compiled for esnode1.internal.softwareheritage.org
I, [2020-12-02T16:40:02.157247 #28052]  INFO -- : Diffs computed for esnode1.internal.softwareheritage.org
diff origin/production/esnode1.internal.softwareheritage.org current/esnode1.internal.softwareheritage.org
*******************************************
+ Concat::Fragment[0_es_jvm_option] =>
   parameters =>
      "content": "-Xms16g"
      "order": "00"
      "target": "es_jvm_options"
*******************************************
+ Concat::Fragment[1_es_jvm_option] =>
   parameters =>
      "content": "-Xmx16g"
      "order": "00"
      "target": "es_jvm_options"
*******************************************
+ Concat[es_jvm_options] =>
   parameters =>
      "backup": "puppet"
      "ensure": "present"
      "ensure_newline": true
      "force": false
      "format": "plain"
      "group": 119
      "mode": "0644"
      "notify": "Service[elasticsearch]"
      "order": "alpha"
      "owner": 114
      "path": "/etc/elasticsearch/jvm.options.d/jvm.options"
      "replace": true
      "show_diff": true
      "warn": false
*******************************************
+ Concat_file[es_jvm_options] =>
   parameters =>
      "backup": "puppet"
      "ensure_newline": true
      "force": false
      "format": "plain"
      "group": 119
      "mode": "0644"
      "order": "alpha"
      "owner": 114
      "path": "/etc/elasticsearch/jvm.options.d/jvm.options"
      "replace": true
      "show_diff": true
      "tag": "es_jvm_options"
*******************************************
+ Concat_fragment[0_es_jvm_option] =>
   parameters =>
      "content": "-Xms16g"
      "order": "00"
      "tag": "es_jvm_options"
      "target": "es_jvm_options"
*******************************************
+ Concat_fragment[1_es_jvm_option] =>
   parameters =>
      "content": "-Xmx16g"
      "order": "00"
      "tag": "es_jvm_options"
      "target": "es_jvm_options"
*******************************************
+ File[/etc/elasticsearch/elasticsearch.yml] =>
   parameters =>
      "ensure": "file"
      "group": 119
      "mode": "0644"
      "notify": "Service[elasticsearch]"
      "owner": 114
      "content": >>>
# File managed by puppet - modifications will be lost
cluster.name: swh-logging-prod
node.name: esnode1
network.host: 192.168.100.61
discovery.seed_hosts:
- esnode1.internal.softwareheritage.org
- esnode2.internal.softwareheritage.org
- esnode3.internal.softwareheritage.org
cluster.initial_master_nodes:
- esnode1
- esnode2
- esnode3
path.data: "/srv/elasticsearch"
path.logs: "/var/log/elasticsearch"
index.store.type: hybridfs
indices.memory.index_buffer_size: 50%
<<<
*******************************************
+ File[/srv/elasticsearch] =>
   parameters =>
      "ensure": "directory"
      "group": 119
      "mode": "2755"
      "owner": 114
*******************************************
- File_line[elasticsearch store type]
*******************************************
+ Group[elasticsearch] =>
   parameters =>
      "ensure": "present"
      "gid": 119
*******************************************
- Package[openjdk-8-jre-headless]
*******************************************
  Systemd::Dropin_file[elasticsearch.conf] =>
   parameters =>
     notify =>
      + Service[elasticsearch]
*******************************************
  User[elasticsearch] =>
   parameters =>
     gid =>
      - "119"
      + 119
     uid =>
      - "114"
      + 114
*******************************************
*** End octocatalog-diff on esnode1.internal.softwareheritage.org

Diff Detail

Repository: rSPSITE puppet-swh-site
Branch: staging_add_elasticsearch_node
Lint: No Linters Available
Unit: No Unit Test Coverage
Build Status: Buildable 17684, Build 27340: arc lint + arc unit

Event Timeline

Override heap_size more "appropriately" for staging and vagrant
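The production heap in the catalog diff above is 16g, which is too much for small staging and vagrant VMs. A rough sizing rule, following the elasticsearch documentation's guidance (heap no larger than half of the RAM, and below the compressed ordinary object pointers threshold of roughly 32 GB); the cap and floor values below are assumptions, not what this revision sets:

```python
def suggest_heap_size(total_ram_gib: int, cap_gib: int = 30, floor_gib: int = 1) -> str:
    """Suggest one value for both -Xms and -Xmx on a node with total_ram_gib of RAM.

    Rule-of-thumb sketch: half of the RAM (the rest is left to the OS file
    cache, which elasticsearch relies on), capped below the ~32 GiB
    compressed-oops threshold, and never under floor_gib.
    """
    half = total_ram_gib // 2
    return f"{max(floor_gib, min(half, cap_gib))}g"


for ram in (4, 8, 32, 128):
    print(f"{ram} GiB RAM -> heap {suggest_heap_size(ram)}")
```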

Looks like a good step forward overall.

Can you move the elasticsearch config to another file in the common directory, rather than the main file (you could make it an elastic.yml file with the config for kibana and logstash as well)?

I've made a bunch of other comments inline.

data/common/common.yaml
2859–2860

This should use the same listen_network/ip_for_network trick as kibana, logstash and other things like them, rather than having to override it on all hosts.

Unfortunately this means that I don't think the interpolation can happen in the yaml (but needs to happen in the puppet manifest). Overall I don't think that's a problem.
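With this trick only the network is declared (once) in hiera, and each node's listen address is derived from its own interfaces when the manifest is compiled. A Python sketch of that derivation; it is not the actual `ip_for_network` puppet function from puppet-swh-site (whose implementation isn't shown here), and the interface names, the 203.0.113.10 address and the /24 prefix are assumptions:

```python
import ipaddress
from typing import Mapping, Optional


def ip_for_network(addresses: Mapping[str, str], network: str) -> Optional[str]:
    """Return the host address that belongs to `network`, or None.

    `addresses` maps interface names to IP addresses (as puppet facts would
    expose them); interfaces are scanned in name order for determinism.
    An IPv6 address is simply never "in" an IPv4 network, and vice versa.
    """
    net = ipaddress.ip_network(network)
    for _interface, address in sorted(addresses.items()):
        if ipaddress.ip_address(address) in net:
            return address
    return None


# Hypothetical interfaces and listen network for esnode1.
facts = {"eth0": "203.0.113.10", "eth1": "192.168.100.61"}
print(ip_for_network(facts, "192.168.100.0/24"))  # 192.168.100.61
```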

2862–2866

jvm options aren't part of the elasticsearch config so you should probably move them one "level" up.
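Keeping the two apart in the data lets the elasticsearch.yml template dump its settings hash as-is, while the jvm options feed the separate jvm.options.d drop-in. A Python sketch with a hypothetical data layout (the `config` and `jvm_options` key names are assumptions; the values come from the catalog diff):

```python
# Hypothetical hiera data: jvm_options is a sibling of the settings hash that
# becomes elasticsearch.yml, not one of its entries.
ELASTICSEARCH = {
    "jvm_options": ["-Xms16g", "-Xmx16g"],
    "config": {
        "cluster.name": "swh-logging-prod",
        "discovery.seed_hosts": [
            "esnode1.internal.softwareheritage.org",
            "esnode2.internal.softwareheritage.org",
            "esnode3.internal.softwareheritage.org",
        ],
        "path.data": "/srv/elasticsearch",
        "path.logs": "/var/log/elasticsearch",
    },
}


def render_elasticsearch_yml(config: dict) -> str:
    """Render flat elasticsearch settings as YAML (scalars and lists of scalars).

    Values are written unquoted, so this sketch is only safe for simple values
    such as hostnames, paths and sizes.
    """
    lines = ["# File managed by puppet - modifications will be lost"]
    for key, value in config.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


print(render_elasticsearch_yml(ELASTICSEARCH["config"]), end="")
```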

2868–2880

Some of this (at least the bits referencing hostnames) should probably be in a data/deployments/production/ file instead of here, to avoid poor defaults and a risk of "wires crossing" in other deployments.
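Hiera resolves a key by walking its hierarchy from the most to the least specific level, and with a `deep` merge strategy it combines hashes found at several levels. A simplified Python sketch of that layering; the layer contents and key names are hypothetical, and unlike Hiera's own `deep` strategy (which has its own rules for arrays) this version simply lets lists from the more specific layer replace the others:

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return base updated with override, recursing into nested dicts.

    Lists and scalars from `override` replace those in `base` wholesale.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical layers: data/common/common.yaml keeps only generic defaults,
# data/deployments/production/common.yaml adds the production-only settings.
COMMON = {"elasticsearch": {"config": {
    "path.data": "/srv/elasticsearch",
    "path.logs": "/var/log/elasticsearch",
}}}
PRODUCTION = {"elasticsearch": {"config": {
    "cluster.name": "swh-logging-prod",
    "index.store.type": "hybridfs",
    "indices.memory.index_buffer_size": "50%",
}}}

production_node = deep_merge(COMMON, PRODUCTION)
# A deployment without its own layer only sees the generic defaults, so no
# production hostnames or tuning can leak into it.
print(production_node["elasticsearch"]["config"])
```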

2871

We need to check whether that's actually a proper value these days for a cluster built from scratch.

2872

The need to override that config needs to be questioned too :-)

data/deployments/staging/common.yaml
252–253

is this correct?

data/hostname/esnode1.internal.softwareheritage.org.yaml
1

If we use a listen_network/ip_for_network-based trick we don't need to (re) introduce these files.

site-modules/profile/manifests/elasticsearch.pp
5–19

Not sure the whole user management is needed at all; I suspect the Debian package creates the user and group itself already?

50–51

can probably be owned by root then (we don't need ES to have write access)

61–62

Same here, definitely should be owned by root

data/deployments/staging/common.yaml
252–253

good catch, we opened this too soon and it's not correct indeed.

259

neither is this, should be search-esnode0.internal.staging.swh.network.

data/hostname/esnode1.internal.softwareheritage.org.yaml
1

right, we'll check.

site-modules/profile/manifests/elasticsearch.pp
5–19

with vagrant, we needed this: when provisioning from scratch, the gid ended up being 120 instead of 119.

We need control over the uid/gid for the folder creations below.

If we want to use the elasticsearch snapshot mechanism for backups (usually shared folders over nfs),
we also need control over those.

50–51

ack

61–62

ack

Thanks for the feedback.

Currently taking it into account.

And trying out the elasticsearch puppet module (which was already in our puppet-environment but not used...)

@olasd We have tried using the official elasticsearch puppet module (D4654).
There are several issues with using it. WDYT?

ardumont added inline comments.
data/common/common.yaml
2871

We kept it, but moved it to the production configuration as an extra setting, since it is the current production value.
It won't be used for the new staging cluster for now.

2872

Quite; it was also moved to the production extra configuration (for the same reason as above).

ardumont marked 2 inline comments as done.

Adapt according to various feedbacks:

  • move dedicated prod configuration to deployments/production/common.yaml
  • move existing production configuration options to deployments/production/common.yaml
  • Fix hostname typos
  • Avoid declaring extra configuration files for the esnodes ip configuration. Use our ip_for_network function stanza instead
  • Fix the faulty (and, for now, unneeded) swh search configuration
  • ...

Status: vagrant still happily provisions the node as per configuration

Override /etc/hosts configuration for vagrant

  • staging: Add elasticsearch node
  • elasticsearch: Complete uid/gid creation
  • elasticsearch: Declare es configuration
  • elasticsearch: configure the jvm options in the dropin directory
site-modules/profile/manifests/elasticsearch.pp
5–19

We don't really need hardcoded uid/gids; we can just use the 'elasticsearch' name too.

Do you have pointers to this shared folders over nfs situation? That sounds quite awful, and using something like https://www.elastic.co/guide/en/elasticsearch/plugins/7.10/repository-azure.html would make more sense to me.

site-modules/profile/manifests/elasticsearch.pp
5–19

This is the case when the snapshots are stored on the filesystem[1].

I generally prefer unified uids/gids across the whole infrastructure, for reproducibility, but as they are not pinned for the other application users, it makes sense to use only the names instead of the ids.

It makes me think the elasticsearch user should be flagged as a system user.

[1] https://www.elastic.co/guide/en/elasticsearch/reference/7.10/snapshots-register-repository.html#snapshots-filesystem-repository
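For reference, registering such a repository is a single `PUT _snapshot/<name>` call, but the location must first be mounted at the same path on all master and data nodes and listed under `path.repo` in their elasticsearch.yml. Since every node writes into the same share, the files' numeric owner (typically what nfs checks) has to mean the same user on all of them, which is where consistent uid/gids matter. A Python sketch that only builds the request; the repository name, location and URL are hypothetical:

```python
import json
import urllib.request


def fs_repository_request(es_url: str, name: str, location: str) -> urllib.request.Request:
    """Build (without sending) the request registering a shared filesystem
    snapshot repository through elasticsearch's snapshot API."""
    body = {"type": "fs", "settings": {"location": location}}
    return urllib.request.Request(
        f"{es_url.rstrip('/')}/_snapshot/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical values; send it with urllib.request.urlopen(request) once the
# location is mounted on every node and listed in path.repo.
request = fs_repository_request(
    "http://localhost:9200", "nfs_backups", "/srv/elasticsearch-snapshots"
)
print(request.get_method(), request.full_url)  # PUT http://localhost:9200/_snapshot/nfs_backups
```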

Drop puppet uid/gid management and let the elasticsearch install deal with it.

ardumont added inline comments.
site-modules/profile/manifests/elasticsearch.pp
5–19

Not sure the whole user management is needed at all; I suspect the Debian package creates the user and group itself already?

yes, in the end, we removed that part
It's now done through the debian install step ;)

Thanks for this very nice improvement!

This revision is now accepted and ready to land. (Dec 3 2020, 6:31 PM)
ardumont marked an inline comment as done.

Landed in 06327419

oops, wrong button ¯\_(ツ)_/¯ (I requested review...)

fake LGTM to be able to change the status

This revision is now accepted and ready to land. (Dec 4 2020, 11:47 AM)