
elasticsearch: what should the ELK stack upgrade policy be?
Closed, Migrated

Description

We got caught off guard by a Logstash upgrade (the version was not fully pinned in our Puppet manifest).

This is now fixed (D544).
We kept the latest version (6.4.2), as it fixed some issues with the indexed logs.

In general, though, this raises the question of our Elasticsearch upgrade policy.
Should we keep the version numbers synchronized across the ELK stack (the components recently started sharing the same version numbering)?

This needs further digging to determine whether upgrading is worth it.
It is not a simple matter, as the Elasticsearch upgrade is apparently not easy to do (manual intervention is needed).

Event Timeline

ardumont triaged this task as Normal priority. Oct 17 2018, 3:36 PM
ardumont created this task.

Upgrading the Elasticsearch cluster is a somewhat delicate operation, since nodes still running old Elasticsearch versions can no longer store new data, but it is not really difficult to handle properly.
The biggest issue could be with Kibana / Elasticsearch interactions: some old Kibana versions are known to stop displaying dashboards when talking to newer Elasticsearch servers.

Upgrading just one Elasticsearch node and waiting for the cluster to become green again usually takes about 20 minutes; we could easily experiment on one node and roll back if necessary.
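To make the "wait for the cluster to become green again" step concrete, here is a minimal sketch that polls the cluster health endpoint (`GET /_cluster/health`, whose JSON body carries a `status` field of `green`, `yellow` or `red`). The function names, the 30-minute ceiling and the 30-second polling interval are assumptions chosen for illustration, not something we have deployed.

```python
import json
import time
import urllib.request


def fetch_health(base_url):
    """Return the parsed JSON body of GET <base_url>/_cluster/health."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=10) as resp:
        return json.load(resp)


def wait_for_green(get_health, max_wait=1800, interval=30, sleep=time.sleep):
    """Poll until the cluster reports "green"; return True on success.

    get_health is any zero-argument callable returning the health document,
    which keeps the loop testable without a live cluster. max_wait and
    interval are in seconds and are illustrative defaults only.
    """
    waited = 0
    while True:
        if get_health().get("status") == "green":
            return True
        if waited >= max_wait:
            return False  # caller decides whether to roll the node back
        sleep(interval)
        waited += interval
```

Against a real node this would be called as `wait_for_green(lambda: fetch_health("http://localhost:9200"))`, where the URL is again an assumption about where the node listens. A `False` result is the signal to consider rolling the node back.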

A quick analysis of the 6.4.x family of versions shows that they bring significant bug fixes to the table.
One particularly interesting aspect is the general improvement in cluster reliability when nodes leave or come back to the cluster.

Elasticsearch, Logstash and Kibana are now released together, and components sharing the same version are sure to be compatible. It makes sense to have a global Puppet constant defining which general ELK stack version to use for packages.

This doesn't preclude an incremental deployment policy: we just don't need to run the Puppet agent on all hosts at the same time.
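As a rough illustration of the idea (written in Python rather than Puppet, purely to show the logic), a single constant can drive every package version, and an incremental rollout simply means some hosts lag behind that constant for a while. The host names and the older version used in the usage note are hypothetical.

```python
ELK_VERSION = "6.4.2"  # version named in this task; the single place to bump
ELK_PACKAGES = ("elasticsearch", "logstash", "kibana")


def target_versions(version=ELK_VERSION):
    """Map every ELK package to the one shared target version."""
    return {pkg: version for pkg in ELK_PACKAGES}


def hosts_pending_upgrade(installed, version=ELK_VERSION):
    """List hosts that still run at least one package at another version.

    installed maps host name -> {package name: installed version}.
    """
    pending = []
    for host, packages in installed.items():
        if any(v != version for v in packages.values()):
            pending.append(host)
    return sorted(pending)
```

For example, `hosts_pending_upgrade({"esnode-a": {"elasticsearch": "6.4.2"}, "esnode-b": {"elasticsearch": "6.3.0"}})` reports only `esnode-b`, which is the host where the agent has not run yet.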

olasd claimed this task.
olasd added a subscriber: olasd.

The Puppet manifests have been updated to hardcode the ELK stack version, and package pinning ensures the packages stay at that version.

Rolling upgrades can be done by overriding the version for specific hosts until the full upgrade is over.
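The override logic amounts to "use the host-specific version if there is one, otherwise the global one". The sketch below shows that lookup in Python, together with a pin stanza in the apt preferences format. It assumes Debian-style package pinning, which is not spelled out above; the host names, the override version and the priority value are hypothetical, and the actual manifests may be organized differently.

```python
def version_for(host, default, overrides):
    """Return the host-specific version if one is set, else the default."""
    return overrides.get(host, default)


def apt_pin(package, version, priority=1001):
    """Render an apt preferences stanza pinning package to version.

    The priority is an illustrative choice; a value above 1000 makes apt
    hold the pinned version even if that means a downgrade.
    """
    return (
        f"Package: {package}\n"
        f"Pin: version {version}\n"
        f"Pin-Priority: {priority}\n"
    )
```

With `overrides = {"esnode-a": "6.4.3"}` and a default of `"6.4.2"`, only `esnode-a` resolves to the newer version; once every host has been upgraded, the default is bumped and the override entry is dropped.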