Software Heritage

elasticsearch: about the elk stack policy upgrade?
Open, NormalPublic

Description

We got caught off-guard by a Logstash upgrade (the version was not fully pinned in our Puppet manifest).

This is now fixed (D544).
We kept the latest version (6.4.2) as this fixed some issues on the indexed logs.

In general though, this raises the question of our Elasticsearch upgrade policy.
Should we keep the version numbers synchronized across the ELK stack (the three components recently adopted a shared version numbering scheme)?

This needs further investigation to determine whether upgrading is worthwhile.
It is not a trivial matter, as an Elasticsearch upgrade apparently cannot be done automatically (manual intervention is needed).

Event Timeline

ardumont created this task.Oct 17 2018, 3:36 PM
ardumont triaged this task as Normal priority.

Upgrading the Elasticsearch cluster is a somewhat delicate operation, since nodes running old Elasticsearch versions can no longer store new data, but it is not really difficult to handle properly.
The biggest issue could be with Kibana / Elasticsearch interactions: some old Kibana versions are known to stop displaying dashboards when talking to newer Elasticsearch servers.

Upgrading just one Elasticsearch node and waiting for the cluster to become green again usually takes about 20 minutes; we could easily experiment on one node and roll back if necessary.
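For reference, a single-node step of this kind could look roughly like the sketch below, following the standard Elasticsearch rolling-upgrade procedure. The endpoint, package version, and exact commands are assumptions for illustration, not our actual runbook.

```shell
# Sketch of upgrading one node at a time (assumes the Elasticsearch HTTP
# API is reachable on localhost:9200; hostnames/versions are hypothetical).

# 1. Stop shard reallocation so the cluster does not shuffle data around
#    while the node is down.
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{ "persistent": { "cluster.routing.allocation.enable": "primaries" } }'

# 2. Upgrade and restart the node (package handling depends on the distro).
systemctl stop elasticsearch
apt-get install elasticsearch=6.4.2
systemctl start elasticsearch

# 3. Re-enable allocation and wait for the cluster to turn green again
#    before moving on to the next node.
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{ "persistent": { "cluster.routing.allocation.enable": null } }'
curl 'localhost:9200/_cluster/health?wait_for_status=green&timeout=30m'
```

If the upgraded node misbehaves, rolling back at this point only means reinstalling the previous package on that one node, which is what makes the per-node experiment cheap.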

A quick analysis of the 6.4.x family of versions shows they bring significant bug fixes to the table.
One particularly interesting aspect is the general improvement in cluster reliability when nodes leave or rejoin the cluster.

Elasticsearch, Logstash and Kibana are now released together, and matching versions are guaranteed to be compatible. It makes sense to have a global Puppet constant defining which ELK stack version the packages should use.
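Such a constant could be sketched along these lines; the class and variable names below are hypothetical and would need to match our actual manifest layout (e.g. the one touched by D544).

```puppet
# Sketch of a single shared version pin for the whole ELK stack
# (class/variable names are illustrative, not our real manifests).
class profile::elk::params {
  # Single source of truth: Elastic now releases Elasticsearch,
  # Logstash and Kibana under one common version number.
  $elk_version = '6.4.2'
}

class profile::elasticsearch {
  include profile::elk::params

  # Pin the package to the shared stack version instead of 'latest',
  # so an upstream release cannot surprise us again.
  package { 'elasticsearch':
    ensure => $profile::elk::params::elk_version,
  }
}
```

Logstash and Kibana classes would reference the same `$elk_version`, so bumping the stack is a one-line change that each host then picks up on its next Puppet run.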

This doesn't preclude an incremental deployment policy: we just don't need to run the Puppet agent on all hosts at the same time.