Given the goal of conserving swh_workers indexes for ever, we will have > 1000 indexes after 3 years.
With the default number of shards per index, this will mean more than 5400 shards.
Each shard consumes Elasticsearch resources; having so many is unreasonable.
After collecting a few days of statistics with the new index patterns, we can see that:
- Each swh_worker index only uses 6-7GB maximum, way below the recommended maximum shard size of 40-50GB.
- Some swh_worker indexes contain more than 6 million documents, a number which could still grow in the future.
It is recommanded not to have more more than 3-5 million documents per shard.
Given all the above data, having two shards per swh_worker index reasonable.
The following shell code adds a template to the Elasticsearch cluster and sets the number of shards per swh_worker index to 2:
curl -i -H'Content-Type: application/json' -XPUT http://localhost:9200/_template/template_swh_workers -d ' { "template" : "swh_workers-*", "settings" : { "number_of_shards" : 2 } }'