Jan 8 2023
Oct 19 2022
Jan 5 2021
Jan 6 2020
Done (via the profile::elasticsearch puppet module).
Jul 24 2019
Removed the T1017 Kafka subtask; it has no bearing on whether the Elasticsearch cluster is a true cluster or not.
May 13 2019
Aug 31 2018
Jul 31 2018
Jul 4 2018
All remaining non-swh-worker logs deleted from legacy logstash-* indexes.
Jun 21 2018
It seems like deleting old documents takes a heavy toll on the cluster.
So far, for every month of old logstash indexes cleaned, at least one cluster node started to misbehave and had to be restarted after excessive timeouts and/or other issues, including constant garbage collection and disk thrashing.
Even though all delete requests were previously successfully processed, non-swh-workers data remain in the legacy logstash-* indexes.
This is not entirely unexpected: resource limitations may have prevented the old Banco node from processing all deletion requests within a bounded time frame.
Deletion queries will be rerun index by index in this way:
curl -i -H'Content-Type: application/json' \
  -XPOST "http://esnode2.internal.softwareheritage.org:9200/logstash-2018.02.31/_delete_by_query?pretty=true" \
  -d '{ "query" : { "bool" : { "must_not" : [{ "match" : { "systemd_unit" : "swh-worker@" }}] }} }'
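The index-by-index rerun could be scripted roughly as follows. This is only a sketch: the helper names (delete_url, clean_indexes) and the DRY_RUN switch are hypothetical, and the host and query are copied from the command above. It processes one index at a time and stops at the first failed request, which may help limit the load on the cluster.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are assumptions, not part of the original procedure.
ES_HOST="http://esnode2.internal.softwareheritage.org:9200"
QUERY='{ "query" : { "bool" : { "must_not" : [{ "match" : { "systemd_unit" : "swh-worker@" }}] }} }'

# Build the _delete_by_query URL for a single index name.
delete_url() {
    printf '%s/%s/_delete_by_query?pretty=true' "$ES_HOST" "$1"
}

# Process the given indexes sequentially, one request at a time.
# With DRY_RUN=1 (the default in this sketch) the request is only printed.
clean_indexes() {
    for index in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            printf 'POST %s\n' "$(delete_url "$index")"
        else
            # Stop at the first failure instead of piling more work on the cluster.
            curl -i -H'Content-Type: application/json' \
                -XPOST "$(delete_url "$index")" -d "$QUERY" || return 1
        fi
    done
}
```

For example, `clean_indexes logstash-2018.02.01 logstash-2018.02.02` prints the two requests that would be sent (the index names here are illustrative); setting `DRY_RUN=0` sends them for real.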
Jun 20 2018
The swh_workers-2018.03.07 index contained non-swh-workers documents and was cleaned this way:
curl -i -H'Content-Type: application/json' \
  -XPOST "http://esnode3.internal.softwareheritage.org:9200/swh_workers-2018.03.07/_delete_by_query?pretty=true" \
  -d '{ "query" : { "bool" : { "must_not" : [{ "match" : { "systemd_unit" : "swh-worker@" }}] }} }'