
System administration (Elasticsearch consolidation (W24/2018)) · Milestone
Active · Public

Members

  • This project does not have any members.

Watchers

  • This project does not have any watchers.

Recent Activity

May 13 2019

ftigeot removed a parent task for T792: Make the elasticsearch logging cluster actually a cluster: T986: Scheduler: Automate completed oneshot or disabled recurring tasks archival.
May 13 2019, 4:29 PM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot removed a parent task for T792: Make the elasticsearch logging cluster actually a cluster: T1005: webapp: Push logs to elasticsearch cluster.
May 13 2019, 4:28 PM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot removed a parent task for T792: Make the elasticsearch logging cluster actually a cluster: T1028: deposit: Push logs to elasticsearch.
May 13 2019, 4:26 PM · System administration (Elasticsearch consolidation (W24/2018))

Aug 31 2018

ftigeot closed T1000: Reindex old data on banco to put it into swh_worker indexes, a subtask of T792: Make the elasticsearch logging cluster actually a cluster, as Resolved.
Aug 31 2018, 5:37 PM · System administration (Elasticsearch consolidation (W24/2018))

Jul 31 2018

ftigeot changed the status of T792: Make the elasticsearch logging cluster actually a cluster from Open to Work in Progress.
Jul 31 2018, 4:13 PM · System administration (Elasticsearch consolidation (W24/2018))

Jul 4 2018

ftigeot closed T977: Delete old system log data from the Elasticsearch cluster as Resolved.

All remaining non-swh-worker logs deleted from legacy logstash-* indexes.

Jul 4 2018, 11:22 AM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot closed T977: Delete old system log data from the Elasticsearch cluster, a subtask of T792: Make the elasticsearch logging cluster actually a cluster, as Resolved.
Jul 4 2018, 11:22 AM · System administration (Elasticsearch consolidation (W24/2018))

Jun 21 2018

ftigeot added a comment to T977: Delete old system log data from the Elasticsearch cluster.

It seems like deleting old documents takes a heavy toll on the cluster.
So far, for every month of old logstash indexes cleaned, at least one cluster node started to misbehave and had to be restarted after excessive timeouts and/or other issues, including constant garbage collection and disk thrashing.
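
One possible way to reduce that load, not something this task records as having been done, is to throttle the deletion and run it asynchronously. The sketch below only builds the request URL. It assumes the cluster runs an Elasticsearch version whose Delete By Query API accepts the `requests_per_second` and `wait_for_completion` URL parameters; `throttled_delete_url` is a hypothetical helper name and the rate of 500 is an arbitrary illustration.

```shell
#!/bin/sh
# Sketch only: build a throttled, asynchronous _delete_by_query URL.
# throttled_delete_url is a hypothetical helper, not part of the original task.
throttled_delete_url() {
    # $1: base URL of a node, $2: index name, $3: requests_per_second value
    printf '%s/%s/_delete_by_query?requests_per_second=%s&wait_for_completion=false\n' \
        "$1" "$2" "$3"
}

# Example invocation (not executed here; index name and rate are illustrative):
#   curl -H 'Content-Type: application/json' -XPOST \
#        "$(throttled_delete_url http://esnode2.internal.softwareheritage.org:9200 logstash-2018.02.01 500)" \
#        -d '{"query":{"bool":{"must_not":[{"match":{"systemd_unit":"swh-worker@"}}]}}}'
```

With `wait_for_completion=false` the request returns a task identifier immediately instead of blocking until the deletion finishes, so progress can be checked through the tasks API rather than through a long-lived HTTP connection.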

Jun 21 2018, 4:32 PM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot added a comment to T977: Delete old system log data from the Elasticsearch cluster.

Even though all delete requests were previously successfully processed, non-swh-workers data remain in the legacy logstash-* indexes.
This is not entirely unexpected: resource limitations may have prevented the old Banco node from processing all deletion requests in a bounded time frame.
Deletion queries will be rerun index by index in this way:

curl -i -H'Content-Type: application/json' \
     -XPOST "http://esnode2.internal.softwareheritage.org:9200/logstash-2018.02.31/_delete_by_query?pretty=true" -d '
{
	"query" : { "bool" : { "must_not" : [{ "match" : { "systemd_unit" : "swh-worker@" }}] }}
}'
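
The index-by-index rerun could be wrapped in a small loop. This is a minimal sketch, not the script actually used: `ES_URL`, `DRY_RUN` and `delete_non_worker_logs` are hypothetical names, and the index names are passed as arguments. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch only: rerun the deletion query against each index given as argument.
# ES_URL and DRY_RUN are illustrative variables, not part of the original task.
ES_URL="${ES_URL:-http://esnode2.internal.softwareheritage.org:9200}"
DRY_RUN="${DRY_RUN:-1}"

# Same query as above: select every document not logged by a swh-worker unit.
QUERY='{"query":{"bool":{"must_not":[{"match":{"systemd_unit":"swh-worker@"}}]}}}'

delete_non_worker_logs() {
    # $1: index name
    url="$ES_URL/$1/_delete_by_query?pretty=true"
    if [ "$DRY_RUN" = "1" ]; then
        # Dry run: show the request without sending it.
        echo "would POST $url"
    else
        curl -s -H 'Content-Type: application/json' -XPOST "$url" -d "$QUERY"
    fi
}

# Process the indexes one at a time, in the order given.
for index in "$@"; do
    delete_non_worker_logs "$index"
done
```

Saved under a hypothetical name such as `rerun_delete.sh`, it would be invoked as `DRY_RUN=0 sh rerun_delete.sh logstash-2018.02.01 logstash-2018.02.02`. Running one index at a time keeps each request small, in line with the resource concerns described in this comment.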
Jun 21 2018, 11:27 AM · System administration (Elasticsearch consolidation (W24/2018))

Jun 20 2018

ftigeot added a comment to T977: Delete old system log data from the Elasticsearch cluster.

The swh_workers-2018.03.07 index contained non-swh-workers documents and was cleaned this way:

curl -i -H'Content-Type: application/json' \
     -XPOST "http://esnode3.internal.softwareheritage.org:9200/swh_workers-2018.03.07/_delete_by_query?pretty=true" -d '
{
	"query" : { "bool" : { "must_not" : [{ "match" : { "systemd_unit" : "swh-worker@" }}] }}
}'
Jun 20 2018, 11:48 AM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot changed the status of T1000: Reindex old data on banco to put it into swh_worker indexes, a subtask of T792: Make the elasticsearch logging cluster actually a cluster, from Open to Work in Progress.
Jun 20 2018, 11:14 AM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot claimed T792: Make the elasticsearch logging cluster actually a cluster.
Jun 20 2018, 10:43 AM · System administration (Elasticsearch consolidation (W24/2018))

Jun 13 2018

ftigeot changed the status of T793: Move elasticsearch log cluster configuration inside puppet, a subtask of T792: Make the elasticsearch logging cluster actually a cluster, from Open to Work in Progress.
Jun 13 2018, 5:04 PM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot changed the status of T793: Move elasticsearch log cluster configuration inside puppet from Open to Work in Progress.
Jun 13 2018, 5:04 PM · System administration (Elasticsearch consolidation (W24/2018))
olasd edited projects for T792: Make the elasticsearch logging cluster actually a cluster, added: System administration (Elasticsearch consolidation (W24/2018)); removed System administration.
Jun 13 2018, 4:54 PM · System administration (Elasticsearch consolidation (W24/2018))
olasd edited projects for T977: Delete old system log data from the Elasticsearch cluster, added: System administration (Elasticsearch consolidation (W24/2018)); removed System administration.
Jun 13 2018, 4:54 PM · System administration (Elasticsearch consolidation (W24/2018))
olasd edited projects for T793: Move elasticsearch log cluster configuration inside puppet, added: System administration (Elasticsearch consolidation (W24/2018)); removed System administration.
Jun 13 2018, 4:54 PM · System administration (Elasticsearch consolidation (W24/2018))
olasd created System administration (Elasticsearch consolidation (W24/2018)).
Jun 13 2018, 4:53 PM