No logs are ingested on elasticsearch since 2021-03-26
Closed, Migrated

Description

vsellier@esnode1 ~ % curl http://$ES_NODE/_cat/indices | grep 2021.04
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  216k  100  216k    0     0   593k      0 --:--:-- --:--:-- --:--:--  592k
vsellier@esnode1 ~ % curl http://$ES_NODE/_cat/indices | grep 2021.03
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 29  216k   29 65452    0     0   177k      0  0:00:01 --:--:--  0:00:01  176k
green open  apache_logs-2021.03.09              mjEHDYjXSaa1dl6QIJtNbw 3 1   140766        0 150.9mb   75.7mb
green open  apache_logs-2021.03.07              Z-7wHChvQlCJmcTe4Y0Fdg 3 1   116692        0 125.8mb   63.1mb
green open  apache_logs-2021.03.08              FW2v7wpdRjORX4sl37-kxA 3 1   122472        0 130.2mb   64.8mb
green open  apache_logs-2021.03.05              ptPlSK7GQ1S4G0My1SBp5A 3 1   170542        0   174mb   87.1mb
green open  apache_logs-2021.03.06              lNJ59HICQTeH3L18nnGaZA 3 1   143244        0 149.8mb   75.1mb
green open  apache_logs-2021.03.03              8gSM6WiGSIOyK5fE_yTsmA 3 1   140113        0 146.6mb   73.1mb
green open  apache_logs-2021.03.04              -P1FBdLSRQyNh7graL9ZLA 3 1   156148        0 154.2mb   77.1mb
green open  apache_logs-2021.03.01              2UB-aSqHROGrXAyKW2xW-w 3 1   128750        0 135.6mb   67.7mb
green open  apache_logs-2021.03.02              QWJfdD0zSC2oV5u8yolh6g 3 1   135275        0 140.9mb   70.1mb
green open  apache_logs-2021.03.18              f0xyrZ-PSqerRGAmwhTdgg 3 1   229053        0 215.7mb  107.9mb
green open  apache_logs-2021.03.19              Nu5N0vfoTbui1V0tFEAqZQ 3 1   169711        0 170.6mb   85.1mb
green open  apache_logs-2021.03.16              qvP03ZqBQ0CkUYi82C75fQ 3 1   209167        0 216.5mb  108.6mb
green open  apache_logs-2021.03.17              ZTU9ZutrT3-3xgqrIKacYw 3 1   207734        0 209.6mb    105mb
green open  apache_logs-2021.03.14              J4tDagjFT9u2rfsZdCP-tA 3 1   140076        0 147.6mb     74mb
green open  apache_logs-2021.03.15              RnG4tumnSZiRHCjYTZIEzQ 3 1   143899        0 150.5mb   75.2mb
green open  apache_logs-2021.03.12              VGMowaonQTWD3Xg3lM4j_A 3 1   175949        0 186.8mb     94mb
green open  apache_logs-2021.03.13              O0uzTTyGQHKizmeVRJiy3Q 3 1   145207        0 153.2mb   76.1mb
green open  apache_logs-2021.03.10              66qHu3ukQ2COF_kr8z4KMg 3 1   160609        0 169.7mb   84.6mb
green open  apache_logs-2021.03.11              cVbw5cbsTXa7I39ZJ2F-zg 3 1   133693        0 143.5mb   71.5mb
green open  apache_logs-2021.03.23              f6CXx7XjR-2B-4Bvg8eUpQ 3 1   170191        0 171.2mb   85.2mb
green open  apache_logs-2021.03.21              eXJzFBbyS_ufktvdoeqDwg 3 1   143843        0 149.1mb   74.7mb
green open  apache_logs-2021.03.22              U5w0sfClQA-vUZvD64My6g 3 1   181569        0 178.4mb   88.7mb
green open  apache_logs-2021.03.20              bWyQCoP-QrKjOUCwuvqmiw 3 1   144383        0 149.8mb   74.7mb
green open  systemlogs-2021.03.20               CAs-XT5RRxqJfF8qC9GGYA 1 1  3143129        0   3.3gb    1.6gb
green open  systemlogs-2021.03.21               VFwHfWOORLaH9R2XkiIvBw 1 1  3189836        0   3.4gb    1.7gb
green open  systemlogs-2021.03.22               bVueopZxSvWd1O9RPvuYwg 1 1  3229263        0   3.5gb    1.7gb
green open  systemlogs-2021.03.23               k9SIf-YcQaqu7SP3dMEuwQ 1 1  3158867        0   3.4gb    1.7gb
green open  systemlogs-2021.03.24               Gp5UhzGDRmWqJ9NLJoAvWA 1 1  7593093        0  13.6gb    6.8gb
green open  systemlogs-2021.03.25               d6xV2o-2QXy47NsVbYZXRw 1 1  6982474        0  12.3gb    6.1gb
green open  systemlogs-2021.03.10               q4bFq3dTTfO2bQ3SO0iyfQ 1 1  3777187        0     4gb      2gb
green open  systemlogs-2021.03.11               xVCUjK8AQwqaDSdPikNJrw 1 1  3590211        0   3.6gb    1.8gb
green open  systemlogs-2021.03.12               ypxXsGpWRZ-bK0rkEetFTg 1 1  3730302        0   3.8gb    1.9gb
green open  systemlogs-2021.03.13               IJ1orNuvRqWHarJuRItClQ 1 1  3666800        0   3.7gb    1.8gb
green open  systemlogs-2021.03.14               oV-W7vKQR22MOZs2WO-MUg 1 1  3487901        0   3.5gb    1.7gb
green open  systemlogs-2021.03.15               lUqjZn0HQve9ayPbvOS6_Q 1 1  3518806        0   3.6gb    1.8gb
green open  systemlogs-2021.03.16               FcR80ceiSoCArjXB6f0suw 1 1  4058330        0   4.1gb    2.1gb
green open  systemlogs-2021.03.17               f_XnweknTe6Cn4v4wkqQcA 1 1  3811846        0     4gb      2gb
green open  systemlogs-2021.03.18               XuVleJWuQaGaeuU5XZbbXw 1 1  5573837        0   6.1gb      3gb
green open  systemlogs-2021.03.19               axCf06i8TBaotpDz3uAUrg 1 1  3465206        0   3.9gb    1.9gb
green open  systemlogs-2021.03.01               zvtCFkwSRnqmHxbZKHdxrg 1 1  4726395        0   4.3gb    2.1gb
green open  systemlogs-2021.03.02               GxA3m0QpSv2d0MP2eOFJVw 1 1  5211382        0   4.5gb    2.2gb
green open  systemlogs-2021.03.03               kbuk9KnISWWrOLAZynZfZA 1 1  4242637        0   4.1gb      2gb
green open  systemlogs-2021.03.04               Yki9hKlHQpqCqmb3BVav9A 1 1  3691829        0   3.8gb    1.9gb
green open  systemlogs-2021.03.05               suFnIqUDQ1yquIM1vOpLPg 1 1  3688620        0   3.9gb    1.9gb
green open  systemlogs-2021.03.06               kz1IxTO_SZGV-xYNSE_f-A 1 1  3579153        0   3.6gb    1.8gb
green open  systemlogs-2021.03.07               LD9iRV2wT1GwNWth2g__cQ 1 1  3514149        0   3.4gb    1.7gb
green open  systemlogs-2021.03.08               hisYjn1FQieZRHh5D-dTUg 1 1  3602532        0   3.6gb    1.8gb
green open  systemlogs-2021.03.09               dkt3CwVATP-6n6Sb239ymw 1 1  3638318        0   3.6gb    1.8gb
green open  swh_workers-2021.03.23              m4IY43dXTMSFRUiOVCU3lg 2 1  3795194        0   6.4gb    3.2gb
green open  swh_workers-2021.03.21              l0cHRyV6SCawFrItzD_BCg 2 1  4181434        0   7.1gb    3.5gb
green open  swh_workers-2021.03.22              MyYu8h56QOy7dBuEkiv27A 2 1  4127270        0     7gb    3.5gb
green open  swh_workers-2021.03.20              h6WbYEqgRt6Kn8sAxHOC8w 2 1  4106167        0     7gb    3.5gb
green open  swh_workers-2021.03.09              jqLE3uIlTxSMRcsMDuWOdg 2 1  4163425        0   7.1gb    3.5gb
green open  swh_workers-2021.03.07              O0Nti80dRf-dHYADlUryOg 2 1  4067331        0   6.9gb    3.4gb
green open  swh_workers-2021.03.08              d3mIQtnKQomQCVi2s4iV1A 2 1  4265681        0   7.2gb    3.6gb
green open  swh_workers-2021.03.05              5FZ2T58ySYyFO4aicUJlKg 2 1  3964558        0   6.7gb    3.3gb
green open  swh_workers-2021.03.06              p4Q1zFN8SWijfFl5ouX3Ww 2 1  4093693        0   6.9gb    3.4gb
green open  swh_workers-2021.03.03              ew0MDxxjTJqBLI2obn7tJQ 2 1  3864772        0   6.5gb    3.2gb
green open  swh_workers-2021.03.04              xfnA303lTOWifI4MayaF6g 2 1  3706768        0   6.2gb    3.1gb
green open  swh_workers-2021.03.01              LasAQZAoQNyd7L5ALkXMIw 2 1  3013714        0     5gb    2.5gb
green open  swh_workers-2021.03.02              QTDmJ-8RSSmutI6VC_CSJQ 2 1  3217369        0   5.4gb    2.6gb
green open  swh_workers-2021.03.18              q-8875-YQh2geX_ZC4nJrA 2 1  4878672        0  10.6gb    5.3gb
green open  swh_workers-2021.03.19              6H4f5EihQYeyCuM6othHTQ 2 1  4526887        0   7.6gb    3.8gb
green open  swh_workers-2021.03.16              mIKSkguRTJSMbWuo3TKYLA 2 1  3867886        0   6.5gb    3.2gb
green open  swh_workers-2021.03.17              08qRSYYdR0qOYI6f6BmPiA 2 1  3557634        0     6gb      3gb
green open  swh_workers-2021.03.14              zoO2Pk6hQXSbLXCGN3RPzQ 2 1  3561610        0     6gb      3gb
green open  swh_workers-2021.03.15              XwqwItCqSzmuUWs8RqRl9A 2 1  3717981        0   6.3gb    3.1gb
green open  swh_workers-2021.03.12              -iY4sIQpTkq1o-_V_yg7KQ 2 1  4407705        0   7.2gb    3.6gb
green open  swh_workers-2021.03.13              5SxHsfRTRRKo44NMcTfoKA 2 1  4028959        0   6.8gb    3.4gb
green open  swh_workers-2021.03.10              efOeR1WeTgyxd3znKzMaYQ 2 1  4058344        0   6.9gb    3.4gb
green open  swh_workers-2021.03.11              sILgNKA8SvaP6u86lNpZsg 2 1  3702911        0   6.3gb    3.1gb

Event Timeline

vsellier changed the task status from Open to Work in Progress. Apr 8 2021, 12:29 PM
vsellier triaged this task as Unbreak Now! priority.
vsellier created this task.

It seems we have reached a limit on the cluster (from the logstash logs):

Apr 08 10:30:24 logstash0 logstash[1605158]: [2021-04-08T10:30:24,052][WARN ][logstash.outputs.elasticsearch][main][62d11c4234b8981da77a97955da92ac9de92b9a6dcd4582f407face31fd5c664] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"systemlogs-2021.04.08", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x2ec8df34>], :response=>{"index"=>{"_index"=>"systemlogs-2021.04.08", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [3000]/[3000] maximum shards open;"}}}}
Apr 08 10:30:24 logstash0 logstash[1605158]: [2021-04-08T10:30:24,052][WARN ][logstash.outputs.elasticsearch][main][62d11c4234b8981da77a97955da92ac9de92b9a6dcd4582f407face31fd5c664] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"systemlogs-2021.04.08", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x26cfcf58>], :response=>{"index"=>{"_index"=>"systemlogs-2021.04.08", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [3000]/[3000] maximum shards open;"}}}}
Apr 08 10:30:24 logstash0 logstash[1605158]: [2021-04-08T10:30:24,053][WARN ][logstash.outputs.elasticsearch][main][62d11c4234b8981da77a97955da92ac9de92b9a6dcd4582f407face31fd5c664] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"systemlogs-2021.04.08", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x1ddd876d>], :response=>{"index"=>{"_index"=>"systemlogs-2021.04.08", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [3000]/[3000] maximum shards open;"}}}}
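
The `[2]` in the message is consistent with the index listing in the description: each systemlogs index has 1 primary shard and 1 replica, so creating the index for a new day needs 1 * (1 + 1) = 2 shards. A minimal sketch of that arithmetic, summed per index family, assuming the default `_cat/indices` column order (health, status, index, uuid, pri, rep, ...) and daily index names of the form `<family>-YYYY.MM.DD`. The function name `shards_per_family` is hypothetical:

```shell
# Sum the open shards (primaries plus replica copies) per index family,
# reading `_cat/indices` output on stdin.
# Assumption: default column order, i.e. $2=status, $3=index, $5=pri, $6=rep.
shards_per_family() {
  awk '$2 == "open" {
         family = $3
         # Strip the trailing -YYYY.MM.DD suffix to get the family name.
         sub(/-[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]$/, "", family)
         # Each index uses pri * (1 + rep) shards.
         total[family] += $5 * (1 + $6)
       }
       END { for (f in total) print f, total[f] }' | LC_ALL=C sort
}
```

Possible usage: `curl -s "http://$ES_NODE/_cat/indices" | shards_per_family`, which would show which family consumes most of the shard budget.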
vlorentz renamed this task from No logs are ingested on elasticsearch since the 2021-03-26 to No logs are ingested on elasticsearch since 2021-03-26. Apr 8 2021, 12:33 PM

I temporarily unblocked the ingestion by closing the systemlogs indexes created before 2020-07-01:

curl -s http://$ES_NODE/_cat/indices\?s=index | grep -v close | grep systemlogs | awk '{print $3}' | grep 2020.05 | xargs -n1 -t -i{} curl -XPOST http://${ES_NODE}/{}/_close 
curl -s http://$ES_NODE/_cat/indices\?s=index | grep -v close | grep systemlogs | awk '{print $3}' | grep 2020.06 | xargs -n1 -t -i{} curl -XPOST http://${ES_NODE}/{}/_close
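
The month-by-month `grep` above could be generalized to a cutoff date. A minimal sketch, not the commands that were actually run: the function name `old_open_indices` is hypothetical, and it assumes the default `_cat/indices` column order and index names of the form `<family>-YYYY.MM.DD`, whose dates compare correctly as plain strings:

```shell
# Print the names of open indices of one family that are older than a cutoff,
# reading `_cat/indices` output on stdin.
# $1: index family (e.g. systemlogs), $2: cutoff date as YYYY.MM.DD (exclusive).
old_open_indices() {
  awk -v prefix="$1-" -v cutoff="$2" '
    $2 == "open" && index($3, prefix) == 1 {
      day = substr($3, length(prefix) + 1)
      # Keep only well-formed daily suffixes; "" forces a string comparison.
      if (day ~ /^[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]$/ &&
          (day "") < (cutoff ""))
        print $3
    }'
}

# Possible usage (closes every matching index, so review the list first):
#   curl -s "http://$ES_NODE/_cat/indices" \
#     | old_open_indices systemlogs 2020.07.01 \
#     | xargs -t -I{} curl -s -XPOST "http://$ES_NODE/{}/_close"
```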

I'll look into how to prevent this and how to be alerted the next time it happens.

The cluster is configured with the default value for cluster.max_shards_per_node [1], so it can have at most 3000 open shards (1000 per node * 3 nodes).

[1] https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-cluster.html#cluster-shard-limit

After the index cleanup, there are still 2934 open shards, which doesn't leave much headroom.

curl -s http://192.168.100.61:9200/_stats\?pretty | jq '._shards'
{
  "total": 2934,
  "successful": 2934,
  "failed": 0
}
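
To put numbers on the remaining headroom, here is a small sketch of the arithmetic. The function name `shard_headroom` is hypothetical; the inputs are the per-node limit, the node count and the open shard total reported above:

```shell
# Print "<limit> <remaining> <used_percent>" for a cluster.
# $1: cluster.max_shards_per_node, $2: number of nodes, $3: open shards.
# The percentage is rounded down (integer arithmetic).
shard_headroom() {
  limit=$(( $1 * $2 ))
  echo "$limit $(( limit - $3 )) $(( $3 * 100 / limit ))"
}

shard_headroom 1000 3 2934   # prints "3000 66 97"
```

With the values from this task, only 66 shards remain before index creation is rejected again.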

Identified actions to take:

  • write a script that regularly closes old indexes to limit the number of open shards
  • add an Icinga probe that checks that the daily indexes are present on the cluster, or queries the cluster to check that logs were received in the last 15 min (?)
  • monitor the number of open shards and raise an alert when it nears the limit of cluster.max_shards_per_node (1000) * number of nodes

Dedicated tasks will be created for each action.
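
As a starting point for the shard monitoring action, a check could follow the usual Nagios/Icinga plugin convention of exit codes 0 (OK), 1 (WARNING) and 2 (CRITICAL). This is only a sketch: the function name `check_open_shards` and the 80%/90% thresholds in the usage line are assumptions, not an existing probe.

```shell
# Compare the number of open shards with the cluster-wide limit.
# $1: open shards, $2: limit (max shards per node * number of nodes),
# $3: warning threshold in percent, $4: critical threshold in percent.
# Returns 0 (OK), 1 (WARNING) or 2 (CRITICAL).
check_open_shards() {
  used=$(( $1 * 100 / $2 ))
  if [ "$used" -ge "$4" ]; then
    echo "CRITICAL - $1/$2 shards open (${used}%)"
    return 2
  elif [ "$used" -ge "$3" ]; then
    echo "WARNING - $1/$2 shards open (${used}%)"
    return 1
  fi
  echo "OK - $1/$2 shards open (${used}%)"
  return 0
}

# Possible usage, with the open shard count taken from _stats as above:
#   open=$(curl -s "http://$ES_NODE/_stats" | jq '._shards.total')
#   check_open_shards "$open" 3000 80 90
```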