vsellier@esnode1 ~ % curl http://$ES_NODE/_cat/indices | grep 2021.04
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 216k 100 216k 0 0 593k 0 --:--:-- --:--:-- --:--:-- 592k
vsellier@esnode1 ~ % curl http://$ES_NODE/_cat/indices | grep 2021.03
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
29 216k 29 65452 0 0 177k 0 0:00:01 --:--:-- 0:00:01 176k
green open apache_logs-2021.03.09 mjEHDYjXSaa1dl6QIJtNbw 3 1 140766 0 150.9mb 75.7mb
green open apache_logs-2021.03.07 Z-7wHChvQlCJmcTe4Y0Fdg 3 1 116692 0 125.8mb 63.1mb
green open apache_logs-2021.03.08 FW2v7wpdRjORX4sl37-kxA 3 1 122472 0 130.2mb 64.8mb
green open apache_logs-2021.03.05 ptPlSK7GQ1S4G0My1SBp5A 3 1 170542 0 174mb 87.1mb
green open apache_logs-2021.03.06 lNJ59HICQTeH3L18nnGaZA 3 1 143244 0 149.8mb 75.1mb
green open apache_logs-2021.03.03 8gSM6WiGSIOyK5fE_yTsmA 3 1 140113 0 146.6mb 73.1mb
green open apache_logs-2021.03.04 -P1FBdLSRQyNh7graL9ZLA 3 1 156148 0 154.2mb 77.1mb
green open apache_logs-2021.03.01 2UB-aSqHROGrXAyKW2xW-w 3 1 128750 0 135.6mb 67.7mb
green open apache_logs-2021.03.02 QWJfdD0zSC2oV5u8yolh6g 3 1 135275 0 140.9mb 70.1mb
green open apache_logs-2021.03.18 f0xyrZ-PSqerRGAmwhTdgg 3 1 229053 0 215.7mb 107.9mb
green open apache_logs-2021.03.19 Nu5N0vfoTbui1V0tFEAqZQ 3 1 169711 0 170.6mb 85.1mb
green open apache_logs-2021.03.16 qvP03ZqBQ0CkUYi82C75fQ 3 1 209167 0 216.5mb 108.6mb
green open apache_logs-2021.03.17 ZTU9ZutrT3-3xgqrIKacYw 3 1 207734 0 209.6mb 105mb
green open apache_logs-2021.03.14 J4tDagjFT9u2rfsZdCP-tA 3 1 140076 0 147.6mb 74mb
green open apache_logs-2021.03.15 RnG4tumnSZiRHCjYTZIEzQ 3 1 143899 0 150.5mb 75.2mb
green open apache_logs-2021.03.12 VGMowaonQTWD3Xg3lM4j_A 3 1 175949 0 186.8mb 94mb
green open apache_logs-2021.03.13 O0uzTTyGQHKizmeVRJiy3Q 3 1 145207 0 153.2mb 76.1mb
green open apache_logs-2021.03.10 66qHu3ukQ2COF_kr8z4KMg 3 1 160609 0 169.7mb 84.6mb
green open apache_logs-2021.03.11 cVbw5cbsTXa7I39ZJ2F-zg 3 1 133693 0 143.5mb 71.5mb
green open apache_logs-2021.03.23 f6CXx7XjR-2B-4Bvg8eUpQ 3 1 170191 0 171.2mb 85.2mb
green open apache_logs-2021.03.21 eXJzFBbyS_ufktvdoeqDwg 3 1 143843 0 149.1mb 74.7mb
green open apache_logs-2021.03.22 U5w0sfClQA-vUZvD64My6g 3 1 181569 0 178.4mb 88.7mb
green open apache_logs-2021.03.20 bWyQCoP-QrKjOUCwuvqmiw 3 1 144383 0 149.8mb 74.7mb
green open systemlogs-2021.03.20 CAs-XT5RRxqJfF8qC9GGYA 1 1 3143129 0 3.3gb 1.6gb
green open systemlogs-2021.03.21 VFwHfWOORLaH9R2XkiIvBw 1 1 3189836 0 3.4gb 1.7gb
green open systemlogs-2021.03.22 bVueopZxSvWd1O9RPvuYwg 1 1 3229263 0 3.5gb 1.7gb
green open systemlogs-2021.03.23 k9SIf-YcQaqu7SP3dMEuwQ 1 1 3158867 0 3.4gb 1.7gb
green open systemlogs-2021.03.24 Gp5UhzGDRmWqJ9NLJoAvWA 1 1 7593093 0 13.6gb 6.8gb
green open systemlogs-2021.03.25 d6xV2o-2QXy47NsVbYZXRw 1 1 6982474 0 12.3gb 6.1gb
green open systemlogs-2021.03.10 q4bFq3dTTfO2bQ3SO0iyfQ 1 1 3777187 0 4gb 2gb
green open systemlogs-2021.03.11 xVCUjK8AQwqaDSdPikNJrw 1 1 3590211 0 3.6gb 1.8gb
green open systemlogs-2021.03.12 ypxXsGpWRZ-bK0rkEetFTg 1 1 3730302 0 3.8gb 1.9gb
green open systemlogs-2021.03.13 IJ1orNuvRqWHarJuRItClQ 1 1 3666800 0 3.7gb 1.8gb
green open systemlogs-2021.03.14 oV-W7vKQR22MOZs2WO-MUg 1 1 3487901 0 3.5gb 1.7gb
green open systemlogs-2021.03.15 lUqjZn0HQve9ayPbvOS6_Q 1 1 3518806 0 3.6gb 1.8gb
green open systemlogs-2021.03.16 FcR80ceiSoCArjXB6f0suw 1 1 4058330 0 4.1gb 2.1gb
green open systemlogs-2021.03.17 f_XnweknTe6Cn4v4wkqQcA 1 1 3811846 0 4gb 2gb
green open systemlogs-2021.03.18 XuVleJWuQaGaeuU5XZbbXw 1 1 5573837 0 6.1gb 3gb
green open systemlogs-2021.03.19 axCf06i8TBaotpDz3uAUrg 1 1 3465206 0 3.9gb 1.9gb
green open systemlogs-2021.03.01 zvtCFkwSRnqmHxbZKHdxrg 1 1 4726395 0 4.3gb 2.1gb
green open systemlogs-2021.03.02 GxA3m0QpSv2d0MP2eOFJVw 1 1 5211382 0 4.5gb 2.2gb
green open systemlogs-2021.03.03 kbuk9KnISWWrOLAZynZfZA 1 1 4242637 0 4.1gb 2gb
green open systemlogs-2021.03.04 Yki9hKlHQpqCqmb3BVav9A 1 1 3691829 0 3.8gb 1.9gb
green open systemlogs-2021.03.05 suFnIqUDQ1yquIM1vOpLPg 1 1 3688620 0 3.9gb 1.9gb
green open systemlogs-2021.03.06 kz1IxTO_SZGV-xYNSE_f-A 1 1 3579153 0 3.6gb 1.8gb
green open systemlogs-2021.03.07 LD9iRV2wT1GwNWth2g__cQ 1 1 3514149 0 3.4gb 1.7gb
green open systemlogs-2021.03.08 hisYjn1FQieZRHh5D-dTUg 1 1 3602532 0 3.6gb 1.8gb
green open systemlogs-2021.03.09 dkt3CwVATP-6n6Sb239ymw 1 1 3638318 0 3.6gb 1.8gb
green open swh_workers-2021.03.23 m4IY43dXTMSFRUiOVCU3lg 2 1 3795194 0 6.4gb 3.2gb
green open swh_workers-2021.03.21 l0cHRyV6SCawFrItzD_BCg 2 1 4181434 0 7.1gb 3.5gb
green open swh_workers-2021.03.22 MyYu8h56QOy7dBuEkiv27A 2 1 4127270 0 7gb 3.5gb
green open swh_workers-2021.03.20 h6WbYEqgRt6Kn8sAxHOC8w 2 1 4106167 0 7gb 3.5gb
green open swh_workers-2021.03.09 jqLE3uIlTxSMRcsMDuWOdg 2 1 4163425 0 7.1gb 3.5gb
green open swh_workers-2021.03.07 O0Nti80dRf-dHYADlUryOg 2 1 4067331 0 6.9gb 3.4gb
green open swh_workers-2021.03.08 d3mIQtnKQomQCVi2s4iV1A 2 1 4265681 0 7.2gb 3.6gb
green open swh_workers-2021.03.05 5FZ2T58ySYyFO4aicUJlKg 2 1 3964558 0 6.7gb 3.3gb
green open swh_workers-2021.03.06 p4Q1zFN8SWijfFl5ouX3Ww 2 1 4093693 0 6.9gb 3.4gb
green open swh_workers-2021.03.03 ew0MDxxjTJqBLI2obn7tJQ 2 1 3864772 0 6.5gb 3.2gb
green open swh_workers-2021.03.04 xfnA303lTOWifI4MayaF6g 2 1 3706768 0 6.2gb 3.1gb
green open swh_workers-2021.03.01 LasAQZAoQNyd7L5ALkXMIw 2 1 3013714 0 5gb 2.5gb
green open swh_workers-2021.03.02 QTDmJ-8RSSmutI6VC_CSJQ 2 1 3217369 0 5.4gb 2.6gb
green open swh_workers-2021.03.18 q-8875-YQh2geX_ZC4nJrA 2 1 4878672 0 10.6gb 5.3gb
green open swh_workers-2021.03.19 6H4f5EihQYeyCuM6othHTQ 2 1 4526887 0 7.6gb 3.8gb
green open swh_workers-2021.03.16 mIKSkguRTJSMbWuo3TKYLA 2 1 3867886 0 6.5gb 3.2gb
green open swh_workers-2021.03.17 08qRSYYdR0qOYI6f6BmPiA 2 1 3557634 0 6gb 3gb
green open swh_workers-2021.03.14 zoO2Pk6hQXSbLXCGN3RPzQ 2 1 3561610 0 6gb 3gb
green open swh_workers-2021.03.15 XwqwItCqSzmuUWs8RqRl9A 2 1 3717981 0 6.3gb 3.1gb
green open swh_workers-2021.03.12 -iY4sIQpTkq1o-_V_yg7KQ 2 1 4407705 0 7.2gb 3.6gb
green open swh_workers-2021.03.13 5SxHsfRTRRKo44NMcTfoKA 2 1 4028959 0 6.8gb 3.4gb
green open swh_workers-2021.03.10 efOeR1WeTgyxd3znKzMaYQ 2 1 4058344 0 6.9gb 3.4gb
green open swh_workers-2021.03.11 sILgNKA8SvaP6u86lNpZsg 2 1 3702911 0 6.3gb 3.1gb
Description
Status | Assigned | Task
---|---|---
Migrated | gitlab-migration | T3219 No logs are ingested on elasticsearch since 2021-03-26
Migrated | gitlab-migration | T3221 elk: automatically limit log retention
Migrated | gitlab-migration | T3222 Monitor daily indexes are present on the log cluster and logs are correctly ingested
Migrated | gitlab-migration | T3223 Elasticsearch: Monitor the max opened shards on a cluster
Event Timeline
It seems we have reached a limit on the cluster (from the logstash logs):
Apr 08 10:30:24 logstash0 logstash[1605158]: [2021-04-08T10:30:24,052][WARN ][logstash.outputs.elasticsearch][main][62d11c4234b8981da77a97955da92ac9de92b9a6dcd4582f407face31fd5c664] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"systemlogs-2021.04.08", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x2ec8df34>], :response=>{"index"=>{"_index"=>"systemlogs-2021.04.08", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [3000]/[3000] maximum shards open;"}}}}
Apr 08 10:30:24 logstash0 logstash[1605158]: [2021-04-08T10:30:24,052][WARN ][logstash.outputs.elasticsearch][main][62d11c4234b8981da77a97955da92ac9de92b9a6dcd4582f407face31fd5c664] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"systemlogs-2021.04.08", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x26cfcf58>], :response=>{"index"=>{"_index"=>"systemlogs-2021.04.08", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [3000]/[3000] maximum shards open;"}}}}
Apr 08 10:30:24 logstash0 logstash[1605158]: [2021-04-08T10:30:24,053][WARN ][logstash.outputs.elasticsearch][main][62d11c4234b8981da77a97955da92ac9de92b9a6dcd4582f407face31fd5c664] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"systemlogs-2021.04.08", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x1ddd876d>], :response=>{"index"=>{"_index"=>"systemlogs-2021.04.08", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [3000]/[3000] maximum shards open;"}}}}
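To spot this condition without reading the full log lines, the open/limit pair can be pulled out of the error message. This is only a sketch: the function name `shard_limit_hits` is hypothetical, and the `journalctl` unit name in the usage comment is an assumption about how logstash is run on this host.

```shell
# Reads log lines on stdin. For each "maximum shards open" rejection,
# prints "<open> <limit>" taken from the "[open]/[limit]" part of the message.
# Lines that do not contain the rejection message are ignored.
shard_limit_hits() {
    sed -n 's|.*currently has \[\([0-9]*\)]/\[\([0-9]*\)] maximum shards open.*|\1 \2|p'
}

# Possible usage (assumes a systemd unit named "logstash"):
#   journalctl -u logstash --since today | shard_limit_hits | sort -u
```

On the three log lines above this would print `3000 3000` once per line.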
I temporarily unblocked the ingestion by closing the systemlogs indexes created before 2020-07-01:
curl -s http://$ES_NODE/_cat/indices\?s=index | grep -v close | grep systemlogs | awk '{print $3}' | grep 2020.05 | xargs -n1 -t -i{} curl -XPOST http://${ES_NODE}/{}/_close
curl -s http://$ES_NODE/_cat/indices\?s=index | grep -v close | grep systemlogs | awk '{print $3}' | grep 2020.06 | xargs -n1 -t -i{} curl -XPOST http://${ES_NODE}/{}/_close
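The two per-month commands could be generalized to a single cutoff date. The following is a minimal sketch, not the script that was eventually written: the function name `indexes_older_than` is hypothetical, and it assumes index names always end in a zero-padded `YYYY.MM.DD` date, as in the listings in this task.

```shell
# Reads `_cat/indices` output on stdin and prints the names of the open
# indexes called <prefix>-YYYY.MM.DD whose date is strictly before the cutoff.
# Arguments: index prefix (e.g. systemlogs), cutoff date as YYYY.MM.DD.
indexes_older_than() {
    prefix="$1"
    cutoff="$2"
    awk -v prefix="$prefix" -v cutoff="$cutoff" '
        $2 == "open" {
            n = length(prefix) + 1
            if (substr($3, 1, n) == (prefix "-")) {
                date = substr($3, n + 1)
                # Zero-padded YYYY.MM.DD dates order correctly as strings;
                # appending "" forces a string comparison.
                if ((date "") < (cutoff "")) print $3
            }
        }'
}

# Possible usage, reusing the _close call from the commands above:
#   curl -s "http://${ES_NODE}/_cat/indices?s=index" \
#     | indexes_older_than systemlogs 2020.07.01 \
#     | xargs -t -I{} curl -XPOST "http://${ES_NODE}/{}/_close"
```

Filtering on the status column instead of `grep -v close` avoids accidentally dropping an index whose name happens to contain "close".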
I'll look into how to prevent this and how to be alerted the next time it happens.
The cluster is configured with the default value for cluster.max_shards_per_node [1] (1000), so with 3 nodes it can have at most 3000 open shards (1000 * 3).
After the index cleanup, there are still 2934 open shards, which doesn't leave much headroom.
curl -s http://192.168.100.61:9200/_stats\?pretty | jq '._shards'
{
  "total": 2934,
  "successful": 2934,
  "failed": 0
}
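To put the remaining headroom in perspective, here is a small worked calculation. It is a sketch under one loud assumption: that only the three index families visible in the `_cat/indices` listing (apache_logs, systemlogs, swh_workers) are created each day, with the primary and replica counts shown there. The function names are hypothetical.

```shell
# Hard limit on open shards: max shards per node times the number of nodes.
shard_limit() {
    nodes="$1"
    max_per_node="${2:-1000}"   # Elasticsearch default for cluster.max_shards_per_node
    echo $((max_per_node * nodes))
}

# Shards held by one index: each primary plus its replicas.
shards_per_index() {
    primaries="$1"
    replicas="$2"
    echo $((primaries * (1 + replicas)))
}

limit=$(shard_limit 3)
remaining=$((limit - 2934))   # 2934 open shards after the cleanup

# Assumption: one apache_logs (3 primaries), one systemlogs (1 primary) and
# one swh_workers (2 primaries) index per day, each with 1 replica.
daily=$(( $(shards_per_index 3 1) + $(shards_per_index 1 1) + $(shards_per_index 2 1) ))

echo "limit=$limit remaining=$remaining daily=$daily days_left=$((remaining / daily))"
```

This prints `limit=3000 remaining=66 daily=12 days_left=5`. The systemlogs figure matches the logstash error, which reports that creating the index "would add [2] total shards". Under these assumptions the cleanup buys roughly five days before the limit is hit again.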
Identified actions to take:
- Write a script to regularly close indexes to limit the number of open shards
- Add an icinga probe that checks that the daily indexes are present on the cluster, or queries the cluster to check whether logs were received in the last 15 min(?)
- Monitor the open shards and raise an alert when the count approaches the limit of [1000|cluster.max_shards_per_node] * number of nodes
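The shard monitoring item could look something like the check below. It is a sketch only: the function name `check_open_shards` and the 80%/90% default thresholds are hypothetical choices, and it follows the usual Nagios/icinga plugin convention for exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL).

```shell
# Arguments: open shard count, node count, then optionally the warning and
# critical thresholds (percent of the limit) and the max shards per node.
# Prints a one-line status and returns 0 (OK), 1 (WARNING) or 2 (CRITICAL).
check_open_shards() {
    open="$1"
    nodes="$2"
    warn_pct="${3:-80}"         # hypothetical default
    crit_pct="${4:-90}"         # hypothetical default
    max_per_node="${5:-1000}"   # Elasticsearch default

    limit=$((max_per_node * nodes))
    used_pct=$((open * 100 / limit))   # integer division, rounds down

    if [ "$used_pct" -ge "$crit_pct" ]; then
        echo "CRITICAL - $open/$limit shards open (${used_pct}%)"
        return 2
    elif [ "$used_pct" -ge "$warn_pct" ]; then
        echo "WARNING - $open/$limit shards open (${used_pct}%)"
        return 1
    fi
    echo "OK - $open/$limit shards open (${used_pct}%)"
    return 0
}

# Possible usage, reading the count the same way as the _stats query above:
#   open=$(curl -s "http://${ES_NODE}/_stats" | jq '._shards.total')
#   check_open_shards "$open" 3; exit $?
```

With the figures from this task, `check_open_shards 2934 3` reports CRITICAL at 97% of the limit.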
A dedicated task will be created for each action.
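For the daily index probe, one possible shape is sketched below. The function name `missing_daily_indexes` is hypothetical, and the list of prefixes in the usage comment is an assumption based on the index families seen in this task.

```shell
# Reads index names (one per line) on stdin and prints each expected
# "<prefix>-<day>" index that is absent.
# Arguments: day as YYYY.MM.DD, then one or more index prefixes.
# Returns 2 (CRITICAL in icinga terms) if any index is missing, else 0.
missing_daily_indexes() {
    day="$1"
    shift
    names=$(cat)
    missing=0
    for prefix in "$@"; do
        # -x: match the whole line, -F: fixed string, -q: no output
        if ! printf '%s\n' "$names" | grep -qxF "${prefix}-${day}"; then
            echo "${prefix}-${day}"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ] || return 2
}

# Possible usage (the date is taken in UTC on the assumption that index
# names are derived from UTC timestamps):
#   curl -s "http://${ES_NODE}/_cat/indices?h=index" \
#     | missing_daily_indexes "$(date -u +%Y.%m.%d)" apache_logs systemlogs swh_workers
```

A check like this would have failed from 2021-03-26 onwards, the first day for which no index was created.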