
Monitor daily indexes are present on the log cluster and logs are correctly ingested
Started, Work in Progress, Normal, Public

Description

Raise an alert if the daily indexes were not created on the Elasticsearch cluster or if no logs were ingested in the last XX minutes.

Event Timeline

vsellier triaged this task as Normal priority. Thu, Apr 8, 4:32 PM
vsellier created this task.
vsellier changed the task status from Open to Work in Progress. Fri, Apr 23, 4:09 PM
vsellier claimed this task.
vsellier edited projects, added System administration; removed System administrators.
vsellier moved this task from Backlog to in-progress on the System administration board.
vsellier removed a subscriber: vsellier.

I checked the icinga_logstash plugin[1] to see if it could be helpful, but it is more oriented toward logstash instances used to ingest data from log files. There are no options to check the number of events received/sent, for example.

[1] https://exchange.icinga.com/twidhalm/check_logstash

Logstash now exposes an API server[1] which seems to return some interesting metrics on plugin behavior.
For example, there is a section for the elasticsearch output plugin:

  "outputs": [
    {
      "id": "62d11c4234b8981da77a97955da92ac9de92b9a6dcd4582f407face31fd5c664",
      "events": {
        "duration_in_millis": 160089636,
        "in": 72818126,
        "out": 72818046
      },
      "bulk_requests": {
        "responses": {
          "200": 3860888
        },
        "successes": 3860888
      },
      "documents": {
        "successes": 72818046
      },
      "name": "elasticsearch"
    }
  ]
},

As a first step, I'll try to implement a small Python script that checks whether there are any response codes other than 200, to identify the behavior.
It may also be interesting to check other properties, such as the queue size:
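A minimal sketch of what such a check could look like, assuming the "outputs" list has already been fetched from the logstash API and parsed from JSON. The function name `non_200_responses` and the 429 entry in the sample data are hypothetical and only there for illustration; the other values are taken from the excerpt above.

```python
def non_200_responses(outputs):
    """Return {output_id: {code: count}} for elasticsearch outputs
    that report bulk responses with a code other than 200."""
    problems = {}
    for output in outputs:
        if output.get("name") != "elasticsearch":
            continue
        responses = output.get("bulk_requests", {}).get("responses", {})
        bad = {
            code: count
            for code, count in responses.items()
            if code != "200" and count > 0
        }
        if bad:
            problems[output["id"]] = bad
    return problems


# Sample shaped like the excerpt above. The "429" entry is invented
# for illustration; it does not come from the real cluster.
sample_outputs = [
    {
        "id": "62d11c4234b8981da77a97955da92ac9de92b9a6dcd4582f407face31fd5c664",
        "events": {"duration_in_millis": 160089636, "in": 72818126, "out": 72818046},
        "bulk_requests": {
            "responses": {"200": 3860888, "429": 12},
            "successes": 3860888,
        },
        "documents": {"successes": 72818046},
        "name": "elasticsearch",
    }
]

problems = non_200_responses(sample_outputs)
for output_id, codes in problems.items():
    print(f"output {output_id[:8]}: non-200 bulk responses {codes}")
```

These counters appear to be cumulative, so a real check would probably need to compare them with the values from the previous run rather than alert on any non-zero count.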

"queue": {
  "type": "memory",
  "events_count": 0,
  "queue_size_in_bytes": 0,
  "max_queue_size_in_bytes": 0
},
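
A queue check could be sketched in the same way. This assumes the "queue" section above has been parsed into a dict; the thresholds and the function name `check_queue` are hypothetical and would need tuning. The return codes follow the usual Icinga/Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL). Since max_queue_size_in_bytes is 0 for the memory queue shown above, the sketch uses events_count rather than a fill ratio.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def check_queue(queue, warn_events=1000, crit_events=10000):
    """Map the number of queued events to an Icinga-style status.

    The default thresholds are placeholders, not tested values.
    """
    count = queue.get("events_count", 0)
    if count >= crit_events:
        return CRITICAL, f"CRITICAL - {count} events queued"
    if count >= warn_events:
        return WARNING, f"WARNING - {count} events queued"
    return OK, f"OK - {count} events queued"


# Values copied from the "queue" section above.
sample_queue = {
    "type": "memory",
    "events_count": 0,
    "queue_size_in_bytes": 0,
    "max_queue_size_in_bytes": 0,
}

status, message = check_queue(sample_queue)
print(message)
```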