Raise alert when systemd services are detected as degraded
Closed, Public

Authored by ardumont on Aug 27 2021, 12:00 PM.

Details

Summary

Note that this check does not always seem to detect specific services being down,
which is why a dedicated check exists for postfix@-.service, for example.
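
For illustration, a dedicated per-unit check such as the postfix@-.service case could look like the sketch below. It is not the check used in this repository. The names `unit_status` and `read_active_state` are hypothetical, and the mapping from unit state to plugin exit code is an assumption. It relies on `systemctl is-active`, which prints the unit's active state. One possible reason for the missed detections (an assumption, not confirmed here) is that systemd reports the system as degraded only when a unit is in the failed state, so a unit that is merely inactive would not be noticed by a system-wide check.

```python
import subprocess

# Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def unit_status(unit, active_state):
    """Map the output of `systemctl is-active <unit>` to (exit_code, message).

    The severity chosen for each state is an illustrative assumption.
    """
    state = active_state.strip()
    if state == "active":
        return OK, f"OK - {unit} is {state}"
    if state in ("activating", "deactivating", "reloading"):
        return WARNING, f"WARNING - {unit} is {state}"
    if state in ("inactive", "failed"):
        return CRITICAL, f"CRITICAL - {unit} is {state}"
    return UNKNOWN, f"UNKNOWN - {unit} is {state or 'unreadable'}"


def read_active_state(unit):
    """Return the raw output of `systemctl is-active <unit>`.

    The command exits non-zero for units that are not active, so the
    return code is deliberately not checked here. Raises
    FileNotFoundError on hosts without systemctl.
    """
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
    )
    return result.stdout
```

A wrapper would call `unit_status("postfix@-.service", read_active_state("postfix@-.service"))`, print the message and exit with the returned code.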

Related to T3497

Depends on D6134

Test Plan

Played with icinga on the pergamon machine (vm): killed a service (e.g. ntp) and
checked that icinga detects it (it does).
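
To reproduce the manual test without icinga, the system-wide state can be read directly from systemd. The sketch below is an assumption about what a degraded-state check may do, not the `check_systemd` command this diff relies on (that command is not shown here). The names `plugin_status` and `read_system_state` are hypothetical. It uses `systemctl is-system-running`, which prints `degraded` when one or more units have failed.

```python
import subprocess

# Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def plugin_status(system_state):
    """Map the output of `systemctl is-system-running` to (exit_code, message).

    The severity chosen for each state is an illustrative assumption.
    """
    state = system_state.strip()
    if state == "running":
        return OK, "OK - systemd state is running"
    if state in ("initializing", "starting"):
        return WARNING, f"WARNING - systemd state is {state}"
    if state in ("degraded", "maintenance"):
        return CRITICAL, f"CRITICAL - systemd state is {state}"
    # Covers "stopping", "offline", "unknown" and empty output.
    return UNKNOWN, f"UNKNOWN - systemd state is {state or 'unreadable'}"


def read_system_state():
    """Return the raw output of `systemctl is-system-running`.

    The command exits non-zero unless the state is "running", so the
    return code is deliberately not checked here. Raises
    FileNotFoundError on hosts without systemctl.
    """
    result = subprocess.run(
        ["systemctl", "is-system-running"],
        capture_output=True,
        text=True,
    )
    return result.stdout
```

After killing a service as described above, `plugin_status(read_system_state())` is expected to return the CRITICAL code only if systemd marks the unit as failed. `systemctl --failed` lists the units responsible.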

bin/octocatalog-diff:

bin/octocatalog-diff --octocatalog-diff-args --no-truncate-details --to staging pergamon
Found host pergamon.softwareheritage.org
...
*******************************************
+ Concat::Fragment[icinga2::object::Service::check_systemd] =>
   parameters =>
      "order": 60
      "target": "/etc/icinga2/zones.d/global-templates/services.conf"
      "content": >>>

apply Service "check_systemd" {
  import "generic-service"

  check_command = "check_systemd"
  command_endpoint = host.name
  assign where host.vars.os == "Linux"
  ignore where host.vars.noagent
}
<<<
*******************************************
+ Concat_fragment[icinga2::object::Service::check_systemd] =>
   parameters =>
      "order": 60
      "tag": "_etc_icinga2_zones.d_global-templates_services.conf"
      "target": "/etc/icinga2/zones.d/global-templates/services.conf"
      "content": >>>

apply Service "check_systemd" {
  import "generic-service"

  check_command = "check_systemd"
  command_endpoint = host.name
  assign where host.vars.os == "Linux"
  ignore where host.vars.noagent
}
<<<
+ Icinga2::Object::Service[check_systemd] =>
   parameters =>
      "apply": true
      "assign": ["host.vars.os == Linux"]
      "check_command": "check_systemd"
      "command_endpoint": "host.name"
      "ensure": "present"
      "ignore": ["host.vars.noagent"]
      "import": ["generic-service"]
      "name": "Check systemd state"
      "order": 60
      "prefix": false
      "service_name": "check_systemd"
      "target": "/etc/icinga2/zones.d/global-templates/services.conf"
      "template": false
*******************************************
+ Icinga2::Object[icinga2::object::Service::check_systemd] =>
   parameters =>
      "apply": true
      "assign": ["host.vars.os == Linux"]
      "attrs": {"check_command"=>"check_systemd", "command_endpoint"=>"host.name"}
      "attrs_list": ["display_name", "host_name", "check_command", "check_timeout", "check_interval", "check_period", "retry_interval", "max_check_attempts", "groups", "enable_notifications", "enable_active_checks", "enable_passive_checks", "enable_event_handler", "enable_flapping", "enable_perfdata", "event_command", "flapping_threshold_low", "flapping_threshold_high", "volatile", "zone", "command_endpoint", "notes", "notes_url", "action_url", "icon_image", "icon_image_alt", "vars", "Acknowledgement", "ApiBindHost", "ApiBindPort", "ApiEnvironment", "ApplicationType", "AttachDebugger", "BuildCompilerName", "BuildCompilerVersion", "BuildHostName", "Concurrency", "Critical", "Custom", "Deprecated", "Down", "DowntimeEnd", "DowntimeRemoved", "DowntimeStart", "Environment", "FlappingEnd", "FlappingStart", "HostDown", "HostUp", "IncludeConfDir", "Internal", "Json", "LocalStateDir", "LogCritical", "LogDebug", "LogInformation", "LogNotice", "LogWarning", "Math", "MaxConcurrentChecks", "ModAttrPath", "NodeName", "OK", "ObjectsPath", "PidPath", "PkgDataDir", "PlatformArchitecture", "PlatformKernel", "PlatformKernelVersion", "PlatformName", "PlatformVersion", "PrefixDir", "Problem", "Recovery", "RunAsGroup", "RunAsUser", "RunDir", "ServiceCritical", "ServiceOK", "ServiceUnknown", "ServiceWarning", "StatePath", "SysconfDir", "System", "Types", "Unknown", "Up", "UseVfork", "VarsPath", "Warning", "ZonesDir", "NodeName", "ZoneName", "TicketSalt", "PluginDir", "PluginContribDir", "ManubulonPluginDir", "name", "NodeName", "ZoneName", "TicketSalt", "PluginDir", "PluginContribDir", "ManubulonPluginDir", "name"]
      "ensure": "present"
      "ignore": ["host.vars.noagent"]
      "import": ["generic-service"]
      "object_name": "check_systemd"
      "object_type": "Service"
      "order": 60
      "prefix": false
      "target": "/etc/icinga2/zones.d/global-templates/services.conf"
      "template": false
*******************************************
*** End octocatalog-diff on pergamon.softwareheritage.org

Diff Detail

Repository
rSPSITE puppet-swh-site
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.