Raise alert when postfix service is down
Closed, Public

Authored by ardumont on Aug 25 2021, 4:56 PM.

Details

Summary

This showcases how to monitor systemd services.

Depends on D6124

Related to T3495

Test Plan

Use Vagrant and Icinga on the pergamon/belvedere nodes to verify the behavior.

octo-diff:

$ bin/octocatalog-diff --octocatalog-diff-args --no-truncate-details --to staging pergamon
...
*******************************************
+ Concat::Fragment[icinga2::object::Service::check_postfix] =>
   parameters =>
      "order": 60
      "target": "/etc/icinga2/zones.d/global-templates/services.conf"
      "content": >>>

apply Service "check_postfix" {
  import "generic-service"

  check_command = "check_systemd"
  command_endpoint = host.name
  vars.check_systemd_unit = "postfix"
  assign where host.vars.os == "Linux"
  ignore where host.vars.noagent
}
<<<
*******************************************
+ Concat_fragment[icinga2::object::Service::check_postfix] =>
   parameters =>
      "order": 60
      "tag": "_etc_icinga2_zones.d_global-templates_services.conf"
      "target": "/etc/icinga2/zones.d/global-templates/services.conf"
      "content": >>>

apply Service "check_postfix" {
  import "generic-service"

  check_command = "check_systemd"
  command_endpoint = host.name
  vars.check_systemd_unit = "postfix"
  assign where host.vars.os == "Linux"
  ignore where host.vars.noagent
}
*******************************************
+ Icinga2::Object::Service[check_postfix] =>
   parameters =>
      "apply": true
      "assign": ["host.vars.os == Linux"]
      "check_command": "check_systemd"
      "command_endpoint": "host.name"
      "ensure": "present"
      "ignore": ["host.vars.noagent"]
      "import": ["generic-service"]
      "name": "Check postfix service"
      "order": 60
      "prefix": false
      "service_name": "check_postfix"
      "target": "/etc/icinga2/zones.d/global-templates/services.conf"
      "template": false
      "vars": {"check_systemd_unit"=>"postfix"}
*******************************************
+ Icinga2::Object[icinga2::object::Service::check_postfix] =>
   parameters =>
      "apply": true
      "assign": ["host.vars.os == Linux"]
      "attrs": {"check_command"=>"check_systemd", "command_endpoint"=>"host.name", "vars"=>{"check_systemd_unit"=>"postfix"}}
      "attrs_list": ["display_name", "host_name", "check_command", "check_timeout", "check_interval", "check_period", "retry_interval", "max_check_attempts", "groups", "enable_notifications", "enable_active_checks", "enable_passive_checks", "enable_event_handler", "enable_flapping", "enable_perfdata", "event_command", "flapping_threshold_low", "flapping_threshold_high", "volatile", "zone", "command_endpoint", "notes", "notes_url", "action_url", "icon_image", "icon_image_alt", "vars", "Acknowledgement", "ApiBindHost", "ApiBindPort", "ApiEnvironment", "ApplicationType", "AttachDebugger", "BuildCompilerName", "BuildCompilerVersion", "BuildHostName", "Concurrency", "Critical", "Custom", "Deprecated", "Down", "DowntimeEnd", "DowntimeRemoved", "DowntimeStart", "Environment", "FlappingEnd", "FlappingStart", "HostDown", "HostUp", "IncludeConfDir", "Internal", "Json", "LocalStateDir", "LogCritical", "LogDebug", "LogInformation", "LogNotice", "LogWarning", "Math", "MaxConcurrentChecks", "ModAttrPath", "NodeName", "OK", "ObjectsPath", "PidPath", "PkgDataDir", "PlatformArchitecture", "PlatformKernel", "PlatformKernelVersion", "PlatformName", "PlatformVersion", "PrefixDir", "Problem", "Recovery", "RunAsGroup", "RunAsUser", "RunDir", "ServiceCritical", "ServiceOK", "ServiceUnknown", "ServiceWarning", "StatePath", "SysconfDir", "System", "Types", "Unknown", "Up", "UseVfork", "VarsPath", "Warning", "ZonesDir", "NodeName", "ZoneName", "TicketSalt", "PluginDir", "PluginContribDir", "ManubulonPluginDir", "name", "NodeName", "ZoneName", "TicketSalt", "PluginDir", "PluginContribDir", "ManubulonPluginDir", "name"]
      "ensure": "present"
      "ignore": ["host.vars.noagent"]
      "import": ["generic-service"]
      "object_name": "check_postfix"
      "object_type": "Service"
      "order": 60
      "prefix": false
      "target": "/etc/icinga2/zones.d/global-templates/services.conf"
      "template": false
*******************************************
*** End octocatalog-diff on pergamon.softwareheritage.org
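
For readers who want to map the catalog diff above back to a manifest, here is a minimal sketch of a declaration that would be consistent with the parameters shown. It is reconstructed from the diff output only, not copied from the actual change: the resource title is an assumption, and the `check_systemd` command is assumed to be defined elsewhere (presumably by the revision this one depends on).

```puppet
# Sketch only: reconstructed from the octocatalog-diff output above,
# not taken from the real puppet-swh-site manifest.
# The resource title 'check_postfix' is an assumption.
::icinga2::object::service { 'check_postfix':
  service_name     => 'check_postfix',
  import           => ['generic-service'],
  apply            => true,
  # 'check_systemd' is assumed to be declared as a CheckCommand elsewhere.
  check_command    => 'check_systemd',
  # Run the check on the monitored host itself (through its agent).
  command_endpoint => 'host.name',
  vars             => {
    # systemd unit whose state the check inspects
    check_systemd_unit => 'postfix',
  },
  # Apply to every Linux host, except those without an agent.
  assign           => ['host.vars.os == Linux'],
  ignore           => ['host.vars.noagent'],
  target           => '/etc/icinga2/zones.d/global-templates/services.conf',
}
```

Monitoring another systemd service would, under the same assumptions, only require a different title, `service_name`, and `check_systemd_unit` value.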

Diff Detail

Repository
rSPSITE puppet-swh-site
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision.
ardumont edited the test plan for this revision.
ardumont edited the summary of this revision.
ardumont edited the test plan for this revision.

Use the correct postfix service to check. With this, the check actually detects a crash of
the process started by the service.

This revision is now accepted and ready to land. (Aug 27 2021, 12:20 PM)