Raise alert when postfix service is down
Closed, Public

Authored by ardumont on Aug 25 2021, 4:56 PM.

Details

Summary

This showcases how to monitor systemd services.

Depends on D6124

Related to T3495

Test Plan

Use Vagrant and Icinga on the pergamon/belvedere nodes to verify the behavior.

octo-diff:

$ bin/octocatalog-diff --octocatalog-diff-args --no-truncate-details --to staging pergamon
...
*******************************************
+ Concat::Fragment[icinga2::object::Service::check_postfix] =>
   parameters =>
      "order": 60
      "target": "/etc/icinga2/zones.d/global-templates/services.conf"
      "content": >>>

apply Service "check_postfix" {
  import "generic-service"

  check_command = "check_systemd"
  command_endpoint = host.name
  vars.check_systemd_unit = "postfix"
  assign where host.vars.os == "Linux"
  ignore where host.vars.noagent
}
<<<
*******************************************
+ Concat_fragment[icinga2::object::Service::check_postfix] =>
   parameters =>
      "order": 60
      "tag": "_etc_icinga2_zones.d_global-templates_services.conf"
      "target": "/etc/icinga2/zones.d/global-templates/services.conf"
      "content": >>>

apply Service "check_postfix" {
  import "generic-service"

  check_command = "check_systemd"
  command_endpoint = host.name
  vars.check_systemd_unit = "postfix"
  assign where host.vars.os == "Linux"
  ignore where host.vars.noagent
}
*******************************************
+ Icinga2::Object::Service[check_postfix] =>
   parameters =>
      "apply": true
      "assign": ["host.vars.os == Linux"]
      "check_command": "check_systemd"
      "command_endpoint": "host.name"
      "ensure": "present"
      "ignore": ["host.vars.noagent"]
      "import": ["generic-service"]
      "name": "Check postfix service"
      "order": 60
      "prefix": false
      "service_name": "check_postfix"
      "target": "/etc/icinga2/zones.d/global-templates/services.conf"
      "template": false
      "vars": {"check_systemd_unit"=>"postfix"}
*******************************************
+ Icinga2::Object[icinga2::object::Service::check_postfix] =>
   parameters =>
      "apply": true
      "assign": ["host.vars.os == Linux"]
      "attrs": {"check_command"=>"check_systemd", "command_endpoint"=>"host.name", "vars"=>{"check_systemd_unit"=>"postfix"}}
      "attrs_list": ["display_name", "host_name", "check_command", "check_timeout", "check_interval", "check_period", "retry_interval", "max_check_attempts", "groups", "enable_notifications", "enable_active_checks", "enable_passive_checks", "enable_event_handler", "enable_flapping", "enable_perfdata", "event_command", "flapping_threshold_low", "flapping_threshold_high", "volatile", "zone", "command_endpoint", "notes", "notes_url", "action_url", "icon_image", "icon_image_alt", "vars", "Acknowledgement", "ApiBindHost", "ApiBindPort", "ApiEnvironment", "ApplicationType", "AttachDebugger", "BuildCompilerName", "BuildCompilerVersion", "BuildHostName", "Concurrency", "Critical", "Custom", "Deprecated", "Down", "DowntimeEnd", "DowntimeRemoved", "DowntimeStart", "Environment", "FlappingEnd", "FlappingStart", "HostDown", "HostUp", "IncludeConfDir", "Internal", "Json", "LocalStateDir", "LogCritical", "LogDebug", "LogInformation", "LogNotice", "LogWarning", "Math", "MaxConcurrentChecks", "ModAttrPath", "NodeName", "OK", "ObjectsPath", "PidPath", "PkgDataDir", "PlatformArchitecture", "PlatformKernel", "PlatformKernelVersion", "PlatformName", "PlatformVersion", "PrefixDir", "Problem", "Recovery", "RunAsGroup", "RunAsUser", "RunDir", "ServiceCritical", "ServiceOK", "ServiceUnknown", "ServiceWarning", "StatePath", "SysconfDir", "System", "Types", "Unknown", "Up", "UseVfork", "VarsPath", "Warning", "ZonesDir", "NodeName", "ZoneName", "TicketSalt", "PluginDir", "PluginContribDir", "ManubulonPluginDir", "name", "NodeName", "ZoneName", "TicketSalt", "PluginDir", "PluginContribDir", "ManubulonPluginDir", "name"]
      "ensure": "present"
      "ignore": ["host.vars.noagent"]
      "import": ["generic-service"]
      "object_name": "check_postfix"
      "object_type": "Service"
      "order": 60
      "prefix": false
      "target": "/etc/icinga2/zones.d/global-templates/services.conf"
      "template": false
*******************************************
*** End octocatalog-diff on pergamon.softwareheritage.org
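
The generated apply rule attaches the check to every Linux host unless the host is flagged as having no agent. As a rough illustration of those assign/ignore semantics only (Icinga 2 evaluates these expressions itself; the `Host` class, the `service_applies` helper, and the host names below are hypothetical), the selection can be sketched in Python:

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical stand-in for an Icinga 2 host object and its custom vars."""
    name: str
    vars: dict = field(default_factory=dict)


def service_applies(host: Host) -> bool:
    """Mirror `assign where host.vars.os == "Linux"` plus `ignore where host.vars.noagent`."""
    assigned = host.vars.get("os") == "Linux"
    # An unset custom variable counts as false, like a null value in the Icinga 2 DSL.
    ignored = bool(host.vars.get("noagent"))
    # A matching ignore expression always wins over a matching assign expression.
    return assigned and not ignored


# Made-up hosts, for illustration only.
hosts = [
    Host("linux-with-agent", {"os": "Linux"}),
    Host("linux-without-agent", {"os": "Linux", "noagent": True}),
    Host("not-linux", {"os": "Windows"}),
]
selected = [h.name for h in hosts if service_applies(h)]
print(selected)
```

Only the first host is selected: the second matches the ignore expression and the third never matches the assign expression.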

Diff Detail

Repository
rSPSITE puppet-swh-site
Branch
staging
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 23162
Build 36134: arc lint + arc unit

Event Timeline

ardumont created this revision.
ardumont edited the test plan for this revision.
ardumont edited the summary of this revision.
ardumont edited the test plan for this revision.

Use the correct postfix service to check. With this, the check actually detects a crash of
the process started by the service.
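
A check like this is only as good as the unit it inspects, since the result is derived from that unit's state. The sketch below is not the implementation of the `check_systemd` plugin; it is a minimal Python illustration, assuming the usual monitoring-plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and a state-to-code mapping chosen here for demonstration. The `classify` and `check_unit` names are hypothetical.

```python
import subprocess

# Conventional monitoring-plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(active_state: str) -> int:
    """Map a systemd active state to a plugin exit code (illustrative policy)."""
    if active_state == "active":
        return OK
    if active_state in ("reloading", "activating", "deactivating"):
        # Transitional states: worth a warning, not yet an outage.
        return WARNING
    if active_state in ("inactive", "failed"):
        return CRITICAL
    return UNKNOWN


def check_unit(unit: str) -> int:
    """Ask systemd for the unit's state and classify it."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", unit],
            capture_output=True,
            text=True,
            check=False,  # is-active exits non-zero for any state other than active
        )
    except OSError:
        # systemctl is missing or not executable on this machine.
        return UNKNOWN
    return classify(result.stdout.strip())
```

A plugin wrapper would typically end with `sys.exit(check_unit("postfix"))`, where the unit name plays the role of `vars.check_systemd_unit` in the service definition.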

This revision is now accepted and ready to land. Aug 27 2021, 12:20 PM