swh services: Monitor swh-worker@.service's status
Closed, Migrated

Description

We should be able to monitor the state of our ever-growing list of swh services [1]:

  • swh-worker@swh_indexer_fossology_license.service
  • swh-worker@swh_indexer_mimetype.service
  • swh-worker@swh_indexer_origin_intrinsic_metadata.service
  • swh-worker@swh_lister.service
  • swh-worker@swh_loader_debian.service
  • swh-worker@swh_loader_deposit.service
  • swh-worker@swh_loader_git.service
  • swh-worker@swh_loader_mercurial.service
  • swh-worker@swh_loader_svn.service
  • swh-worker@swh_vault_cooker.service

Ideally, we should be alerted when a service is not in its expected state, e.g.:

  • stopped but should be running
  • running but should be stopped
  • disabled but should be enabled
  • ...

[1] https://forge.softwareheritage.org/source/puppet-swh-site/browse/production/site-modules/profile/manifests/swh/deploy/worker/

Note:

  • The need is independent from the queue consumption detection (e.g. service running but associated queue consumption cancelled).
  • These services are systemd units.
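
The mismatches listed in the description (stopped but should be running, running but should be stopped, disabled but should be enabled) could be detected by comparing a desired-state table against what `systemctl is-active` and `systemctl is-enabled` report. The following is only a minimal sketch under assumptions: the `EXPECTED` table and the helper names are hypothetical, and the desired state shown for each unit is invented for illustration (in practice it would presumably come from the puppet manifests).

```python
import subprocess

# Hypothetical desired-state table (illustration only, not taken from puppet).
EXPECTED = {
    "swh-worker@swh_loader_git.service": {"active": "active", "enabled": "enabled"},
    "swh-worker@swh_loader_svn.service": {"active": "inactive", "enabled": "disabled"},
}


def query(unit, verb):
    """Return the one-word state printed by `systemctl is-active` or `is-enabled`."""
    try:
        result = subprocess.run(
            ["systemctl", verb, unit],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        # systemctl is missing or not executable on this host
        return "unknown"
    return result.stdout.strip() or "unknown"


def collect_actual(units):
    """Query systemd for the current state of every unit in `units`."""
    return {
        unit: {"active": query(unit, "is-active"),
               "enabled": query(unit, "is-enabled")}
        for unit in units
    }


def find_mismatches(expected, actual):
    """Return one human-readable line per unit property that differs."""
    problems = []
    for unit in sorted(expected):
        for key, want in expected[unit].items():
            got = actual.get(unit, {}).get(key, "unknown")
            if got != want:
                problems.append(f"{unit}: {key} is {got}, expected {want}")
    return problems


def main():
    """Print every mismatch; return 2 (critical) if any, else 0."""
    problems = find_mismatches(EXPECTED, collect_actual(EXPECTED))
    for line in problems:
        print(line)
    return 2 if problems else 0
```

Calling `main()` on a host would print one line per deviation. Keeping the comparison in `find_mismatches` separate from the `systemctl` calls makes the logic testable without systemd.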

Event Timeline

ardumont triaged this task as Normal priority. Sep 19 2018, 3:11 PM
ardumont created this task.

Here's a random list of contributed Icinga checks to monitor the state of a systemd service:

Not tested, and I'm not entirely sure I looked in all the right places…

Rather than having one check per service I think we can have a single check that makes sure all services are started.

There's a trivial plugin that does this:

https://salsa.debian.org/dsa-team/mirror/dsa-nagios/blob/master/dsa-nagios-checks/checks/dsa-check-systemd-services
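
For illustration, a single host-wide check in that spirit might look like the sketch below. It is not a transcription of the linked plugin. It assumes the check may simply list failed service units via `systemctl list-units --state=failed` and map the result to the usual Nagios/Icinga exit codes. The function names are hypothetical.

```python
import subprocess

# Conventional Nagios/Icinga plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def parse_failed_units(output):
    """Extract unit names from `systemctl list-units --plain --no-legend` output."""
    units = []
    for line in output.splitlines():
        fields = line.split()
        # Defensive: drop a leading bullet if --plain was not honoured.
        if fields and fields[0] == "●":
            fields = fields[1:]
        if fields:
            units.append(fields[0])
    return units


def check_result(failed):
    """Map a list of failed unit names to (exit_code, status_line)."""
    if failed:
        return CRITICAL, "CRITICAL: failed units: " + ", ".join(failed)
    return OK, "OK: no failed units"


def main():
    """Run the check once and return the plugin exit code."""
    try:
        proc = subprocess.run(
            ["systemctl", "list-units", "--type=service", "--state=failed",
             "--plain", "--no-legend"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"UNKNOWN: could not query systemd: {exc}")
        return UNKNOWN
    code, message = check_result(parse_failed_units(proc.stdout))
    print(message)
    return code
```

Note that this only reports units in the failed state. A worker that was stopped cleanly shows up as inactive rather than failed, so it would not be caught by this sketch.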

Ah, I knew I forgot something: "disabled but should be enabled" would be covered by a puppet status check, that is "if run, would puppet apply changes to this host?"