Add disk health monitoring
Closed, Migrated (Edits Locked)

Description

  • Activate background SMART monitoring (smartmontools) of the disks on physical servers.
  • Ensure the resulting alerts are displayed in Icinga.

Event Timeline

olasd added a subscriber: olasd.

Replicating the comment from T3500:

The smartmontools emails can be missed when they're sent only once and the disk gets kicked off its array.

We should add stickier notifications to Icinga for failed/pre-failed disks, so that an alert stays visible until the disk is dealt with.

https://github.com/thomas-krenn/check_smart_attributes looks like a decent candidate for such a check. The main sticking point will be generating the list of devices for which we want to add the check on each machine.
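One possible way to generate that per-machine device list is to parse the output of `smartctl --scan`, which prints one detected device per line together with its `-d` device type. The following is only a rough sketch, not what was deployed: the function names `parse_scan_output` and `list_smart_devices` are hypothetical, the parsing assumes the usual `DEVICE -d TYPE # comment` line layout, and wiring the result into the Icinga check configuration is left out.

```python
import subprocess


def parse_scan_output(text):
    """Parse `smartctl --scan` output into (device, device_type) pairs.

    Assumes lines shaped like: "/dev/sda -d scsi # /dev/sda, SCSI device".
    """
    devices = []
    for line in text.splitlines():
        # Drop the trailing "# ..." comment that follows each entry.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        fields = entry.split()
        device = fields[0]
        device_type = None
        # The device type, when present, follows a "-d" flag.
        if "-d" in fields[1:]:
            idx = fields.index("-d", 1)
            if idx + 1 < len(fields):
                device_type = fields[idx + 1]
        devices.append((device, device_type))
    return devices


def list_smart_devices():
    """Run `smartctl --scan` on this host (usually needs root) and parse it."""
    result = subprocess.run(
        ["smartctl", "--scan"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_scan_output(result.stdout)
```

Each resulting pair could then be turned into one service check per disk. Disks behind hardware RAID controllers may not all show up in a plain scan, so the list would probably need a manual review per machine.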