- Activate background smartmontools monitoring of disks on physical servers.
- Ensure the resulting alerts are displayed in Icinga.
Description
Status | Assigned | Task
---|---|---
Migrated | gitlab-migration | T2852 Take back control on elasticsearch puppet manifests
Migrated | gitlab-migration | T2888 Elasticsearch cluster failure during a rolling restart
Migrated | gitlab-migration | T2960 Add disk health monitoring
Event Timeline
Replicating the comment from T3500:
The smartmontools emails are easy to miss: they are sent only once, and by then the disk may already have been kicked out of its array.
We should add stickier notifications to Icinga for failed and pre-failed disks.
https://github.com/thomas-krenn/check_smart_attributes looks like a decent candidate for such a check. The main sticking point will be generating the list of devices for which we want to add the check on each machine.
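One possible way to generate the per-machine device list is to parse the output of `smartctl --scan`, which prints one detected device per line together with its `-d` device type. The sketch below is only an illustration of that idea: the function names `parse_smartctl_scan` and `list_smart_devices` are hypothetical, and it assumes that `smartctl` is installed and that the scan output has the usual `<device> -d <type> # <description>` shape. How the resulting list is fed into the Icinga check configuration is left open.

```python
import subprocess


def parse_smartctl_scan(output):
    """Parse `smartctl --scan` output into (device, device_type) pairs.

    Assumes lines shaped like:
        /dev/sda -d scsi # /dev/sda, SCSI device
    device_type is None when a line carries no `-d` option.
    """
    devices = []
    for line in output.splitlines():
        # Drop the trailing "# ..." description, then surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        device = fields[0]
        device_type = None
        if "-d" in fields[1:]:
            idx = fields.index("-d", 1)
            if idx + 1 < len(fields):
                device_type = fields[idx + 1]
        devices.append((device, device_type))
    return devices


def list_smart_devices():
    """Run `smartctl --scan` on this machine and return the parsed devices.

    Hypothetical helper: requires smartmontools and usually root privileges.
    """
    result = subprocess.run(
        ["smartctl", "--scan"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_smartctl_scan(result.stdout)
```

Keeping the parsing separate from the `smartctl` call makes it possible to test the parser against captured scan output from different machines (for example hosts behind a hardware RAID controller) without needing access to the hardware.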