
comprehensive infrastructure monitoring using icinga2
Closed, Migrated

Description

After T472 we now have an icinga2 setup on pergamon, but it is very minimal: it only monitors pergamon itself and the most relevant public-facing services.

We should set up more comprehensive monitoring of all our machines, including but not limited to:

  • disk space and other local resources on all machines
  • main internal services deployed on all machines
  • dependencies between services
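The task does not say how the per-machine checks would be implemented. Icinga2 can run Nagios-compatible plugins, which report their result through the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and a one-line status on stdout, optionally followed by `|` and performance data. As an illustration of the "disk space" item, here is a minimal sketch of such a plugin in Python. The function name `check_disk` and the 80%/90% thresholds are hypothetical choices, not part of the original setup. In practice the stock `disk` check command (wrapping `check_disk` from the Monitoring Plugins project) would usually be preferred over a custom script.

```python
import shutil
import sys

# Exit codes defined by the Nagios/Icinga plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_disk(path, warn_pct=80.0, crit_pct=90.0):
    """Return (exit_code, status_line) for the filesystem holding `path`.

    Thresholds are percentages of used space; the defaults are
    hypothetical and would be tuned per machine.
    """
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Missing path, permission problem, etc.: the check itself failed.
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"

    used_pct = 100.0 * usage.used / usage.total if usage.total else 0.0
    if used_pct >= crit_pct:
        code = CRITICAL
    elif used_pct >= warn_pct:
        code = WARNING
    else:
        code = OK

    # Text before '|' is the human-readable status; after it comes
    # performance data in the form label=value[UOM];warn;crit;min;max.
    line = (
        f"DISK {LABELS[code]} - {path} {used_pct:.1f}% used"
        f" | used_pct={used_pct:.1f}%;{warn_pct};{crit_pct};0;100"
    )
    return code, line


def main(argv):
    # The filesystem to check is the first argument, defaulting to "/".
    path = argv[1] if len(argv) > 1 else "/"
    code, line = check_disk(path)
    print(line)
    return code


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Saved under a hypothetical name such as `check_disk_usage.py`, it could be run by hand as `python3 check_disk_usage.py /srv`; Icinga2 would only need a `CheckCommand` pointing at it, and the script could be distributed to every machine through the existing Puppet setup.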

Event Timeline

zack added a project: Restricted Project. (Feb 12 2017, 6:18 PM)
zack moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board. (Feb 12 2017, 6:38 PM)
olasd claimed this task.
olasd added a subscriber: olasd.

We now have a reasonably good set of Icinga checks and proper infrastructure to add more through Puppet.