After T472 we now have an icinga2 setup on pergamon, but it's very minimal: it only monitors pergamon itself and the most relevant public-facing services.
We should set up more comprehensive monitoring of all our machines, including but not limited to:
- disk space and other local resources on all machines
- main internal services deployed on all machines
- dependencies between services
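
As a rough starting point, the disk space and dependency items could be expressed with icinga2 apply rules along these lines. This is only a sketch under assumptions, not our actual configuration: it assumes each monitored machine runs an icinga2 agent whose endpoint is named after the host, that hosts set `vars.os = "Linux"`, and that the stock `generic-service` template is available. The host name `db.example.org`, the service names `web` and `postgresql`, and the thresholds are hypothetical placeholders.

```
// Sketch only: every name and threshold below is a placeholder.

// Disk space check on every Linux host, executed on the host itself.
apply Service "disk" {
  import "generic-service"            // assumes the stock template exists
  check_command = "disk"              // ITL command wrapping check_disk
  command_endpoint = host.name        // assumes an agent endpoint named like the host
  vars.disk_wfree = "20%"             // warn below 20% free (placeholder)
  vars.disk_cfree = "10%"             // critical below 10% free (placeholder)
  assign where host.vars.os == "Linux"
}

// Dependency: suppress notifications for "web" while its database is down.
apply Dependency "web-needs-db" to Service {
  parent_host_name = "db.example.org"   // hypothetical parent host
  parent_service_name = "postgresql"    // hypothetical parent service
  disable_notifications = true
  assign where service.name == "web"    // hypothetical child service
}
```

Using apply rules rather than per-host service objects would let new machines pick up the standard checks automatically once they are added as hosts with the right variables.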