Infrastructure issues sometimes hit us. For example:
- running out of disk space on the vm hosting our dbs (P204)
- failing to connect to the rabbitmq queues (P205)
This impacted the listener, which failed to do its job (flushing the states of the queues' tasks into the scheduler db).
We should investigate and make the listener more resilient.
Note:
The systemd service file already defines the `Restart=always` policy.
Still, that did not prevent systemd from giving up and leaving the service in a failed state.
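
One possible explanation, to be confirmed against the journal: systemd rate-limits starts. By default, a unit started more than `StartLimitBurst=` (5) times within `StartLimitIntervalSec=` (10s) is no longer restarted and ends up in the failed state, even with `Restart=always`. With the default `RestartSec=` of 100ms, a listener that exits immediately because the db or rabbitmq is unreachable reaches that limit within a few seconds. Below is a minimal sketch of a drop-in that would avoid this, assuming a recent systemd (`StartLimitIntervalSec=` lives in `[Unit]` since version 230). The values are illustrative and have not been tested on our setup:

```ini
[Unit]
# 0 disables start rate limiting, so systemd never gives up restarting
StartLimitIntervalSec=0

[Service]
Restart=always
# wait between restart attempts instead of the default 100ms, to give
# the db / rabbitmq time to come back (30s is an arbitrary choice)
RestartSec=30
```

This only keeps the service retrying. It does not address the listener's own handling of db or rabbitmq failures, which still needs investigating.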