The weekly report bot is down
Closed, Migrated

Description

It did not send the last two weekly mails

Event Timeline

vlorentz created this task.

The postfix server was down. It got restarted today.

No logs were seen after 09/08/2021 [1], which is around the time of the last outage [2].
Neither the server nor the service was restarted.

[1] http://kibana0.internal.softwareheritage.org:5601/goto/b6b81346f42425ff50071a356350b38f

[2] I first reopened the August systemlogs indices in Elasticsearch:

index_name="systemlogs"; for num in $(seq -w 1 22); do curl -X POST "$ES_SERVER/$index_name-2021.08.$num/_open"; curl -X POST "$ES_SERVER/$index_name-2021.08.$num/_unfreeze"; done
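
Before sending requests like these, it can help to print the exact URLs first. The following is only a dry-run sketch: `daily_indices` is a hypothetical helper, and the `http://localhost:9200` fallback for `ES_SERVER` is a placeholder, not the real server.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original task): print one daily index
# name per day, with the day zero-padded to two digits.
daily_indices() {
  local index_name="$1" year_month="$2" first_day="$3" last_day="$4"
  local day
  for day in $(seq "$first_day" "$last_day"); do
    printf '%s-%s.%02d\n' "$index_name" "$year_month" "$day"
  done
}

# Placeholder default; set ES_SERVER to the real Elasticsearch URL.
ES_SERVER="${ES_SERVER:-http://localhost:9200}"

# Dry run: only print the commands, do not execute them.
for idx in $(daily_indices systemlogs 2021.08 1 22); do
  echo "curl -X POST $ES_SERVER/$idx/_open"
  echo "curl -X POST $ES_SERVER/$idx/_unfreeze"
done
```

Once the printed list looks right, the `echo` can be dropped to actually run the requests.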

Looking into adding an icinga alert around this.

For this, the following plugins [1] [2] sound interesting.
Plus, it would definitely help to have alerts on other systemd services as well.

[1] https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#systemd

[2] https://exchange.icinga.com/kiminen/icinga2-check_systemd_service
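
To illustrate what such a check boils down to, here is a minimal sketch of a Nagios-compatible check around `systemctl is-active`. It is not how either linked plugin is implemented. The function names and the state-to-severity mapping are assumptions for illustration; only the exit code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) is the standard one for monitoring plugins.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map a systemd unit state to a Nagios-style
# status line and return code. The mapping below is an assumption.
state_to_status() {
  local unit="$1" state="$2"
  case "$state" in
    active)
      echo "OK - $unit is $state"; return 0 ;;
    activating|reloading)
      echo "WARNING - $unit is $state"; return 1 ;;
    inactive|failed|deactivating)
      echo "CRITICAL - $unit is $state"; return 2 ;;
    *)
      echo "UNKNOWN - $unit is in state '$state'"; return 3 ;;
  esac
}

# Query systemd for the unit state and translate it.
check_unit() {
  local unit="$1" state
  # is-active prints the state and exits non-zero when the unit is not active.
  state="$(systemctl is-active "$unit" 2>/dev/null)"
  state_to_status "$unit" "${state:-unknown}"
}
```

A wrapper script would call `check_unit postfix.service` and exit with its return code, so that the monitoring system picks up the severity.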

[1] https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#systemd

Turns out this one ^ is already packaged for Debian unstable but not backported.
I'm inclined to just do the necessary packaging glue to build it and upload
it to our swh Debian repository [1].

[1] P1128

After some bumps, it's finally uploaded there [1].
The package repository is mirrored on our forge [2], and the package was built from the debian/buster-swh branch.

Tomorrow I'll address the Puppet part that actually installs the Icinga alert (which will also install that package).

[1]

tony@yavin4 $ swh-debian-upload monitoring-plugins-systemd_2.3.1-1~swh1~bpo10+1_amd64.changes
The .dsc file is already signed.
Would you like to use the current signature? [Yn]y  # <----- detail: already signed earlier with failing upload due to me!
 signfile monitoring-plugins-systemd_2.3.1-1~swh1~bpo10+1_amd64.changes 0D10C3B8
Successfully signed dsc and changes files
Exporting indices...

root@pergamon:/srv/softwareheritage/repository# reprepro ls monitoring-plugins-systemd
monitoring-plugins-systemd | 2.3.1-1~swh1~bpo10+1 | buster-swh | i386, amd64, source

root@pergamon:/srv/softwareheritage/repository# apt-cache search monitoring-plugins-systemd
root@pergamon:/srv/softwareheritage/repository# apt update
root@pergamon:/srv/softwareheritage/repository# apt-cache search monitoring-plugins-systemd
monitoring-plugins-systemd - systemd plugin for nagios compatible monitoring systems

[2] https://forge.softwareheritage.org/source/monitoring-plugins-systemd/

ardumont claimed this task.

Postfix alert installed, so next time we should be made aware ;)
Closing now.