A hardware error related to an ECC threshold / CPU issue is logged on the granet iDRAC.
All the alerts were raised on 2021-10-01 around 21:05:
```
2021-10-01 21:05:33 MEM8000 Correctable memory error logging disabled for a memory device at location DIMM_B3.
2021-10-01 21:05:33 CPU9000 An OEM diagnostic event occurred.
2021-10-01 21:05:32 CPU9000 An OEM diagnostic event occurred.
2021-10-01 21:05:31 CPU9000 An OEM diagnostic event occurred.
2021-10-01 21:05:31 CPU9000 An OEM diagnostic event occurred.
2021-10-01 21:05:30 CPU9000 An OEM diagnostic event occurred.
2021-10-01 21:05:30 CPU9000 An OEM diagnostic event occurred.
2021-10-01 21:05:29 CPU9000 An OEM diagnostic event occurred.
2021-10-01 21:05:28 CPU9000 An OEM diagnostic event occurred.
2021-10-01 21:05:28 CPU9000 An OEM diagnostic event occurred.
2021-10-01 21:05:27 CPU9000 An OEM diagnostic event occurred.
2021-10-01 21:05:26 CPU9000 An OEM diagnostic event occurred.
2021-10-01 21:05:25 CPU9000 An OEM diagnostic event occurred.
2021-10-01 21:05:24 CPU0012 Correctable Machine Check Exception detected on CPU 2.
```
According to the Dell manual:
- CPU0012 [1]
```
CPU0012
Message
Correctable Machine Check Exception detected on CPU arg1 .
Arguments
arg1 = number
Detailed Description
None.
Recommended Response Action
Review System Event Log and Operating System Logs. If the issue persists, contact technical support. Refer to the product documentation to choose a convenient contact method.
Category
System Health
Subcategory
CPU = Processor
Severity
Severity 2 (Warning)
Trap/EventID
2242
LCD Message
No LCD message display defined.
Initial Default
IPMI Alert;LC Log
Server Administrator Event ID
5603
Server Administrator Trap ID
5603
```
- CPU9000 [1]
```
CPU9000
Message
An OEM diagnostic event occurred.
Detailed Description
None
Recommended Response Action
No response action is required.
Category
System Health
Subcategory
CPU = Processor
Severity
Severity 3 (Informational)
LCD Message
No LCD message display defined.
Initial Default
LC Log
Server Administrator Event ID
Not Applicable
Server Administrator Trap ID
Not Applicable
```
- MEM8000 [2]
```
MEM8000
Message
Correctable memory error logging disabled for a memory device at location arg1 .
Arguments
arg1 = location
Detailed Description
Errors are being corrected but no longer logged.
Recommended Response Action
Review system logs for memory exceptions. Re-install memory at location <location>
Category
System Health
Subcategory
MEM = Memory
Severity
Severity 1 (Critical)
Trap/EventID
2265
LCD Message
SBE log disabled on <location>. Reseat memory
Initial Default
LC Log
Server Administrator Event ID
Not Applicable
Server Administrator Trap ID
Not Applicable
```
The BIOS version of this server is `2.3.2`.
According to the memory self-healing documentation [3], a PPR (Post Package Repair) is not scheduled unless further errors are detected.
The recommended first action is to reboot:
```
With BIOS 2.1.x or later, the first recommended step is to reboot/restart (without moving DIMMs to a different slot). This allows the new BIOS enhancements to run, potentially resolving (self-healing) the DIMM errors without the need to schedule any DIMM replacements.
```
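Since the server runs BIOS `2.3.2`, the "2.1.x or later" condition quoted above is met. As a hedged illustration, the check can be written as a simple version comparison. The helper names `parse_version` and `reboot_first_applies` are hypothetical, and the `(2, 1)` threshold is taken only from the quoted sentence; it may differ per server model.

```python
def parse_version(version):
    """Turn a dotted version string such as '2.3.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def reboot_first_applies(bios_version, minimum=(2, 1)):
    """Return True if the BIOS is at least `minimum` (major, minor).

    The (2, 1) default reflects the 'BIOS 2.1.x or later' wording quoted
    above; it is an assumption that this threshold applies to this model.
    """
    # Compare only as many components as the threshold defines.
    return parse_version(bios_version)[: len(minimum)] >= minimum


if __name__ == "__main__":
    print(reboot_first_applies("2.3.2"))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically (for example `"2.10.0"` sorting before `"2.9.0"`).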
It is also recommended to upgrade the BIOS and iDRAC firmware to improve error detection, but this can be left for later, in case the problem is still present after the reboot.
[1] https://www.dell.com/support/manuals/fr-fr/dell-opnmang-sw-v8.0.1/eemi_13g-v1/cpu-event-messages?guid=guid-789ec7d2-2a52-4063-a753-c5dc51e91359&lang=en-us
[2] https://www.dell.com/support/manuals/fr-fr/dell-opnmang-sw-v8.0.1/eemi_13g-v1/mem-event-messages?guid=guid-ff360c01-4e4c-4f20-871d-1d24ced52985&lang=en-us
[3] https://www.dell.com/support/kbdoc/fr-fr/000053203/what-is-ddr4-self-healing-on-dell-poweredge-servers-with-intel-xeon-scalable-processors?lang=en