
One of the system disks of beaubourg is out of order
Closed, Migrated

Description

One of the disks in beaubourg's system RAID is out of order.

It is a hardware RAID, and the failure is reported by the RAID card:

  • raid volume status:
megacli -LDInfo -Lall -aALL

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :Virtual Disk 0
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 931.0 GB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 931.0 GB
State               : Degraded                       <-----------------------------
Strip Size          : 512 KB
Number Of Drives    : 2
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: Yes
LD has drives that support T10 power conditions: Yes
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No

Exit Code: 0x00
  • disk status:
root@beaubourg:~# megacli -PDList -aALL 
...
Enclosure Device ID: 32
Slot Number: 1
Drive's position: DiskGroup: 0, Span: 0, Arm: 1
Enclosure position: 1
Device Id: 1
WWN: 5000C5009772B3E0
Sequence Number: 3
Media Error Count: 38015                    <---------------------
Other Error Count: 75                           <---------------------
Predictive Failure Count: 156               <---------------------
Last Predictive Failure Event Seq Number: 6560499
PD Type: SAS

Raw Size: 931.512 GB [0x74706db0 Sectors]
Non Coerced Size: 931.012 GB [0x74606db0 Sectors]
Coerced Size: 931.0 GB [0x74600000 Sectors]
Sector Size:  512
Logical Sector Size:  512
Physical Sector Size:  512
Firmware state: Failed
Device Firmware Level: NS02
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c5009772b3e2
SAS Address(1): 0x0
Connected Port Number: 0(path0) 
Inquiry Data: SEAGATE ST1000NX0453    NS02S470G630            
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 12.0Gb/s 
Link Speed: 12.0Gb/s 
Media Type: Hard Disk Device
Drive Temperature :27C (80.60 F)
PI Eligibility:  No 
Drive is formatted for PI information:  Yes 
PI: PI with type 2
Port-0 :
Port status: Active
Port's Linkspeed: 12.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: 12.0Gb/s 
Drive has flagged a S.M.A.R.T alert : Yes
...
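
The manual check above could be automated. Below is a minimal sketch, not the monitoring actually deployed on beaubourg: it assumes the `megacli -PDList -aALL` output has already been captured as text, and the names `parse_pdlist`, `suspect_drives`, `COUNTERS` and `SAMPLE` are hypothetical. `SAMPLE` reuses a few lines from the output above, without the arrows added by hand.

```python
# Counters that were non-zero on the failed disk in the output above.
COUNTERS = ("Media Error Count", "Other Error Count", "Predictive Failure Count")

# Excerpt of the PDList output shown above (annotations removed).
SAMPLE = """\
Enclosure Device ID: 32
Slot Number: 1
Media Error Count: 38015
Other Error Count: 75
Predictive Failure Count: 156
Firmware state: Failed
Drive has flagged a S.M.A.R.T alert : Yes
"""


def parse_pdlist(text):
    """Split PDList output into one dict of 'key: value' fields per drive."""
    drives = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip "..." and blank lines
        key, value = key.strip(), value.strip()
        # Assumption: each drive section starts with "Enclosure Device ID".
        if key == "Enclosure Device ID":
            current = {}
            drives.append(current)
        if current is not None:
            current[key] = value
    return drives


def suspect_drives(drives):
    """Return drives with error counters > 0, a S.M.A.R.T alert, or a non-Online state."""
    bad = []
    for drive in drives:
        counters_nonzero = any(int(drive.get(name, "0")) > 0 for name in COUNTERS)
        # Assumption: every healthy drive reports "Online, ..."; hot spares
        # would need an exception here.
        state_ok = drive.get("Firmware state", "").startswith("Online")
        smart_alert = drive.get("Drive has flagged a S.M.A.R.T alert") == "Yes"
        if counters_nonzero or not state_ok or smart_alert:
            bad.append(drive)
    return bad


for drive in suspect_drives(parse_pdlist(SAMPLE)):
    print(drive["Slot Number"], drive["Firmware state"])  # prints: 1 Failed
```

In practice the text would come from running the `megacli` command as root on the host; that part is left out so the sketch stays runnable anywhere.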

Event Timeline

vsellier created this task.
vsellier renamed this task from One of the system disk of beaubourg is out of order to One of the system disks of beaubourg is out of order. Aug 10 2021, 4:32 PM
vsellier changed the task status from Open to Work in Progress. Aug 24 2021, 2:43 PM
vsellier moved this task from Backlog to in-progress on the System administration board.

An alert was sent by email on 2021-05-22 at 05:30 AM, so the monitoring did detect the issue ;) :

This message was generated by the smartd daemon running on:

   host name:  beaubourg
   DNS domain: softwareheritage.org

The following warning/error was logged by the smartd daemon:

Device: /dev/bus/0 [megaraid_disk_01], SMART Failure: DATA CHANNEL IMPENDING FAILURE GENERAL HARD DRIVE FAILURE

Device info:
[SEAGATE  ST1000NX0453     NS02], lu id: 0x5000c5009772b3e3, S/N: S470G630, 1.00 TB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
Another message will be sent in 24 hours if the problem persists.

A replacement disk will be sent by DELL. It should be delivered on 2021-09-13 if everything goes well.
The DSI has been notified of the delivery.

vsellier claimed this task.
vsellier moved this task from in-progress to done on the System administration board.

The disk was received on Monday and replaced on Tuesday by Christophe from the DSI.
The RAID card automatically launched the RAID rebuild. Everything is ok now.

root@beaubourg:~#  megacli -PDList -aALL
...

Enclosure Device ID: 32
Slot Number: 1
Drive's position: DiskGroup: 0, Span: 0, Arm: 1
Enclosure position: 1
Device Id: 1
WWN: 5000C500DEEF2F8C
Sequence Number: 12
Media Error Count: 0    <---------------------- it's ok now  
Other Error Count: 0     <---------------------- it's ok now
Predictive Failure Count: 0    <---------------------- it's ok now
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 931.512 GB [0x74706db0 Sectors]
Non Coerced Size: 931.012 GB [0x74606db0 Sectors]
Coerced Size: 931.0 GB [0x74600000 Sectors]
Sector Size:  512
Logical Sector Size:  512
Physical Sector Size:  512
Firmware state: Online, Spun Up
Device Firmware Level: NT32
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c500deef2f8e
SAS Address(1): 0x0
Connected Port Number: 0(path0) 
Inquiry Data: SEAGATE ST1000NX0473    NT32W473ESVF             <----- new serial
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 12.0Gb/s 
Link Speed: 12.0Gb/s 
Media Type: Hard Disk Device
Drive Temperature :27C (80.60 F)
PI Eligibility:  No 
Drive is formatted for PI information:  Yes 
PI: PI with type 2
Port-0 :
Port status: Active
Port's Linkspeed: 12.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: 12.0Gb/s 
Drive has flagged a S.M.A.R.T alert : No
...

Volume status:

root@beaubourg:~# megacli -LDInfo -Lall -aALL
                                     

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :Virtual Disk 0
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 931.0 GB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 931.0 GB
State               : Optimal       <-------------------- Good
Strip Size          : 512 KB
Number Of Drives    : 2
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: Yes
LD has drives that support T10 power conditions: Yes
LD's IO profile supports MAX power savings with cached writes: No
Bad Blocks Exist: No
Is VD Cached: No



Exit Code: 0x00
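
The volume check can be scripted in the same spirit. This is a rough sketch under the assumption that the `megacli -LDInfo -Lall -aALL` output is available as text; `virtual_drive_states` and `SAMPLE` are hypothetical names, and `SAMPLE` copies a few lines from the output above without the hand-added arrow.

```python
# Excerpt of the LDInfo output shown above (annotation removed).
SAMPLE = """\
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :Virtual Disk 0
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
State               : Optimal
Number Of Drives    : 2
"""


def virtual_drive_states(text):
    """Map each virtual drive number to the value of its 'State' field."""
    states = {}
    vd = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'key: value' line
        key, value = key.strip(), value.strip()
        if key == "Virtual Drive":
            # "0 (Target Id: 0)" -> "0"
            vd = value.split()[0]
        elif key == "State" and vd is not None:
            states[vd] = value
    return states


states = virtual_drive_states(SAMPLE)
print(states)  # prints: {'0': 'Optimal'}

# Anything other than "Optimal" (for example "Degraded") deserves a look.
degraded = [vd for vd, state in states.items() if state != "Optimal"]
print(degraded)  # prints: []
```

Run against the first output in the description, the same function would report the volume as `Degraded`.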