Handle SATA SSDs on belvedere
Closed, Resolved (Public)

Description

We've had some load issues on the SATA SSDs attached to belvedere.

  • Run full SMART tests on all disks
  • TRIM the SATA SSDs
  • Decide what to do with them.

Event Timeline

olasd triaged this task as Normal priority. (Jun 9 2020, 10:51 AM)
olasd created this task.
olasd changed the task status from Open to Work in Progress. (Sep 8 2020, 2:45 PM)
olasd updated the task description.

Short smart tests:

root@belvedere:~# for disk in sd{a..h}; do smartctl -t short /dev/$disk; done
[...]
root@belvedere:~# for disk in sd{a..h}; do smartctl -a /dev/$disk; done | grep Short
Short self-test routine
# 1  Short offline       Completed without error       00%     11433         -
# 3  Short offline       Completed without error       00%         2         -
Short self-test routine
# 1  Short offline       Completed without error       00%     11433         -
# 3  Short offline       Completed without error       00%         2         -
Short self-test routine
# 1  Short offline       Completed without error       00%     11433         -
# 3  Short offline       Completed without error       00%         2         -
Short self-test routine
# 1  Short offline       Completed without error       00%     11433         -
# 3  Short offline       Completed without error       00%         2         -
Short self-test routine
# 1  Short offline       Completed without error       00%     11433         -
# 3  Short offline       Completed without error       00%         2         -
Short self-test routine
# 1  Short offline       Completed without error       00%     11433         -
# 3  Short offline       Completed without error       00%         2         -
Short self-test routine
# 1  Short offline       Completed without error       00%     11433         -
# 3  Short offline       Completed without error       00%         2         -
Short self-test routine
# 1  Short offline       Completed without error       00%     11433         -
# 3  Short offline       Completed without error       00%         2         -

TRIM:

for disk in sd{a..h}; do blkdiscard /dev/$disk; done
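
Note that blkdiscard, run without an offset or length, discards every sector of the device, so it is only appropriate for disks that no longer hold data. A minimal sketch of a guard around it follows. The helper names nothing_mounted and discard_if_unused are hypothetical, and the check only looks at mount points: it does not detect membership in a ZFS pool, LVM volume group or md array.

```shell
# Sketch only: refuse to discard a disk that still has something mounted.

# Reads `lsblk -rno NAME,MOUNTPOINT` output on stdin.
# Returns 0 when no listed device has a mount point.
nothing_mounted() {
    # In raw (-r) output an unmounted device has an empty second field,
    # so awk sees a single field on that line.
    awk 'NF >= 2 { mounted = 1 } END { exit mounted + 0 }'
}

# Hypothetical wrapper: discard /dev/<disk> only if nothing on it is mounted.
discard_if_unused() {
    local disk=$1
    if lsblk -rno NAME,MOUNTPOINT "/dev/$disk" | nothing_mounted; then
        blkdiscard "/dev/$disk"
    else
        echo "refusing to discard /dev/$disk: something is mounted" >&2
        return 1
    fi
}

# Usage (as root): for disk in sd{a..h}; do discard_if_unused "$disk"; done
```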

Long smart tests:

for disk in sd{a..h}; do smartctl -t long /dev/$disk; done

(in progress...)
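
smartctl -t only starts the test and returns immediately, so the result has to be polled. A rough sketch of how that wait could be scripted is below. The helper names selftest_running and wait_for_selftests are hypothetical. The sketch assumes the "Self-test execution status" line printed by smartctl -c has the same shape as in the -a output, and relies on the ATA convention that a status value with 0xF in its high nibble (240 to 255) means a test is still running.

```shell
# Sketch only: detect a running SMART self-test from smartctl output.

# Reads smartctl output on stdin.
# Returns 0 while a self-test is in progress, 1 when idle,
# and 2 when no status line was found.
selftest_running() {
    local status
    # Pull the number out of "Self-test execution status:      ( 249) ..."
    status=$(sed -n 's/^Self-test execution status:[[:space:]]*([[:space:]]*\([0-9][0-9]*\)).*/\1/p' | head -n 1)
    [ -n "$status" ] || return 2
    # High nibble 0xF means "self-test routine in progress".
    [ $(( status >> 4 )) -eq 15 ]
}

# Hypothetical helper: block until none of the given disks reports a running test.
wait_for_selftests() {
    local disk
    for disk in "$@"; do
        while smartctl -c "/dev/$disk" | selftest_running; do
            sleep 60
        done
    done
}

# Usage (as root): wait_for_selftests sd{a..h}
```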

olasd added a comment. (Sep 8 2020, 3:08 PM)
root@belvedere:~# for disk in sd{a..h}; do smartctl -a /dev/$disk | grep -iA1 'self-test execution' ; done
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
root@belvedere:~# for disk in sd{a..h}; do smartctl -a /dev/$disk | grep 'Extended' ; done
Extended self-test routine
# 1  Extended offline    Completed without error       00%     11434         -
# 3  Extended offline    Completed without error       00%         2         -
Extended self-test routine
# 1  Extended offline    Completed without error       00%     11434         -
# 3  Extended offline    Completed without error       00%         2         -
Extended self-test routine
# 1  Extended offline    Completed without error       00%     11434         -
# 3  Extended offline    Completed without error       00%         2         -
Extended self-test routine
# 1  Extended offline    Completed without error       00%     11434         -
# 3  Extended offline    Completed without error       00%         2         -
Extended self-test routine
# 1  Extended offline    Completed without error       00%     11434         -
# 3  Extended offline    Completed without error       00%         2         -
Extended self-test routine
# 1  Extended offline    Completed without error       00%     11434         -
# 3  Extended offline    Completed without error       00%         2         -
Extended self-test routine
# 1  Extended offline    Completed without error       00%     11434         -
# 3  Extended offline    Completed without error       00%         2         -
Extended self-test routine
# 1  Extended offline    Completed without error       00%     11434         -
# 3  Extended offline    Completed without error       00%         2         -

Looks happy enough: every disk reports its extended self-test as completed without error.
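
For a larger fleet the same eyeball check could be automated. The sketch below is one possible approach, not what was run here: latest_selftest_ok is a hypothetical helper that reads the self-test log (as printed by smartctl -l selftest, or the matching part of smartctl -a) and succeeds only if the most recent entry, numbered 1, completed without error.

```shell
# Sketch only: check the newest entry of a SMART self-test log.

# Reads smartctl self-test log output on stdin.
# Returns 0 only if entry "# 1" exists and says "Completed without error".
latest_selftest_ok() {
    awk '
        /^# *1 / { seen = 1; ok = /Completed without error/ }
        END      { exit !(seen && ok) }
    '
}

# Usage (as root):
#   for disk in sd{a..h}; do
#       smartctl -l selftest "/dev/$disk" | latest_selftest_ok \
#           || echo "$disk: latest self-test not clean"
#   done
```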

olasd updated the task description. (Sep 8 2020, 3:08 PM)
olasd closed this task as Resolved. (Sep 8 2020, 4:49 PM)
olasd claimed this task.

SATA disks added back to the main ZFS pool (data) as four two-disk mirrors, using the following commands:

get_wwn () { ls -l /dev/disk/by-id/ | grep $1\$ | grep -v part | grep wwn | awk '{print $9}' ; }

zpool add data \
    mirror "$(get_wwn sda)" "$(get_wwn sde)" \
    mirror "$(get_wwn sdb)" "$(get_wwn sdf)" \
    mirror "$(get_wwn sdc)" "$(get_wwn sdg)" \
    mirror "$(get_wwn sdd)" "$(get_wwn sdh)"
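
The get_wwn helper above parses ls output, which works but depends on the column layout. A possible alternative that reads the symlinks directly is sketched below. It is not the command that was run. The BY_ID_DIR variable is an assumption added so the function can be pointed at another directory, and the function assumes whole-disk links are named wwn-* with partition links carrying a -part suffix.

```shell
# Sketch only: look up the wwn-* name of a disk without parsing `ls`.

# Directory holding the persistent-name symlinks (overridable; an assumption).
BY_ID_DIR=${BY_ID_DIR:-/dev/disk/by-id}

# Prints the wwn-* link name pointing at the given kernel name (e.g. sda).
# Returns 1 if no such link exists.
get_wwn() {
    local link target
    for link in "$BY_ID_DIR"/wwn-*; do
        [ -L "$link" ] || continue               # glob matched nothing
        case $link in *-part*) continue ;; esac  # skip partition links
        target=$(readlink "$link")               # e.g. ../../sda
        if [ "${target##*/}" = "$1" ]; then
            printf '%s\n' "${link##*/}"
            return 0
        fi
    done
    return 1
}
```

After adding the mirrors, zpool status data should list the four new mirror vdevs.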