Using smartctl to view drive errors in a 3ware RAID Array



If you have ever had to deal with failed drives in a RAID array, you will know how painful it can be. We encountered a strange error (UNCONV-DCB) on one drive in a 3ware RAID array. The 3ware tw_cli utility is quite useful, but it does not provide any information beyond the status of the drive.

After some research, I found that you can use smartctl to read the SMART data of an individual drive behind the RAID controller. Something like this:

smartctl -a -d 3ware,0 /dev/twa0 -T verypermissive

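Where 3ware,0 selects the drive on port 0 of the controller, and /dev/twa0 is the first 3ware controller (controller 0) rather than an individual array. Running the same command for port 1 (3ware,1) outputs something like this: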

# smartctl -a -d 3ware,1 /dev/twa0 -T verypermissive
smartctl version 5.37 [i386-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model: XXXXX
Serial Number: XXXXX
Firmware Version: XXXX
User Capacity: 500,107,862,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 6
ATA Standard is: ATA/ATAPI-6 T13 1410D revision 2
Local Time is: Tue May 18 11:58:43 2010 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 642) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 120) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       229202478
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       21
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       2
  7 Seek_Error_Rate         0x000f   076   060   030    Pre-fail  Always       -       82364680704
  9 Power_On_Hours          0x0032   082   082   000    Old_age   Always       -       15937
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       21
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Temperature_Celsius     0x0022   073   061   045    Old_age   Always       -       538378267
194 Temperature_Celsius     0x0022   027   040   000    Old_age   Always       -       27 (Lifetime Min/Max 0/17)
195 Hardware_ECC_Recovered  0x001a   054   025   000    Old_age   Always       -       229202478
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
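
With several drives on one controller, reading a full report per port gets tedious. Below is a rough sketch of how the attribute table could be boiled down to the few counters that usually matter for failing media (IDs 5, 197 and 198). The function names smart_summary and check_ports, the SMARTCTL override variable and the port count are my own assumptions for illustration; they are not part of smartctl or tw_cli, and you may still need -T verypermissive as in the command above.

```shell
#!/bin/sh
# Sketch only: helper names and the SMARTCTL variable are hypothetical.

# Read smartctl output on stdin and print "ID NAME RAW_VALUE" for
# attributes 5, 197 and 198. The FLAG check ($3 starts with 0x) and the
# field count keep other lines that begin with a number, such as the
# selective self-test spans, from matching.
smart_summary() {
    awk 'NF >= 10 && $3 ~ /^0x[0-9a-fA-F]+$/ &&
         ($1 == 5 || $1 == 197 || $1 == 198) { print $1, $2, $10 }'
}

# Summarise ports 0..COUNT-1 on one controller device.
# Usage: check_ports /dev/twa0 4
check_ports() {
    dev=$1
    count=$2
    port=0
    while [ "$port" -lt "$count" ]; do
        echo "== port $port =="
        # -A prints only the vendor attribute table.
        "${SMARTCTL:-smartctl}" -A -d "3ware,$port" "$dev" | smart_summary
        port=$((port + 1))
    done
}
```

For example, check_ports /dev/twa0 4 would walk ports 0 to 3 on the first controller (4 is just a guess at the port count, adjust it to your card). Fed the listing above, smart_summary reports a raw value of 2 for Reallocated_Sector_Ct and 0 for the other two counters.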
