Smartmontools

Introduction
The aim of this howto is to use SMART technology, which nearly every hard disk supports, to check whether a disk is healthy. SMART-enabled hard disks continuously monitor their own health and alert the user if an anomaly is detected, and most of them can also run specific tests for closer analysis.

Installation Procedure
First of all, make sure SMART is enabled in the BIOS.

[Figure: example BIOS setup screen]

Some BIOSes don't have the option and report S.M.A.R.T. as disabled, but don't worry: smartctl can enable it (see below). Carefully read the SMART instructions for your motherboard. Sometimes this option may be intentionally hidden, as shown in this FAQ entry for Gigabyte.

Now install the smartmontools package with your distribution's package manager.

Finally, you have to check if your hard disk(s) support SMART:

For SATA drives:
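The commands for this step can be sketched as follows. This is a minimal sketch, assuming smartctl's -i (identity) option and its -d ata device type; the helper name smart_info and the device names /dev/hda and /dev/sda are only examples, not part of the original text.

```shell
# Hypothetical helper: print a drive's identity section, which includes
# the "SMART support is:" lines. Extra arguments (such as -d ata) are
# passed through to smartctl unchanged.
smart_info() {
    dev=$1
    shift
    smartctl -i "$@" "$dev"
}

# Example calls, run as root (device names are assumptions):
#   smart_info /dev/hda           # IDE drive
#   smart_info /dev/sda -d ata    # SATA drive
```

Look for the "SMART support is:" lines in the output to see whether the drive supports SMART and whether it is currently enabled.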

To enable SMART on IDE drives:

To enable SMART on SATA drives:
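Enabling SMART could look like the sketch below, assuming smartctl's -s on option. The helper name smart_enable and the device names are hypothetical examples.

```shell
# Hypothetical helper: switch SMART on for a drive. Extra arguments
# (such as -d ata) are passed through to smartctl unchanged.
smart_enable() {
    dev=$1
    shift
    smartctl -s on "$@" "$dev"
}

# Example calls, run as root (device names are assumptions):
#   smart_enable /dev/hda           # IDE drive
#   smart_enable /dev/sda -d ata    # SATA drive
```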

SMART Health Status
Let's check the SMART Health Status:
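The health status is reported by smartctl -H. The sketch below wraps that call in a hypothetical helper, smart_health_ok, that succeeds only when the output contains a passing result; the two matched strings are the wording I would expect from ATA and SCSI drives respectively, so treat them as assumptions and check them against your own output.

```shell
# Hypothetical helper: return 0 when the drive reports a passing health status.
# Assumed wording: ATA drives print "... test result: PASSED",
# SCSI drives print "SMART Health Status: OK".
smart_health_ok() {
    smartctl -H "$1" | grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

# Example call, run as root (device name is an assumption):
#   if smart_health_ok /dev/sda; then echo "healthy"; else echo "back up now"; fi
```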

If you read PASSED, the disk is ok. If you read FAILED, back up your data now: the disk has already failed or is predicted to fail within 24 hours!

SMART Error Log
Now let's check the SMART Error Log (it's a list of errors detected by SMART during the disk's life):
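The error log can be read with smartctl -l error. A minimal sketch follows; the helper names are hypothetical and the device name is only an example.

```shell
# Hypothetical helper: print the drive's SMART error log.
smart_error_log() {
    smartctl -l error "$1"
}

# Hypothetical helper: return 0 when the log reports "No Errors Logged".
smart_no_errors() {
    smart_error_log "$1" | grep -q 'No Errors Logged'
}

# Example call, run as root (device name is an assumption):
#   smart_no_errors /dev/sda || smart_error_log /dev/sda
```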

If you read No Errors Logged, the disk is ok. If there are a few errors (and they are not recent) you don't have to worry too much. If there are a lot of errors, back up your data as soon as you can.

Reading the SMART Health Status and the SMART Error Log is not enough: you really should do some other specific tests.

SMART Testing
These tests don't interfere with the normal functioning of the disk and can be run whenever you want. This section only describes how to launch them and read their reports; to learn more, read Monitoring Hard Disks with SMART and/or the smartctl man page.

First you should know which tests are supported by your drive:
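The supported tests are listed by smartctl -c (capabilities). A minimal sketch, with a hypothetical helper name and an example device name:

```shell
# Hypothetical helper: print the drive's SMART capabilities, including
# the supported self-tests and their estimated durations.
smart_capabilities() {
    smartctl -c "$1"
}

# Example call, run as root (device name is an assumption):
#   smart_capabilities /dev/sda
```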

This also tells you how much time each of them requires.

Now let's execute the SMART Immediate Offline Test (if supported, of course):
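The offline test is started with smartctl -t offline. A minimal sketch, with a hypothetical helper name and an example device name:

```shell
# Hypothetical helper: start the SMART Immediate Offline Test.
# smartctl returns at once; the test itself runs inside the drive.
smart_offline_test() {
    smartctl -t offline "$1"
}

# Example call, run as root (device name is an assumption):
#   smart_offline_test /dev/sda
```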

You only have to wait (smartctl will show you how long). When it finishes, you should check the SMART Error Log again for the report.

If you need to check multiple disks, you could use a small script (smart.sh) that dumps the relevant SMART logs into appropriately named files after all the tests have completed. Run the script with the test type(s) you want as its arguments, and don't forget to make it executable. If you do not understand what the script does, do not use it.
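The original smart.sh is not reproduced here. The sketch below is a hypothetical reconstruction of the idea: start each requested test on every disk, wait until no disk reports activity, then dump the logs. The variables DISKS, OUTDIR and POLL, the output file names, and the assumption that smartctl -c prints "in progress" while a test is running are all mine, so verify them on your system before relying on it.

```shell
#!/bin/sh
# Hypothetical reconstruction of a multi-disk test script (not the original).
# Usage: smart.sh TYPE...      e.g.  smart.sh short long
DISKS=${DISKS:-"/dev/sda /dev/sdb"}   # assumed list of drives to test
OUTDIR=${OUTDIR:-.}                   # where the log files are written
POLL=${POLL:-60}                      # seconds between status checks

# True while smartctl -c reports a running test (assumed wording: "in progress").
test_running() {
    smartctl -c "$1" | grep -q 'in progress'
}

run_tests() {
    for type in "$@"; do
        # Start this test type on every disk, then wait for all of them.
        for disk in $DISKS; do
            smartctl -t "$type" "$disk" >/dev/null
        done
        for disk in $DISKS; do
            while test_running "$disk"; do
                sleep "$POLL"
            done
        done
    done
    # Dump the error log and the self-test log of each disk.
    for disk in $DISKS; do
        name=$(basename "$disk")
        smartctl -l error "$disk" > "$OUTDIR/smart-$name-error.log"
        smartctl -l selftest "$disk" > "$OUTDIR/smart-$name-selftest.log"
    done
}

# Run only when test types were given on the command line.
if [ "$#" -gt 0 ]; then
    run_tests "$@"
fi
```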

Now let's carry out the SMART Short Self Test or the SMART Extended Self Test (again, only if they are supported by your drive). They are similar, but the second one is more thorough than the first:
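Both tests are started with smartctl -t, using the type short or long (long is smartctl's name for the extended test). A minimal sketch, with a hypothetical helper name and an example device name:

```shell
# Hypothetical helper: start a short or long (extended) self-test.
# Usage: smart_self_test short|long DEVICE
smart_self_test() {
    case "$1" in
        short|long)
            smartctl -t "$1" "$2"
            ;;
        *)
            echo "usage: smart_self_test short|long DEVICE" >&2
            return 2
            ;;
    esac
}

# Example calls, run as root (device name is an assumption):
#   smart_self_test short /dev/sda
#   smart_self_test long /dev/sda
```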

Then check the SMART Self Test Error Log:
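The self-test log is printed by smartctl -l selftest. The sketch below also pulls out the most recent entry, assuming that entries start with "# 1" for the newest test; that line format and the helper names are assumptions, so compare them with your own output.

```shell
# Hypothetical helper: print the full self-test log.
smart_selftest_log() {
    smartctl -l selftest "$1"
}

# Hypothetical helper: print only the newest entry (assumed to start with "# 1").
smart_last_selftest() {
    smart_selftest_log "$1" | grep '^# *1 '
}

# Example call, run as root (device name is an assumption):
#   smart_last_selftest /dev/sda
```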

Now let's execute the SMART Conveyance Self Test:
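The conveyance test is started with smartctl -t conveyance. Since not every drive offers it, the sketch below first looks for a "Conveyance Self-test supported" line in the capabilities output; the exact wording of that line and the helper name are assumptions.

```shell
# Hypothetical helper: start the conveyance self-test if the drive lists it.
# The pattern is anchored so that "No Conveyance Self-test supported" does not match.
smart_conveyance_test() {
    if smartctl -c "$1" | grep -q '^[[:space:]]*Conveyance Self-test supported'; then
        smartctl -t conveyance "$1"
    else
        echo "conveyance self-test not reported as supported on $1" >&2
        return 1
    fi
}

# Example call, run as root (device name is an assumption):
#   smart_conveyance_test /dev/sda
```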

Then check the SMART Self Test Error Log again:

Automatic Monitoring
If you want to automatically monitor your drive(s) you have to configure the smartd daemon and have it launch at boot.

If you use SATA or SCSI drives, the drive devices may move around during boot, so you should not use /dev/sd? to find your drives. The kernel assigns these names as it sees fit, so there's no guarantee that /dev/sda will always refer to the same physical device.

You should use the symlinks in /dev/disk/by-id/:

    lrwxrwxrwx 1 root root  9 2008-02-20 17:16 scsi-SDNS0P6B00FED -> ../../sdb
    lrwxrwxrwx 1 root root 10 2008-02-20 17:16 scsi-SDNS0P6B00FED-part1 -> ../../sdb1
    lrwxrwxrwx 1 root root  9 2008-02-20 17:16 scsi-SDNS0P6B00FTH -> ../../sda
    lrwxrwxrwx 1 root root 10 2008-02-20 17:16 scsi-SDNS0P6B00FTH-part1 -> ../../sda1
    lrwxrwxrwx 1 root root  9 2008-02-20 17:16 scsi-S_5QHZ0BRZ -> ../../sdc
    lrwxrwxrwx 1 root root 10 2008-02-20 17:16 scsi-S_5QHZ0BRZ-part1 -> ../../sdc1
    lrwxrwxrwx 1 root root  9 2008-02-20 17:16 scsi-S_5QH02EWC -> ../../sdf
    lrwxrwxrwx 1 root root 10 2008-02-20 17:16 scsi-S_5QH02EWC-part1 -> ../../sdf1
    lrwxrwxrwx 1 root root  9 2008-02-20 17:16 scsi-S_5QH02EX3 -> ../../sdh
    lrwxrwxrwx 1 root root 10 2008-02-20 17:16 scsi-S_5QH02EX3-part1 -> ../../sdh1
    lrwxrwxrwx 1 root root  9 2008-02-20 17:16 scsi-S_5QH02EYT -> ../../sdd
    lrwxrwxrwx 1 root root 10 2008-02-20 17:16 scsi-S_5QH02EYT-part1 -> ../../sdd1
    lrwxrwxrwx 1 root root  9 2008-02-20 17:16 scsi-S_9QG4MSPC -> ../../sdg
    lrwxrwxrwx 1 root root 10 2008-02-20 17:16 scsi-S_9QG4MSPC-part1 -> ../../sdg1
    lrwxrwxrwx 1 root root  9 2008-02-20 17:16 scsi-S_9QG56Q28 -> ../../sdi
    lrwxrwxrwx 1 root root 10 2008-02-20 17:16 scsi-S_9QG56Q28-part1 -> ../../sdi1
    lrwxrwxrwx 1 root root 10 2008-02-20 17:16 usb-PLEXTOR_CORPORATION._PLEXTOR_USB2.0-ATA.ATAPI_Bridge_000000002BDC -> ../../scd0
    lrwxrwxrwx 1 root root  9 2008-02-20 17:16 usb-ST316002_1A_200509223316-0:0 -> ../../sde
    lrwxrwxrwx 1 root root 10 2008-02-20 17:16 usb-ST316002_1A_200509223316-0:0-part1 -> ../../sde1

This listing comes from a system with 8 drives (2 SCSI and 6 SATA), as well as a USB DVD drive and a USB drive. The symlinks let you set up smartd.conf so that it addresses the physical devices, whatever kernel names they receive at boot.
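To see which kernel name a persistent symlink currently points to, something like the following sketch may help. It assumes a readlink that supports -f (as in GNU coreutils); the helper names are hypothetical.

```shell
# Hypothetical helper: list the persistent names (default directory: /dev/disk/by-id).
list_disk_ids() {
    ls -l "${1:-/dev/disk/by-id}"
}

# Hypothetical helper: resolve a by-id symlink to the device node it points to.
resolve_disk() {
    readlink -f "$1"
}

# Example calls:
#   list_disk_ids
#   resolve_disk /dev/disk/by-id/scsi-SDNS0P6B00FED
```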

The following example shows how to:


 * monitor a single SCSI drive (/dev/disk/by-id/scsi-SDNS0P6B00FED)
 * schedule all tests (Offline, Extended and Conveyance tests) to be launched every Friday from 11:00 to 15:00, in succession
 * execute a script if any error is detected: this script will write a detailed report and then it will shut down the computer
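The three goals in the list above could be expressed as a single smartd.conf directive along the lines of the sketch below. It is not the original configuration: the -a directive, the start hours (11:00, 13:00 and 15:00 on day 5, Friday), the -m root address and the scratch file name are my assumptions. The snippet writes the directive to a scratch file so you can review it before copying it into /etc/smartd.conf by hand.

```shell
# Write an example directive to a scratch file for review.
# CONF_OUT is a hypothetical, overridable output path.
CONF_OUT=${CONF_OUT:-./smartd.conf.example}

cat > "$CONF_OUT" <<'EOF'
# Monitor one drive by its persistent name (-a: default set of checks).
# -s: start Offline (O) at 11:00, Extended (L) at 13:00 and Conveyance (C)
#     at 15:00 every Friday (day of week 5).
# -m/-M exec: run the script instead of sending mail when a problem is found.
/dev/disk/by-id/scsi-SDNS0P6B00FED -a -s (O/../../5/11|L/../../5/13|C/../../5/15) -m root -M exec /usr/local/sbin/smartd.sh
EOF
```

Whether each test type is actually run depends on what the drive supports, so compare the schedule with the drive's capabilities.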

The smartd daemon's configuration file is /etc/smartd.conf (if it doesn't exist, you have to create it).

The script that smartd runs on error is saved as /usr/local/sbin/smartd.sh. Make the script executable:
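The original script is not reproduced here. A minimal sketch of what /usr/local/sbin/smartd.sh might contain is shown below: it writes a report from the SMARTD_* environment variables that smartd exports to programs started with -M exec, then shuts the machine down. The report location, the file naming and the REPORT_DIR variable are my assumptions.

```shell
#!/bin/sh
# Sketch of /usr/local/sbin/smartd.sh (an assumption, not the original script).
REPORT_DIR=${REPORT_DIR:-/var/log}   # hypothetical report location

# Write a report built from the variables smartd exports, and print its path.
write_report() {
    report="$REPORT_DIR/smartd-report-$(date +%Y%m%d-%H%M%S).txt"
    {
        echo "Device:    ${SMARTD_DEVICE:-unknown}"
        echo "Fail type: ${SMARTD_FAILTYPE:-unknown}"
        echo "Message:   ${SMARTD_MESSAGE:-none}"
        echo
        echo "${SMARTD_FULLMESSAGE:-}"
    } > "$report"
    echo "$report"
}

# Act only when started by smartd, which sets SMARTD_FAILTYPE.
if [ -n "${SMARTD_FAILTYPE:-}" ]; then
    write_report >/dev/null
    shutdown -h now
fi
```

After saving it, chmod +x /usr/local/sbin/smartd.sh makes the script executable.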

The example above is only a starting point; adapt it to your own configuration and preferences. To learn more, read the smartd.conf man page.

To test everything, append -M test to the last line of smartd.conf and start the smartd daemon. Note that, with the example script, this will shut down your machine.

If something goes wrong, check /var/log/messages for smartd entries.

Now remove the -M test option and configure smartd to start at boot.

Finished!

Useful Links

 * smartmontools Home Page
 * Monitoring Hard Disks with SMART (Linux Journal)
 * Bad Block HowTo
 * Smartmontools for SCSI devices
 * SMART Attributes
