It turns out there's a built-in way of forcing a check of the array, which should be useful for finding bad disks.
stharward

Other than using SMART, you can force a check of the entire array while it's online. For example, to check the array on /dev/md0, run as root:

echo check > /sys/block/md0/md/sync_action
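As a sketch of how you might wrap this up and watch the result, here's a small helper (the `md_check` name is my own, not a standard tool; it takes the array's md sysfs directory as an argument purely so it can be exercised against a scratch directory):

```shell
# md_check: start an online consistency check of an md array and print
# its current state. Pass the array's md sysfs directory, e.g.
# /sys/block/md0/md. Run as root against the real path.
md_check() {
    md_sys="$1"
    echo check > "$md_sys/sync_action"       # ask the kernel to start the scrub
    printf 'action:   %s\n' "$(cat "$md_sys/sync_action")"
    printf 'mismatch: %s\n' "$(cat "$md_sys/mismatch_cnt")"  # mismatches counted by the check
}

# On a live system, as root:
#   md_check /sys/block/md0/md
# and watch overall progress with:
#   cat /proc/mdstat
```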

I also have a cron job that runs the following command once a month:

tar c /dir/of/raid/filesystem > /dev/null

It’s not a thorough check of the drive itself, but it does force the system to periodically verify that (almost) every file can be read successfully off the disk. Yes, some files are going to be read out of memory cache instead of disk. But I figure that if the file is in memory cache, then it’s successfully been read off disk recently, or is about to be written to disk, and either of those operations will also uncover drive errors. Anyway, running this job tests the most important criterion of a RAID array (“Can I successfully read my data?”) and in the three years I’ve been running my array, the one time I had a drive go bad, it was this command that discovered it.

One small warning: if your filesystem is big, this command takes a long time; my system needs about 6 hours per TiB. I run it under ionice so that the rest of the system doesn't grind to a halt during the check:

ionice -c3 tar c /dir/of/raid/filesystem > /dev/null
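For reference, a system crontab entry for such a monthly job might look like the following (the schedule, the path, and the choice of /etc/crontab are placeholders to adapt to your setup):

```shell
# /etc/crontab fragment: at 03:00 on the first of each month, read every
# file on the array at idle I/O priority and discard the output.
0 3 1 * *  root  ionice -c3 tar c /dir/of/raid/filesystem > /dev/null
```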
