It turns out there's a built-in way of forcing a check of the array, which should be useful for finding bad disks.
stharward

Other than using SMART, you can force a check of the entire array while it's online. For example, to check the array on /dev/md0, run as root:

echo check > /sys/block/md0/md/sync_action
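As a sketch of how you might wrap this up and watch the result, here's a small helper (the `md_check` name is my own, not a standard tool; it takes the array's md sysfs directory as an argument purely so it can be exercised against a scratch directory):

```shell
# md_check: start an online consistency check of an md array and print
# its current state. Pass the array's md sysfs directory, e.g.
# /sys/block/md0/md. Run as root against the real path.
md_check() {
    md_sys="$1"
    echo check > "$md_sys/sync_action"       # ask the kernel to start the scrub
    printf 'action:   %s\n' "$(cat "$md_sys/sync_action")"
    printf 'mismatch: %s\n' "$(cat "$md_sys/mismatch_cnt")"  # mismatches counted by the check
}

# On a live system, as root:
#   md_check /sys/block/md0/md
# and watch overall progress with:
#   cat /proc/mdstat
```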

I also have a cron job that runs the following command once a month:

tar c /dir/of/raid/filesystem > /dev/null

It’s not a thorough check of the drive itself, but it does force the system to periodically verify that (almost) every file can be read successfully off the disk. Yes, some files are going to be read out of memory cache instead of disk. But I figure that if the file is in memory cache, then it’s successfully been read off disk recently, or is about to be written to disk, and either of those operations will also uncover drive errors. Anyway, running this job tests the most important criterion of a RAID array (“Can I successfully read my data?”) and in the three years I’ve been running my array, the one time I had a drive go bad, it was this command that discovered it.

One small warning: if your filesystem is big, this command takes a long time; my system needs about 6 hours per TiB. I run it under ionice so that the rest of the system doesn't grind to a halt during the check:

ionice -c3 tar c /dir/of/raid/filesystem > /dev/null
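For reference, a system crontab entry for such a monthly job might look like the following (the schedule, the path, and the choice of /etc/crontab are placeholders to adapt to your setup):

```shell
# /etc/crontab fragment: at 03:00 on the first of each month, read every
# file on the array at idle I/O priority and discard the output.
0 3 1 * *  root  ionice -c3 tar c /dir/of/raid/filesystem > /dev/null
```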
