There are two cases:
- the read command times out at the kernel level (30 seconds by default),
- the drive reports its inability to read a given sector before the kernel loses patience (the case I'm interested in).
Kernel timeout
Since drive access usually goes through the Linux SCSI layer, I think the timeout case is handled entirely by that layer. According to this documentation, it retries the command several times after resetting the drive, then the bus, then the host, and so on. If none of this works, the SCSI layer takes the device offline. At this point, I think the md layer just "discovers" that one drive is gone and marks it as missing (failed). Is this correct?
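For reference, the 30-second default mentioned above is visible (and tunable) per device through sysfs. A minimal sketch, assuming a Linux system and hypothetical device names sda/sdb:

```python
#!/usr/bin/env python3
"""Minimal sketch: read the per-device SCSI command timeout from sysfs.

Assumes a Linux system where block devices expose
/sys/block/<dev>/device/timeout (the value the SCSI layer waits before
starting its abort/reset escalation). Device names are examples only.
"""
from pathlib import Path

def scsi_timeout_seconds(dev: str) -> int:
    """Return the kernel's command timeout for a block device, in seconds."""
    return int(Path(f"/sys/block/{dev}/device/timeout").read_text().strip())

if __name__ == "__main__":
    for dev in ("sda", "sdb"):  # hypothetical device names
        try:
            print(f"{dev}: {scsi_timeout_seconds(dev)} s")
        except FileNotFoundError:
            print(f"{dev}: not present or not a SCSI device")
```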
Drive reported error
Some drives can be configured to report a read error once a certain timeout is reached, aborting their internal recovery attempts. This is called ERC (also TLER or CCTL). The disk timeout is usually configured to trigger before the OS timeout (or the hardware RAID controller's), so that the latter knows what actually happened instead of just waiting and then aborting.
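For illustration, ERC is typically inspected and set with smartmontools. A minimal sketch, assuming smartctl is installed, the drive supports SCT Error Recovery Control, and /dev/sda is just an example device:

```python
#!/usr/bin/env python3
"""Minimal sketch: query and set SCT ERC via smartmontools' smartctl.

Assumes smartctl is installed, the script runs as root, and the drive
supports SCT Error Recovery Control; /dev/sda is an example device.
Timeouts are given in deciseconds (70 = 7.0 seconds).
"""
import subprocess

DEV = "/dev/sda"  # example device

# Show the current ERC read/write timeouts (or "Disabled").
subprocess.run(["smartctl", "-l", "scterc", DEV], check=True)

# Cap internal recovery at 7 s for reads and writes, so the drive
# reports the error before the kernel's 30 s timeout fires.
subprocess.run(["smartctl", "-l", "scterc,70,70", DEV], check=True)
```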
My question is: how does Linux (and md) handle drive-reported read errors?
Will it retry, do something clever, or just take the drive offline without going through all the recovery attempts described under "Kernel timeout" above? Is md even aware when such a thing happens?
Some people suggest that ERC is dangerous on Linux because it does not give the drive enough time to try to recover. They also say that ZFS RAID is nice because, when a read error occurs, it reconstructs the unreadable sector's data from RAID redundancy and writes it back to the drive. The drive should then stop trying to read the bad sector, automatically mark it as bad (not to be used anymore), and remap it to a healthy spare sector.
Is md also capable of doing this?
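For context, the closest md interface I know of is its sysfs scrub controls, where "check" reads every sector (surfacing latent read errors so redundancy can be used) and "repair" additionally rewrites inconsistent stripes. A minimal sketch, assuming a running array named md0 and root privileges:

```python
#!/usr/bin/env python3
"""Minimal sketch: drive an md scrub through sysfs.

Assumes a running array named md0 and root privileges. Writing "check"
to sync_action makes md read every sector; "repair" additionally
rewrites any inconsistent stripes. mismatch_cnt reports what the last
pass found.
"""
from pathlib import Path

MD = Path("/sys/block/md0/md")  # example array name

def start_scrub(action: str = "check") -> None:
    """Kick off a scrub; valid actions include "check" and "repair"."""
    (MD / "sync_action").write_text(action + "\n")

def scrub_status() -> tuple[str, int]:
    """Return the current sync action and the last mismatch count."""
    action = (MD / "sync_action").read_text().strip()
    mismatches = int((MD / "mismatch_cnt").read_text().strip())
    return action, mismatches

if __name__ == "__main__":
    start_scrub("check")
    print(scrub_status())
```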