There are two cases:
- the read command times out at the kernel level (30 seconds by default),
- the drive reports its inability to read a given sector before the kernel loses patience (the case I'm interested in).
Kernel timeout
Since drive access usually goes through the Linux SCSI layer, I think the timeout case is handled entirely by that layer. According to this documentation, it retries the command several times after resetting the drive, then the bus, then the host, and so on. If none of this works, the SCSI layer takes the device offline. At this point, I think the md layer just "discovers" that one drive is gone and marks it as missing (failed). Is this correct?
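For reference, the 30-second default mentioned above is visible (and tunable) per device through sysfs. A minimal sketch, assuming a Linux system and hypothetical device names sda/sdb:

```python
#!/usr/bin/env python3
"""Minimal sketch: read the per-device SCSI command timeout from sysfs.

Assumes a Linux system where block devices expose
/sys/block/<dev>/device/timeout (the value the SCSI layer waits before
starting its abort/reset escalation). Device names are examples only.
"""
from pathlib import Path

def scsi_timeout_seconds(dev: str) -> int:
    """Return the kernel's command timeout for a block device, in seconds."""
    return int(Path(f"/sys/block/{dev}/device/timeout").read_text().strip())

if __name__ == "__main__":
    for dev in ("sda", "sdb"):  # hypothetical device names
        try:
            print(f"{dev}: {scsi_timeout_seconds(dev)} s")
        except FileNotFoundError:
            print(f"{dev}: not present or not a SCSI device")
```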
Drive reported error
Some drives can be configured to report a read error once a certain timeout is reached, aborting their internal recovery attempts. This is called ERC (also TLER or CCTL). The disk timeout is usually configured to trigger before the OS timeout (or the hardware RAID controller's), so that the latter knows what actually happened instead of just waiting and then aborting.
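For illustration, ERC is typically inspected and set with smartmontools. A minimal sketch, assuming smartctl is installed, the drive supports SCT Error Recovery Control, and /dev/sda is just an example device:

```python
#!/usr/bin/env python3
"""Minimal sketch: query and set SCT ERC via smartmontools' smartctl.

Assumes smartctl is installed, the script runs as root, and the drive
supports SCT Error Recovery Control; /dev/sda is an example device.
Timeouts are given in deciseconds (70 = 7.0 seconds).
"""
import subprocess

DEV = "/dev/sda"  # example device

# Show the current ERC read/write timeouts (or "Disabled").
subprocess.run(["smartctl", "-l", "scterc", DEV], check=True)

# Cap internal recovery at 7 s for reads and writes, so the drive
# reports the error before the kernel's 30 s timeout fires.
subprocess.run(["smartctl", "-l", "scterc,70,70", DEV], check=True)
```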
My question is: how does Linux (and md) handle drive-reported read errors?
Will it retry, do something clever, or just take the drive offline without going through all the recovery attempts described under "Kernel timeout" above? Is md even aware when such a thing happens?
Some people suggest that ERC is dangerous on Linux because it does not give the drive enough time to try to recover. They also say that ZFS RAID is nice because, when a read error occurs, it reconstructs the unreadable sector's data from RAID redundancy and writes it back to the drive. The drive should then stop trying to read the bad sector, automatically mark it as bad (not to be used anymore), and remap it to a healthy spare sector.
Is md also capable of doing this?
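For context, the closest md interface I know of is its sysfs scrub controls, where "check" reads every sector (surfacing latent read errors so redundancy can be used) and "repair" additionally rewrites inconsistent stripes. A minimal sketch, assuming a running array named md0 and root privileges:

```python
#!/usr/bin/env python3
"""Minimal sketch: drive an md scrub through sysfs.

Assumes a running array named md0 and root privileges. Writing "check"
to sync_action makes md read every sector; "repair" additionally
rewrites any inconsistent stripes. mismatch_cnt reports what the last
pass found.
"""
from pathlib import Path

MD = Path("/sys/block/md0/md")  # example array name

def start_scrub(action: str = "check") -> None:
    """Kick off a scrub; valid actions include "check" and "repair"."""
    (MD / "sync_action").write_text(action + "\n")

def scrub_status() -> tuple[str, int]:
    """Return the current sync action and the last mismatch count."""
    action = (MD / "sync_action").read_text().strip()
    mismatches = int((MD / "mismatch_cnt").read_text().strip())
    return action, mismatches

if __name__ == "__main__":
    start_scrub("check")
    print(scrub_status())
```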