Some time ago I had a RAID5 system at home. One of the 4 disks failed but after removing and putting it back it seemed to be OK so I started a resync. When it finished I realized, to my horror, that 3 out of 4 disks failed. However I don't belive that's possible. There are multiple partitions on the disks each part of a different RAID array.
- md0 is a RAID1 array comprised of sda1, sdb1, sdc1 and sdd1.
- md1 is a RAID5 array comprised of sda2, sdb2, sdc2 and sdd2.
- md2 is a RAID0 array comprised of sda3, sdb3, sdc3 and sdd3.
md0 and md2 reports all disks up while md1 reports 3 failed (sdb2, sdc2, sdd2). It's my uderstanding that when hard drives fail all the partitions should be lost not just the middle ones.
At that point I turned the computer off and unplugged the drives. Since then I was using that computer with a smaller new disk.
Is there any hope of recovering the data? Can I somehow convince mdadm that my disks are in fact working? The only disk that may really have a problem is sdc but that one too is reported up by the other arrays.
Update
I finally got a chance to connect the old disks and boot this machine from SystemRescueCd. Everything above was written from memory. Now I have some hard data. Here is the output of mdadm --examine /dev/sd*2
/dev/sda2:
Magic : a92b4efc
Version : 0.90.00
UUID : 53eb7711:5b290125:db4a62ac:7770c5ea
Creation Time : Sun May 30 21:48:55 2010
Raid Level : raid5
Used Dev Size : 625064960 (596.11 GiB 640.07 GB)
Array Size : 1875194880 (1788.33 GiB 1920.20 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1
Update Time : Mon Aug 23 11:40:48 2010
State : clean
Active Devices : 3
Working Devices : 4
Failed Devices : 1
Spare Devices : 1
Checksum : 68b48835 - correct
Events : 53204
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 2 0 active sync /dev/sda2
0 0 8 2 0 active sync /dev/sda2
1 1 8 18 1 active sync /dev/sdb2
2 2 8 34 2 active sync /dev/sdc2
3 3 0 0 3 faulty removed
4 4 8 50 4 spare /dev/sdd2
/dev/sdb2:
Magic : a92b4efc
Version : 0.90.00
UUID : 53eb7711:5b290125:db4a62ac:7770c5ea
Creation Time : Sun May 30 21:48:55 2010
Raid Level : raid5
Used Dev Size : 625064960 (596.11 GiB 640.07 GB)
Array Size : 1875194880 (1788.33 GiB 1920.20 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1
Update Time : Mon Aug 23 11:44:54 2010
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 1
Spare Devices : 1
Checksum : 68b4894a - correct
Events : 53205
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 18 1 active sync /dev/sdb2
0 0 0 0 0 removed
1 1 8 18 1 active sync /dev/sdb2
2 2 8 34 2 active sync /dev/sdc2
3 3 0 0 3 faulty removed
4 4 8 50 4 spare /dev/sdd2
/dev/sdc2:
Magic : a92b4efc
Version : 0.90.00
UUID : 53eb7711:5b290125:db4a62ac:7770c5ea
Creation Time : Sun May 30 21:48:55 2010
Raid Level : raid5
Used Dev Size : 625064960 (596.11 GiB 640.07 GB)
Array Size : 1875194880 (1788.33 GiB 1920.20 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1
Update Time : Mon Aug 23 11:44:54 2010
State : clean
Active Devices : 1
Working Devices : 2
Failed Devices : 2
Spare Devices : 1
Checksum : 68b48975 - correct
Events : 53210
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 8 34 2 active sync /dev/sdc2
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 8 34 2 active sync /dev/sdc2
3 3 0 0 3 faulty removed
4 4 8 50 4 spare /dev/sdd2
/dev/sdd2:
Magic : a92b4efc
Version : 0.90.00
UUID : 53eb7711:5b290125:db4a62ac:7770c5ea
Creation Time : Sun May 30 21:48:55 2010
Raid Level : raid5
Used Dev Size : 625064960 (596.11 GiB 640.07 GB)
Array Size : 1875194880 (1788.33 GiB 1920.20 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1
Update Time : Mon Aug 23 11:44:54 2010
State : clean
Active Devices : 1
Working Devices : 2
Failed Devices : 2
Spare Devices : 1
Checksum : 68b48983 - correct
Events : 53210
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 4 8 50 4 spare /dev/sdd2
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 8 34 2 active sync /dev/sdc2
3 3 0 0 3 faulty removed
4 4 8 50 4 spare /dev/sdd2
It appears that things have changed since the last boot. If I'm reading this correctly sda2, sdb2 and sdc2 are working and contain synchronized data and sdd2 is spare. I distinctly remember seeing 3 failed disks but this is good news. Yet the array still isn't working:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md125 : inactive sda2[0](S) sdb2[1](S) sdc2[2](S)
1875194880 blocks
md126 : inactive sdd2[4](S)
625064960 blocks
md127 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
64128 blocks [4/4] [UUUU]
unused devices: <none>
md0 appears to be renamed to md127. md125 and md126 are very strange. They should be one array not two. That used to be called md1. md2 is completely gone but that was my swap so I don't care.
I can understand the different names and it doesn't really matter. But why is an array with 3 "active sync" disks unreadable? And what's up with sdd2 being in a separate array?
Update
I tried the following after backing up the superblocks:
root@sysresccd /root % mdadm --stop /dev/md125
mdadm: stopped /dev/md125
root@sysresccd /root % mdadm --stop /dev/md126
mdadm: stopped /dev/md126
So far so good. Since sdd2 is spare I don't want to add it yet.
root@sysresccd /root % mdadm --assemble /dev/md1 /dev/sd{a,b,c}2 missing
mdadm: cannot open device missing: No such file or directory
mdadm: missing has no superblock - assembly aborted
Apparently I can't do that.
root@sysresccd /root % mdadm --assemble /dev/md1 /dev/sd{a,b,c}2
mdadm: /dev/md1 assembled from 1 drive - not enough to start the array.
root@sysresccd /root % cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : inactive sdc2[2](S) sdb2[1](S) sda2[0](S)
1875194880 blocks
md127 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
64128 blocks [4/4] [UUUU]
unused devices: <none>
That didn't work either. Let's try with all the disks.
mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@sysresccd /root % mdadm --assemble /dev/md1 /dev/sd{a,b,c,d}2
mdadm: /dev/md1 assembled from 1 drive and 1 spare - not enough to start the array.
root@sysresccd /root % cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : inactive sdc2[2](S) sdd2[4](S) sdb2[1](S) sda2[0](S)
2500259840 blocks
md127 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
64128 blocks [4/4] [UUUU]
unused devices: <none>
No luck. Based on this answer I'm planning to try:
mdadm --create /dev/md1 --assume-clean --metadata=0.90 --bitmap=/root/bitmapfile --level=5 --raid-devices=4 /dev/sd{a,b,c}2 missing
mdadm --add /dev/md1 /dev/sdd2
Is it safe?
Update
I publish the superblock parser script I used to make that table in the my comment. Maybe someone will find it useful. Thanks for all your help.
mdadm --re-addisn't what you're looking for. Did you do a memory test recently? Do you have any log message related to the array failure?mdadm -A /dev/md1 /dev/sd{b,c,d}2(perhaps--force)? (If you haven't, back up the superblocks first.)/dev/sdd2can be in a separate array despite having the same UUID assd{a,b,c}2.