[SLL] get md RAID array to rebuild even if read errors?

Brian Lane bcl at brianlane.com
Thu Feb 14 21:14:23 PST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jeremy C. Reed wrote:
| Any ideas on how to get mdadm or md to rebuild even if read errors?
|
| We had a failed disk. We replaced it and did:
|
|   /sbin/mdadm --fail /dev/md0 /dev/sdb2
|   /sbin/mdadm --remove /dev/md0 /dev/sdb2
|   /sbin/mdadm --zero-superblock /dev/sdc2
|   /sbin/mdadm --add /dev/md0 /dev/sdc2
|

Are you sure you did this to the right drive? I just finished setting up
a software RAID system and found that the kernel does not assign the
/dev/sdX names in any fixed order at boot time. The only way I have found
to tell which /dev/sdX maps to which physical drive is to look at
/sys/block/sdX/device and see which bus it is attached to.
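
A rough sketch of that lookup, in case it helps. It assumes
/sys/block/sdX/device is a symlink into the bus hierarchy and that
readlink supports -f (GNU coreutils). The function name list_disk_paths
and its optional directory argument are made up for illustration.

```shell
#!/bin/sh
# Print each sd* disk with the resolved sysfs path of its "device" link.
# The optional first argument (hypothetical, for testing) overrides the
# sysfs block directory; it defaults to /sys/block.
list_disk_paths() {
    sysfs_block="${1:-/sys/block}"
    for dev in "$sysfs_block"/sd*; do
        # Skip entries with no device link (also covers an unmatched glob).
        [ -e "$dev/device" ] || continue
        # readlink -f follows the symlink to the full bus path.
        printf '%s -> %s\n' "$(basename "$dev")" "$(readlink -f "$dev/device")"
    done
}
```

Calling list_disk_paths with no argument reads /sys/block. The host and
target numbers in the resolved path are what to compare against the
controller ports and cabling.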

| # cat /proc/mdstat
| Personalities : [raid1]
| md0 : active raid1 sdc2[2] sda2[0]
|       71577536 blocks [2/1] [U_]
|       [>....................]  recovery =  0.9% (691072/71577536)
| finish=148.6min speed=7944K/sec
| unused devices: <none>
|
|
| But it fails repeatedly (over 350 times by now) at:
|
| Feb 14 12:36:25 foo kernel: scsi0: ERROR on channel 0, id 0, lun 0, CDB:
| Read (10) 00 00 50 ec cd 00 00 80 00
| Feb 14 12:36:25 foo kernel: Info fld=0x50ece5, Current sda: sense key
| Medium Error
| Feb 14 12:36:25 foo kernel: Additional sense: Unrecovered read error
| Feb 14 12:36:25 foo kernel: end_request: I/O error, dev sda, sector
| 5303501
| Feb 14 12:36:25 foo kernel: raid1: sda: unrecoverable I/O read error for
| block 5094656
| Feb 14 12:36:25 foo kernel: md: md0: sync done.
|
| (Then it starts again ... kernel: md: syncing RAID array md0 and so on
| ...)
|
| Bad luck. Two disks bad. And now can't rebuild the newly added disk
| (sdc2).
|
| Any way to force it to continue with the syncing even if a "read error"?
|
| badblocks doesn't even see that bad block.
|
| If you have suggestions on getting SMART to mark it bad.
|
| Also I can't use debugfs with icheck to see the inode number because says
| "icheck: Filesystem not open". Any ideas? If I could find the inode
| number, then maybe I could rewrite that area and SMART will mark it as
| bad.
|
|   Jeremy C. Reed

It could also be a controller problem. Maybe it can't handle the stress
of rebuilding? Or maybe sdb is somehow interfering with proper
operation. Have you physically disconnected it yet?

Does it always fail at the same block?
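
One way to check is to tally the block numbers from the raid1 error
lines in the kernel log. This is only a sketch: it assumes each message
sits on a single line in the log file (the lines quoted above are wrapped
by the mail client), and count_failed_blocks is a made-up name.

```shell
#!/bin/sh
# Read kernel log text on stdin and print "count block" for every
# distinct block that raid1 reported as unreadable.
count_failed_blocks() {
    sed -n 's/.*unrecoverable I\/O read error for block \([0-9][0-9]*\).*/\1/p' |
        sort -n | uniq -c | awk '{print $1, $2}'
}
```

Feed it the log on stdin, for example from wherever syslog writes kernel
messages on that machine. A single output line would mean every retry is
hitting the same block. Several different blocks would point more toward
the controller or cabling.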

Brian

- --
- ---[Office 74.1F]--[Outside 36.7F]--[Server 98.5F]--[Coaster 65.2F]---
Software, Linux, Microcontrollers             http://www.brianlane.com
AIS Parser SDK                                http://www.aisparser.com
Movie Landmarks Search Engine            http://www.movielandmarks.com

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Remember Lexington Green!

iD8DBQFHtR+vIftj/pcSws0RAm7TAJ0U3u63xwo/eyRSP3IX2Xc3BUSRRgCghZ43
fY568L0PRPpo82oEBeTBKBo=
=edK1
-----END PGP SIGNATURE-----

