Software RAID Install

See RAID/Software for a more complete and better formatted version of this page

Introduction
This article is about installing Gentoo on a software RAID. It describes the steps necessary to perform an installation that uses an on-board software RAID controller, common on many modern mainboards. Gentoo itself also provides an outstanding guide to installing Gentoo on a software RAID using LVM2; you may find it easier or more difficult to follow: [1]

RAID in General
If you have never heard of a RAID, let's introduce the concept. RAID stands for redundant array of independent/inexpensive disks (independent and inexpensive are interchangeable). This means that your data isn't stored on a single disk but across multiple independent disks. The method of organization is defined by the RAID level. The most common RAID levels are RAID 0, RAID 1 and RAID 10.

RAID 0 (often called striping) appends two or more hard drives together into one large virtual drive, but does not provide redundancy. For example, two 160GB drives act as a single 320GB drive. The work is balanced between the two drives so reading/writing may be faster. The downside is that if one disk fails, all data is lost.

RAID 1 and 10 (often called mirroring) duplicate data between the disks. Commonly, two hard drives of the same size mirror each other; this way all your data is duplicated, and if one drive fails the data remains. The downside is that you need twice as much space for everything. In Linux, the RAID 1 type has a more advanced driver, called RAID 10.

There are many more RAID levels but they are beyond the scope of this article, a summary is available at Wikipedia. Another overview of the various RAID levels can be found at the AC&NC site.

General Notes
Ideally, all partitions in a software RAID set should be the same size. Any difference between the drives makes it harder for the computer to manage the RAID and may result in wasted space.

It is recommended to connect only one IDE drive per IDE channel, because a failed drive can bring down the whole channel. IDE also does not provide overlapping "seek": accesses to another drive on the same channel, even when both drives function properly, block until the first drive is done transferring, slowing down the array. If you create a RAID with two IDE disks per channel, you basically double your risk of losing the availability of your mass storage system, since most RAID configurations can only sustain the loss of a SINGLE disk.

Many users prefer RAID 1 or RAID 5, as they are a good balance between speed and increased data safety against hardware failure. With RAID 0 your chances of losing data increase by a factor equal to the number of drives.
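The last claim can be checked with a bit of arithmetic: if each drive independently fails within a year with probability p, a RAID 0 of n drives loses its data with probability 1-(1-p)^n, which for small p is roughly n times p. The numbers below are purely illustrative:

```shell
# Illustrative only: yearly data-loss probability of a 2-drive RAID 0
# when each drive fails with probability 0.03 (roughly n*p for small p).
awk 'BEGIN { p = 0.03; n = 2; printf "%.4f\n", 1 - (1 - p)^n }'
```

With p = 0.03 this prints 0.0591, just under 2 × 0.03.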

Note: A RAID system does not obviate the need for (preferably off site) backups! RAID protects against drive failure only; not rm -rf /*, other software-related errors(!), fire, earthquake, vandalism, theft or any other threat to an entire RAID.

Warning: Don't think that just because you're running a RAID level that provides redundancy, you can stop worrying about drive failures. If a drive in a RAID-1 or RAID-10 starts failing and you aren't aware of it, your data ends up as silently corrupted as it would be if you were running one drive. If you aren't vigilant enough to watch your logs, you have no one to blame but yourself when you lose data.

Warning: A special warning to RAID-1 and RAID-10 users: don't try to mount elements of the RAID-1 separately simply because you know the contents of the partitions are identical! Mounting one drive of a RAID-1 or RAID-10 running a journaled file system (like ReiserFS) can make the RAID-1 or RAID-10 as a whole unmountable. If you must get data off a drive mount it READ-ONLY.

RAID BIOS
In most cases your mainboard will ship with RAID disabled, so you need to enable the RAID feature in your BIOS first. Most recent mainboards offer both a vendor-specific RAID mode and AHCI.

Warning: This is, actually, wrong; Linux software RAID has nothing to do with BIOS RAIDs. I'm not sure whether this is needed for dual-boot boxes, but if your rig is Linux-only, DO NOT, repeat DO NOT, enable the RAID in BIOS. (For an explanation, see here: 2 )

A vendor-specific RAID (i.e. nVidia nvraid) requires an operating system driver written for the specific RAID controller. You should use this method if you also plan to install Windows on the machine. Windows drivers shipped with the mainboard most commonly require the vendor-specific RAID.

The AHCI is a common interface that was defined to have a vendor-driver independent access to the hardware. In most cases this is the best way to access your RAID since you aren't bound to any RAID vendor.

A software RAID is compatible with dual-boot environments involving Windows, but Windows will not be able to mount or read any partition involved in the pure software RAID, and all pseudo-hardware RAID controllers must be turned off.

Migrating
For information on migrating an existing installation to a RAID see: HOWTO Migrate To RAID.

About the Installation
This HOWTO assumes you are using SATA drives, but it should work equally well with IDE drives. If you are using IDE drives, for maximum performance make sure that each drive is a master on its own separate channel.

To partition the drives similarly to what the Gentoo install guide suggests, /boot is best set up as a RAID 1. Recall that in RAID 1 data is mirrored on multiple disks, so if there is somehow a problem with your RAID, GRUB/LILO can point to any of the copies of the kernel on any of the partitions in the /boot RAID 1 and a normal boot will occur.

In this HOWTO, /, /boot and /home will be RAID 1 (mirror) while the swap partition will be RAID 0 (striping). For added performance you could use RAID 10 in the far layout (raid10,f2) as this would give you double the performance for sequential reading compared to RAID 1.

Note: If you do not place your swap partition on RAID 1 and a drive containing your swap partition fails, your system will likely die when your system tries to access the swap partition.

Note: A swap partition is faster than a swap file but requires a more complex partitioning of your disk(s). Changing the size of a swap file does not require repartitioning. Volume managers such as LVM(2) or EVMS work with volumes which provide sophisticated and more flexible alternatives to partitions. LVM(2) or EVMS often let you change e.g. the size of volumes on the fly. Sometimes swap partition(s) can be shared between certain operating systems in dual/multiple boot setups, such as between multiple Linux distributions.
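One reason a swap file is easier to resize is that it is just a regular file. A minimal sketch of creating one (the path and size are arbitrary examples; activating it with swapon requires root and is omitted here):

```shell
# Create a 64 MiB file and format it as swap; resizing later just means
# recreating the file with a different size, no repartitioning needed.
dd if=/dev/zero of=/tmp/swapfile bs=1M count=64 status=none
chmod 600 /tmp/swapfile
mkswap /tmp/swapfile
```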

Load kernel modules
Load the appropriate raid module.

modprobe raid1 (For RAID 1)

modprobe raid10 (For RAID 10)

modprobe raid0 (For RAID 0)

modprobe raid5 (For RAID 5)

If you do not have the modules for raid support, you will need to compile them using the Gentoo source.

cd /usr/src/linux
make menuconfig

From here select "Device Drivers", then "Multi-device support (RAID and LVM)", and make sure the option is checked. Then choose to build "RAID support" as a module, or built in if you need the kernel to support it at boot. Finally, select which RAID modes you need, then exit and compile.
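If you want to check an existing kernel configuration rather than walk the menus, you can grep the .config file. The sample file below stands in for /usr/src/linux/.config; which options appear depends on your kernel version:

```shell
# Write a sample config fragment; in practice, grep /usr/src/linux/.config.
cat > /tmp/sample.config <<'EOF'
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=y
CONFIG_MD_RAID10=m
EOF
# y = built in, m = module, absent = disabled
grep '^CONFIG_MD_RAID' /tmp/sample.config
```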

Setting up the partitions
You can partition your drives with tools such as fdisk or cfdisk. There is nothing different here except to make sure:


 * 1) Your partitions are the same size on each drive. See below for instructions on copying a partition map.
 * 2) Your partitions to be included in the RAID are set to partition type "fd" (linux raid auto-detect). If not set to "fd", the partitions will fail to be added to the RAID on reboot.

This might be a good time to play with the hdparm tool. It allows you to change hard drive access parameters, which might speed up disk access. Another use is if you are using a whole disk as a hot spare. You may wish to change its spin down time so that it spends most of its time in standby, thus extending its life.

You can also set up the partitions on the first disk and then copy the entire partition table to the second disk with sfdisk; see the sfdisk example under "Misc RAID stuff" below.

Setting up the RAID
If your Gentoo version is older than 2007.0 or you have not migrated to udev, you will need to create the metadevice nodes before creating the RAID arrays (this step isn't necessary with 2007.0 anymore since it uses udev):

cd /dev && MAKEDEV md

After partitioning, create the /etc/mdadm.conf file (yes, indeed, on the Installation CD environment) using mdadm, an advanced tool for RAID management. For instance, to have your boot, swap and root partition mirrored (RAID-1) covering /dev/sda and /dev/sdb, you can use:

mdadm --create --verbose /dev/md1 --level=1 -e 0.90 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create --verbose /dev/md2 --level=0 --raid-devices=2 /dev/sda2 /dev/sdb2
(Optional) mdadm --create --verbose /dev/md3 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3
mdadm --create --verbose /dev/md4 --level=1 --raid-devices=2 /dev/sda4 /dev/sdb4

This will create your RAID devices /dev/md*. The created RAID devices contain 1.0 superblocks by default. The latest kernels autodetect arrays based on the partition type (fd), while older kernels looked directly for the superblock. There are three workarounds for this:


 * 1) Change the partition type to fd. This is generally the best solution.
 * 2) Passing '-e 0.90' to these commands will put 0.90 superblocks on the partitions so that autodetection works. This might break with later kernels. The current version of grub as of this writing (0.97-r10) does not understand the current superblock version (1.x) and will fail to install the MBRs using the steps below without falling back to this version on at least the /boot partition.
 * 3) If you cannot change anything in the disk partition layout, add md=5,/dev/sda5,/dev/sdb5 to the kernel parameters during boot; see the troubleshooting section at the bottom of this page. Also, the root filesystem appears to prefer a 0.90 superblock: the boot will kernel panic trying to find the root filesystem unless either the old superblock version is used, or you resort to an initrd which uses mdadm to set up the array first.

If you only have one drive available for now, you can create a degraded array by listing the second device as missing:

mdadm --create --verbose /dev/md1 --level=1 --raid-devices=2 /dev/sda1 missing

Then, after you have installed the other hard drive, add it to the array:

mdadm --manage --add /dev/md1 /dev/sdb1

Note: Of course there may be weirdness when you add another hard drive to the system when it comes to configuring the boot loader, due to drive order/numbering.

Later, after you have created your file system, save your mdadm.conf file:

mdadm --detail --scan >> /etc/mdadm.conf
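The appended lines will look roughly like the following (level, device count, and UUID depend on your arrays; the UUIDs shown here are placeholders):

```
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
ARRAY /dev/md2 level=raid0 num-devices=2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
```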

Waiting for the RAID to settle
You may check /proc/mdstat to see if the RAID devices are done syncing:

cat /proc/mdstat

You can also use:

watch cat /proc/mdstat

which refreshes the output of /proc/mdstat every two seconds by default (use watch -n <seconds> to change the interval). You can cancel the output with CTRL+C.

It should look something like this (showing one array syncing and the other one already completed):

Personalities : [raid1]
md2 : active raid1 sdb3[1] sda3[0]
      184859840 blocks [2/2] [UU]
      [======>..............]  resync = 33.1% (61296896/184859840) finish=34.3min speed=59895K/sec
md1 : active raid1 sdb1[1] sda1[0]
      10000320 blocks [2/2] [UU]
unused devices: <none>

If an array is still syncing, you may still proceed to creating filesystems, because the sync operation is completely transparent to the file system. Please note that the sync will take longer this way, and if a drive happens to fail before the RAID sync finishes, you're in trouble. It's generally recommended to wait until the sync is completed (or to skip the initial sync by passing --assume-clean during array creation if the disks are empty).

Even GRUB/LILO can be installed before the sync is finished, but you should wait until the RAID is synced before rebooting.

Creating the Filesystem
Create the filesystems on the disk.

mke2fs -j /dev/md1
mke2fs -j /dev/md3

or

mkfs.ext3 -j -O dir_index,resize_inode /dev/md4

There have been reports of problems with journalling filesystems on RAID setups, but these seem to be hardware-specific. The consensus is that you can use ext2 and ext3 on RAID without issues.

Create the Swap Partition
As described above, we used RAID 0 for our swap partition. If one of your disks dies, the system will most likely crash, since in a RAID 0 the swap data is split over all disks.

Your /etc/fstab could look like:

/dev/md2       swap           swap    defaults         0 0

There is no performance reason to use RAID for swap. The kernel itself can stripe swapping on several devices if you give them the same priority in the /etc/fstab file.

A striped /etc/fstab looks like:

/dev/sda2      swap           swap    defaults,pri=1   0 0
/dev/sdb2      swap           swap    defaults,pri=1   0 0

For reliability reasons, you may choose to use RAID for swap anyway. With a non-RAID configuration as shown above, a drive failure in any of the swap devices can crash your system. Also, while the above configuration may be faster than using a single drive for swap, it is also twice as likely that a drive will fail and take your system with it.

Mount Partitions
Turn the swap on:

mkswap /dev/md2
swapon /dev/md2

Mount the /, /boot and /home RAIDs:

mount /dev/md3 /mnt/gentoo
mkdir /mnt/gentoo/boot
mount /dev/md1 /mnt/gentoo/boot
mkdir /mnt/gentoo/home
mount /dev/md4 /mnt/gentoo/home

Copy raid configuration

mdadm --detail --scan >> /etc/mdadm.conf
mkdir /mnt/gentoo/etc
cp /etc/mdadm.conf /mnt/gentoo/etc/mdadm.conf

Make the chrooted environment behave like a real one:

mount -t proc none /mnt/gentoo/proc
mount -o bind /dev /mnt/gentoo/dev

Chroot
Just as described in the [Gentoo Handbook], you continue the installation by entering the chroot.

chroot /mnt/gentoo /bin/bash
env-update
source /etc/profile
export PS1="(chroot) $PS1"

From now on, install just as described in the Gentoo Handbook.

Differences to a Usual Installation
From now onwards, use /dev/md1 for the boot partition, /dev/md3 for the root partition and /dev/md4 for the home partition.

Don't forget to copy over /etc/mdadm.conf to /mnt/gentoo/etc.

Kernel Configuration
When configuring your kernel, make sure the appropriate RAID support is built into your kernel, not as a module.

Extra tools
You need to install mdadm as well.

emerge mdadm
rc-update add mdadm boot

Installing Grub onto both MBRs
Since the /boot partition is on a RAID, GRUB cannot read the RAID device itself to get the bootloader; it can only access the physical drives. Because /boot is a RAID 1 mirror, each member partition contains a complete copy of the filesystem, so you still use (hd0,0) in this step.

Run grub:

grub --no-floppy

You should see the GRUB prompt:

grub>

If you are using a RAID 1 mirrored disk system, you will want to install GRUB on all the disks in the system, so that when one disk fails you are still able to boot. The find command below lists the disks that contain /boot/grub/stage1, e.g.

grub> find /boot/grub/stage1
 (hd0,0)
 (hd1,0)
grub>

Now, if your disks are /dev/sda and /dev/sdb, do the following to install GRUB on /dev/sda MBR:

device (hd0) /dev/sda
root (hd0,0)
setup (hd0)

This will install grub into the /dev/sdb MBR:

device (hd0) /dev/sdb
root (hd0,0)
setup (hd0)

The device command tells grub to assume the drive is (hd0), i.e. the first disk in the system, when it is not necessarily the case. If your first disk fails, however, your second disk will then be the first disk in the system, and so the MBR will be correct.

The grub.conf does change from a normal install. The difference is in the specified root drive: it is now a RAID device and no longer a physical drive. For example, it would look like: File: /boot/grub/grub.conf

default 0
timeout 30
splashimage=(hd0,0)/boot/grub/splash.xpm.gz

title=Gentoo Linux
root (hd0,0)
kernel /bzImage root=/dev/md3

Remember that you can (and might need to) specify md* devices by hand; for example, here we define md3 from /dev/sd[ab]3. For more details see the kernel's Documentation/md.txt. File: /boot/grub/grub.conf

kernel /bzImage md=3,/dev/sda3,/dev/sdb3 root=/dev/md3

Setting up LILO
I successfully used the following; it boots off either the RAID 1, or off a single drive if a disk comprising the RAID 1 is damaged (assuming the boot image is on /dev/md1).

boot=/dev/md1
prompt
timeout = 50
lba32
raid-extra-boot=mbr-only

image = /boot/vmlinuz
    label = linux
    read-only # read-only for checking
    root = /dev/md1

Misc RAID stuff
To see if your RAID is functioning properly after reboot do:

cat /proc/mdstat

There should be one entry per RAID drive. The RAID 1 drives should have a "[UU]" in the entry, letting you know that the two hard drives are "up, up". If one goes down you will see "[U_]". If this ever happens your system will still run fine, but you should replace that hard drive as soon as possible.
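A simple way to automate this check is to scan the mdstat output for a status string containing an underscore. The snippet below runs against a sample string so it is self-contained; in practice you would read /proc/mdstat itself:

```shell
# Sample mdstat excerpt with one failed member ([U_]); replace the variable
# with $(cat /proc/mdstat) on a real system.
mdstat='md1 : active raid1 sdb1[1] sda1[0]
      10000320 blocks [2/1] [U_]'
if printf '%s\n' "$mdstat" | grep -q '\[U*_U*\]'; then
    echo "degraded"
else
    echo "healthy"
fi
```

On the sample above this prints "degraded".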

To rebuild a RAID 1:

 1. Power down the system
 2. Replace the failed disk
 3. Power up the system once again
 4. Create identical partitions on the new disk - i.e. the same as on the one good remaining disk
 5. Remove the old partition from the array and add the new partition back

You can copy a partition map from one disk to another with dd. Additionally, since the target drive is not in use, we can rewrite its partition map with fdisk to force the kernel to re-read it:

 * 1) dd if=/dev/sdX of=/dev/sdY count=1
 * 2) fdisk /dev/sdY

In fdisk, simply write the (unchanged) partition table back out:

Command (m for help): w

A better way would be to use sfdisk. A partition table can be cloned like this:

sfdisk -d /dev/sdX | sfdisk /dev/sdY -

It dumps the partition table of sdX to stdout, which the second sfdisk call then uses as input for sdY.

To remove the failed partition and add the new partition:

mdadm /dev/mdX -r /dev/sdYZ -a /dev/sdYZ

Note: Do this for each of your arrays, i.e. md0, md1, md2, etc.

Watch the automatic reconstruction run with:

watch -n 1 cat /proc/mdstat

Notification
Assuming you have properly set up /etc/mdadm.conf according to this guide, you can receive e-mail alerts about malfunctions in the RAID setup by running mdadm as a service.

Note: Make sure you can send mail from your machine. If all you need is basic SMTP support, try nail.

Add mail notification to /etc/mdadm.conf: File: /etc/mdadm.conf

MAILADDR root@example.com

Verify your setup works by sending a test alert (-F monitor mode, -s scan the config, -1 one-shot, -t test):

mdadm -Fs1t

Add /etc/init.d/mdadm to startup and start it.

Write-intent bitmap
A write-intent bitmap is used to record which areas of a RAID component have been modified since the array was last in sync. Basically, the RAID driver periodically writes out a small table recording which portions of a RAID component have changed. Therefore, if you lose power before all drives are in sync, a full resync is not needed when the array starts up; only the changed portions need to be resynced.

Note: Internal write-intent bitmaps can (and probably will) have serious performance impacts on your system. Please read the following links and decide if you REALLY need it. Consider (my opinion): write-intent bitmaps help if you have to resync your array often, as they save time then. But if you have to do that, something in your system is broken, and you should repair the cause instead of the symptoms.

 * http://people.debian.org/~terpstra/message/20080205.163427.68e4f1d0.en.html
 * http://ubuntumagnet.com/2008/01/write-intent-bitmaps-md-devices
 * http://groups.google.com/group/linux.kernel/msg/56be6e2d4d2cbbfb
 * http://blog.ganneff.de/blog/2008/01/30#mdraid_bitmap_internal_bad

To turn on write-intent bitmapping
Install a modern mdadm: >=sys-fs/mdadm-2.4.1
Install a modern kernel: >=2.6.16

Your RAID volume must be configured with a persistent superblock and must be fully synchronized. Use the following command to verify whether these conditions have been met:

mdadm -D /dev/mdX

Make sure it says:

State : active
Persistence : Superblock is persistent

Add a bitmap with the following command:

mdadm /dev/mdX -Gb internal

You can monitor the status of the bitmap as you write to your array with:

watch -n .1 cat /proc/mdstat

To turn off write-intent bitmapping
Remove the bitmap with the following command:

mdadm /dev/mdX -Gb none

Data Scrubbing
When you have multiple copies of data, you can use data scrubbing to actively scan for corrupt data and clean up the corruption by replacing the corrupt data with correct data from a surviving copy.

Normally, raid passively detects unreadable blocks. When you attempt to read a block, if a read error occurs, the data is reconstructed from the rest of the array and the unreadable block is rewritten. If the block cannot be rewritten the defective disk is kicked out of the active array.

During raid reconstruction, if you run across a previously undetected unreadable block, you may not be able to reconstruct your array without data corruption. The larger the disk, the higher the odds that passive bad block detection will be inadequate. Therefore, with today's large disks it is important to actively perform data scrubbing on your array.

With a modern kernel (>=2.6.16), this command initiates a data consistency check and an unreadable-block check: it reads all blocks, checks them for consistency, and attempts to rewrite unreadable blocks:

echo check >> /sys/block/mdX/md/sync_action

You can monitor the progress of the check with:

watch -n .1 cat /proc/mdstat

The system automatically works out at what speed to run the check. If it seems to be going a bit too slow, you can check what the minimum RAID speed is and increase it on the fly:

cat /proc/sys/dev/raid/speed_limit_min
1000

echo 10000 > /proc/sys/dev/raid/speed_limit_min

You should have your array checked daily or weekly by adding the appropriate command to /etc/crontab. Depending on the cron service used, adding a line with the following command will initiate a check of all known RAID devices:

sh -c 'for raid in /sys/block/md*/md/sync_action;do echo "check" >> ${raid};done'
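In /etc/crontab form, such a line might look like this (the schedule is an example; adjust the syntax to your cron implementation):

```
# /etc/crontab: scrub all md arrays every Sunday at 04:00
0 4 * * 0 root sh -c 'for raid in /sys/block/md*/md/sync_action;do echo "check" >> ${raid};done'
```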

You can use the following script and copy it to /etc/cron.daily or /etc/cron.weekly (whichever you prefer):

#!/bin/bash
# This script checks all RAID devs on the system
raid_base="/sys/block"
raid_names="md*"
cd ${raid_base}
for raid in ${raid_names}; do
    test -f ${raid_base}/${raid}/md/sync_action && \
        echo "check" >> ${raid_base}/${raid}/md/sync_action
done

A slightly more compact alternative to the above script:

#!/bin/bash
# This script checks all RAID devices on the system
for raid in /sys/block/md*/md/sync_action; do
    echo "check" >> ${raid}
done

Caveat
The check command actively discovers bad blocks. That is, it discovers unreadable blocks and blocks which are inconsistent (i.e. mismatched) across the RAID set. If the mismatch count is non-zero, your RAID is potentially corrupt.

Inconsistent blocks may occur as part of normal operation or as the result of an error.

Normal causes of inconsistent blocks
Blocks that are inconsistent as a result of these causes are inconsequential:

Swap changed

It is possible for swap to write inconsistent blocks. To remove this inconsistency: turn swap off, zero the swap area, run mkswap again, and turn swap back on.
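Those four steps, sketched as commands (the device name /dev/md2 follows this guide's layout; shown as a dry run that only prints the commands, since the real thing requires root and an actual array):

```shell
# Dry-run sketch: run() just prints each command. To actually perform the
# sequence (as root, on a real array), execute the printed commands.
run() { echo "$@"; }
run swapoff /dev/md2
run dd if=/dev/zero of=/dev/md2 bs=1M
run mkswap /dev/md2
run swapon /dev/md2
```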

File changed during 'check'

If a file is changed during a check, it may appear inconsistent. Although the check reports a mismatch, this is harmless, as the file will be made consistent when it is closed.

Changed file truncated

If a changed file is truncated, it may appear inconsistent. Although the check reports a mismatch, this is harmless, as the mismatch is past the end of the file.

Grub

Grub writing to the disk before the RAID is assembled may lead to inconsistent blocks as well. See the following discussion for more in-depth details: http://thread.gmane.org/gmane.linux.raid/18481/focus=18915

Abnormal Causes of Inconsistent Blocks
Inconsistent blocks may occur spontaneously, as disk drives may discover and replace unreadable blocks on their own or as a result of SMART tests. Ideally, an error occurs when an attempt is made to read the block, and software RAID transparently corrects the problem. Due to drive flaws, however, it is possible for errors not to be reported.

The check command attempts to rewrite unreadable blocks. The check command does not correct mismatched blocks. A count of these mismatched blocks is available after the check command runs:

cat /sys/block/mdX/md/mismatch_cnt
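A quick way to act on that counter from a script (the value here is hard-coded as an example; on a real system substitute $(cat /sys/block/mdX/md/mismatch_cnt)):

```shell
# Example value; read it from /sys/block/mdX/md/mismatch_cnt in practice.
mismatch_cnt=128
if [ "$mismatch_cnt" -ne 0 ]; then
    echo "WARNING: $mismatch_cnt mismatched blocks"
fi
```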

If the mismatch occurs in free space there is no impact.

Running an fsck may not fix your problem. This is because the fsck may read data from the correct block rather than the block containing undefined data.

The repair command can be used to place the mismatched blocks into a consistent state:

echo repair >> /sys/block/mdX/md/sync_action

You can monitor the progress of the repair with:

watch -n .1 cat /proc/mdstat

For RAID with parity (e.g. RAID-5) the parity block will be reconstructed. For RAID without parity (e.g. RAID-1) a block will be chosen at random as the correct block. Therefore, although running the repair command will make your RAID consistent, it will not guarantee that your partition is not corrupt.

To ensure your partition is not corrupt, repair the RAID device and then reformat the partition.

Dual Installation
If you have the same troubles I had getting Windows up and running after using this guide, look at HOWTO nvidia raid dual boot.

Autoassembly Fails
If you created your RAID devices with a recent version of mdadm, it will use 1.0 superblocks. Those superblocks (unlike the 0.90 superblocks) cannot be detected by autoassembly. This leads to issues if your root device is on a RAID. Let's imagine that you installed your system on /dev/md5 and have something like this in your grub.conf: File: /boot/grub/grub.conf

title gentoo-2.6.24-r3
root (hd0,0)
kernel /gentoo.2624r3 root=/dev/md5

This will fail during autoassembly. You need to inform the kernel which hard disk partitions should be used to construct /dev/md5: File: /boot/grub/grub.conf

title gentoo-2.6.24-r3
root (hd0,0)
kernel /gentoo.2624r3 root=/dev/md5 md=5,/dev/sda5,/dev/sdb5

All other partitions will be set up from /etc/mdadm.conf.

Note: With the latest genkernel, you should pass the domdadm option to the kernel. If busybox can't find /etc/mdadm.conf, it will try to autodetect the arrays.

Did you reboot with no luck and need to make some changes?
If you rebooted with a completely installed system and it failed, boot with the install CD and do, for example, this:

mdadm --assemble /dev/md1 /dev/sda1 /dev/sdb1

mdadm --assemble /dev/md2 /dev/sda2 /dev/sdb2

mdadm --assemble /dev/md3 /dev/sda3 /dev/sdb3

Swap is of course already formatted; activate it:
swapon /dev/md2

Mount partitions
mount /dev/md3 /mnt/gentoo

mount /dev/md1 /mnt/gentoo/boot