- Tools that manage md devices can be found at
- http://www.<country>.kernel.org/pub/linux/utils/raid/....
- Boot time assembly of RAID arrays
- ---------------------------------
- You can boot from your md device using one of the following kernel
- command lines:
- for old raid arrays without persistent superblocks:
- md=<md device no.>,<raid level>,<chunk size factor>,<fault level>,dev0,dev1,...,devn
- for raid arrays with persistent superblocks
- md=<md device no.>,dev0,dev1,...,devn
- or, to assemble a partitionable array:
- md=d<md device no.>,dev0,dev1,...,devn
-
- md device no. = the number of the md device ...
- 0 means md0,
- 1 md1,
- 2 md2,
- 3 md3,
- 4 md4
- raid level = -1 linear mode
- 0 striped mode
- other modes are only supported with persistent superblocks
- chunk size factor = (raid-0 and raid-1 only)
- Set the chunk size to 4k << n, where n is the factor given here.
-
- fault level = totally ignored
-
- dev0-devn: e.g. /dev/hda1,/dev/hdc1,/dev/sda1,/dev/sdb1
-
- A possible loadlin line (Harald Hoyer <HarryH@Royal.Net>) looks like this:
- e:\loadlin\loadlin e:\zimage root=/dev/md0 md=0,0,4,0,/dev/hdb2,/dev/hdc3 ro
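The md= syntax above can be illustrated with a small parser. This is a minimal sketch, not the kernel's own parsing code: the function name parse_md_param and the returned dictionary keys are hypothetical, and it assumes the legacy form can be recognised by a numeric field following the device number.

```python
def _is_int(text):
    """Return True if text parses as a (possibly negative) decimal integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def parse_md_param(value):
    """Parse the text after 'md=' into a dict (hypothetical helper).

    Assumption: a numeric second field marks the legacy form
    <md device no.>,<raid level>,<chunk size factor>,<fault level>,dev0,...
    """
    fields = value.split(",")
    head, rest = fields[0], fields[1:]
    partitionable = head.startswith("d")  # md=d<n>,... is a partitionable array
    result = {
        "minor": int(head[1:] if partitionable else head),
        "partitionable": partitionable,
    }
    if rest and _is_int(rest[0]):
        if len(rest) < 3:
            raise ValueError("legacy form needs level, chunk size factor, fault level")
        level, factor = int(rest[0]), int(rest[1])
        # rest[2] is the fault level, which the text says is totally ignored.
        result.update(
            persistent=False,
            level=level,                # -1 linear, 0 striped
            chunk_size=4096 << factor,  # "4k << n"
            devices=rest[3:],
        )
    else:
        result.update(persistent=True, devices=rest)
    return result
```

For the loadlin line above, parse_md_param("0,0,4,0,/dev/hdb2,/dev/hdc3") describes md0 in striped mode with a chunk size of 4096 << 4 = 65536 bytes.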
- Boot time autodetection of RAID arrays
- --------------------------------------
- When md is compiled into the kernel (not as module), partitions of
- type 0xfd are scanned and automatically assembled into RAID arrays.
- This autodetection may be suppressed with the kernel parameter
- "raid=noautodetect". As of kernel 2.6.9, only drives with a type 0
- superblock can be autodetected and run at boot time.
- The kernel parameter "raid=partitionable" (or "raid=part") means
- that all auto-detected arrays are assembled as partitionable.
- Superblock formats
- ------------------
- The md driver can support a variety of different superblock formats.
- Currently, it supports superblock formats "0.90.0" and the "md-1" format
- introduced in the 2.5 development series.
- The kernel will autodetect which format superblock is being used.
- Superblock format '0' is treated differently to others for legacy
- reasons - it is the original superblock format.
- General Rules - apply for all superblock formats
- ------------------------------------------------
- An array is 'created' by writing appropriate superblocks to all
- devices.
- It is 'assembled' by associating each of these devices with a
- particular md virtual device. Once it is completely assembled, it can
- be accessed.
- An array should be created by a user-space tool. This will write
- superblocks to all devices. It will usually mark the array as
- 'unclean', or with some devices missing so that the kernel md driver
- can create appropriate redundancy (copying in raid1, parity
- calculation in raid4/5).
- When an array is assembled, it is first initialized with the
- SET_ARRAY_INFO ioctl. This contains, in particular, a major and minor
- version number. The major version number selects which superblock
- format is to be used. The minor number might be used to tune handling
- of the format, such as suggesting where on each device to look for the
- superblock.
- Then each device is added using the ADD_NEW_DISK ioctl. This
- provides, in particular, a major and minor number identifying the
- device to add.
- The array is started with the RUN_ARRAY ioctl.
- Once started, new devices can be added. They should have an
- appropriate superblock written to them, and then be passed in with
- ADD_NEW_DISK.
- Devices that have failed or are not yet active can be detached from an
- array using HOT_REMOVE_DISK.
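The order of the assembly ioctls described above can be sketched as a plain list of steps. This is only an illustration of the sequence: it issues no real ioctl calls, and the function name assembly_plan and the dictionary field names are assumptions made for this example, not the kernel's structure layout.

```python
def assembly_plan(major_version, minor_version, devices):
    """Return the ordered ioctl steps for assembling an array (illustrative).

    devices is a list of (major, minor) device-number pairs identifying
    the component block devices. Nothing is sent to the kernel here.
    """
    steps = [
        # The major version selects the superblock format; the minor
        # version may tune how that format is handled.
        ("SET_ARRAY_INFO", {"major_version": major_version,
                            "minor_version": minor_version}),
    ]
    for dev_major, dev_minor in devices:
        steps.append(("ADD_NEW_DISK", {"major": dev_major, "minor": dev_minor}))
    # The array only becomes accessible once it has been started.
    steps.append(("RUN_ARRAY", {}))
    return steps
```

A user-space tool would translate each step into the corresponding ioctl on the md device; the point of the sketch is only that SET_ARRAY_INFO comes first, one ADD_NEW_DISK follows per device, and RUN_ARRAY comes last.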
- Specific Rules that apply to format-0 super block arrays, and
- arrays with no superblock (non-persistent).
- -------------------------------------------------------------
- An array can be 'created' by describing the array (level, chunksize
- etc) in a SET_ARRAY_INFO ioctl. This must have major_version==0 and
- raid_disks != 0.
- Then uninitialized devices can be added with ADD_NEW_DISK. The
- structure passed to ADD_NEW_DISK must specify the state of the device
- and its role in the array.
- Once started with RUN_ARRAY, uninitialized spares can be added with
- HOT_ADD_DISK.
- MD devices in sysfs
- -------------------
- md devices appear in sysfs (/sys) as regular block devices,
- e.g.
- /sys/block/md0
- Each 'md' device will contain a subdirectory called 'md' which
- contains further md-specific information about the device.
- All md devices contain:
- level
- a text file indicating the 'raid level'. This may be a standard
- numerical level prefixed by "RAID-" - e.g. "RAID-5", or some
- other name such as "linear" or "multipath".
- If no raid level has been set yet (array is still being
- assembled), this file will be empty.
- raid_disks
- a text file with a simple number indicating the number of devices
- in a fully functional array. If this is not yet known, the file
- will be empty. If an array is being resized (not currently
- possible) this will contain the larger of the old and new sizes.
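Reading these two attributes might look like the following minimal sketch. The helper names read_md_attr and array_summary are hypothetical, and the code assumes only what is stated above: each attribute is a small text file that is empty while the value is not yet known.

```python
import os


def read_md_attr(md_dir, name):
    """Return the stripped contents of an md attribute file, or None if empty."""
    with open(os.path.join(md_dir, name)) as attr:
        text = attr.read().strip()
    return text or None


def array_summary(md_dir):
    """Collect 'level' and 'raid_disks' from a directory such as /sys/block/md0/md."""
    raid_disks = read_md_attr(md_dir, "raid_disks")
    return {
        "level": read_md_attr(md_dir, "level"),
        # An empty file means the number of devices is not yet known.
        "raid_disks": int(raid_disks) if raid_disks is not None else None,
    }
```

For example, array_summary("/sys/block/md0/md") would be the call for the md0 device shown above.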
- As component devices are added to an md array, they appear in the 'md'
- directory as new directories named
- dev-XXX
- where XXX is a name that the kernel knows for the device, e.g. hdb1.
- Each directory contains:
- block
- a symlink to the block device in /sys/block, e.g.
- /sys/block/md0/md/dev-hdb1/block -> ../../../../block/hdb/hdb1
- super
- A file containing an image of the superblock read from, or
- written to, that device.
- state
- A file recording the current state of the device in the array
- which can be a comma separated list of
- faulty - device has been kicked from active use due to
- a detected fault
- in_sync - device is a fully in-sync member of the array
- spare - device is working, but not a full member.
- This includes spares that are in the process
- of being recovered to.
- This list may grow in future.
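Because the state file is a comma separated list that may gain new entries, a reader should split it rather than compare the whole string. A minimal sketch, with parse_device_state as a hypothetical helper name:

```python
def parse_device_state(text):
    """Split the contents of a dev-XXX/state file into a set of flags.

    Unknown flags are kept as-is, since the list of states may grow.
    """
    return {flag.strip() for flag in text.split(",") if flag.strip()}
```

With this, "in_sync" in parse_device_state(contents) tests for a fully in-sync member without assuming that no other flag is present.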
- An active md device will also contain an entry for each active device
- in the array. These are named
- rdNN
- where 'NN' is the position in the array, starting from 0.
- So for a 3 drive array there will be rd0, rd1, rd2.
- These are symbolic links to the appropriate 'dev-XXX' entry.
- Thus, for example,
- cat /sys/block/md*/md/rd*/state
- will show 'in_sync' on every line.
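The same check can be made from a script. The sketch below is an assumption-laden equivalent of the cat command for a single array: slots_not_in_sync is a hypothetical name, and it relies only on the rdNN/state layout described above.

```python
import glob
import os


def slots_not_in_sync(md_dir):
    """Return the rdNN entries under md_dir whose state lacks 'in_sync'.

    md_dir is a directory such as /sys/block/md0/md. Entries are sorted
    as strings, so rd10 sorts before rd2.
    """
    bad = []
    for state_path in sorted(glob.glob(os.path.join(md_dir, "rd*", "state"))):
        with open(state_path) as state_file:
            flags = [flag.strip() for flag in state_file.read().split(",")]
        if "in_sync" not in flags:
            bad.append(os.path.basename(os.path.dirname(state_path)))
    return bad
```

An empty result corresponds to the cat command printing 'in_sync' on every line.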
- Active md devices for levels that support data redundancy (1,4,5,6)
- also have
- sync_action
- a text file that can be used to monitor and control the rebuild
- process. It contains one word which can be one of:
- resync - redundancy is being recalculated after unclean
- shutdown or creation
- recover - a hot spare is being built to replace a
- failed/missing device
- idle - nothing is happening
- check - A full check of redundancy was requested and is
- happening. This reads all blocks and checks
- them. A repair may also happen for some raid
- levels.
- repair - A full check and repair is happening. This is
- similar to 'resync', but was requested by the
- user, and the write-intent bitmap is NOT used to
- optimise the process.
- This file is writable, and each of the strings that could be
- read are meaningful for writing.
- 'idle' will stop an active resync/recovery etc. There is no
- guarantee that another resync/recovery may not be automatically
- started again, though some event will be needed to trigger
- this.
- 'resync' or 'recover' can be used to restart the
- corresponding operation if it was stopped with 'idle'.
- 'check' and 'repair' will start the appropriate process
- providing the current state is 'idle'.
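Requesting a 'check' or 'repair' only when the array is idle could be sketched as follows. The function name request_sync_action is hypothetical, and the read-then-write is not atomic: the state may change between the two steps, so this is an illustration of the rule rather than a robust tool.

```python
def request_sync_action(sync_action_path, action):
    """Write 'check' or 'repair' to sync_action if the current state is 'idle'.

    Returns True if the request was written, False if the array was busy.
    sync_action_path is a file such as /sys/block/md0/md/sync_action.
    """
    if action not in ("check", "repair"):
        raise ValueError("this sketch only handles 'check' and 'repair'")
    with open(sync_action_path) as current_file:
        current = current_file.read().strip()
    if current != "idle":
        return False  # another resync/recovery/check is in progress
    with open(sync_action_path, "w") as request_file:
        request_file.write(action + "\n")
    return True
```

Writing 'idle' to the same file is the way to stop an operation that is already running.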
- mismatch_cnt
- When performing 'check' and 'repair', and possibly when
- performing 'resync', md will count the number of errors that are
- found. The count in 'mismatch_cnt' is the number of sectors
- that were re-written, or (for 'check') would have been
- re-written. As most raid levels work in units of pages rather
- than sectors, this may be larger than the number of actual errors
- by a factor of the number of sectors in a page.
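The effect of page-sized accounting can be worked through numerically. The sketch below assumes 4096-byte pages and 512-byte sectors (eight sectors per page); both values, and the function name mismatch_error_bounds, are assumptions for illustration and should be adjusted for the system at hand.

```python
def mismatch_error_bounds(mismatch_cnt, page_size=4096, sector_size=512):
    """Return (lowest, highest) plausible number of sectors that really differ.

    If mismatches are counted a whole page at a time, each counted page
    holds at least one differing sector and at most a full page of them.
    """
    sectors_per_page = page_size // sector_size
    # Ceiling division: the number of pages the count represents.
    pages = -(-mismatch_cnt // sectors_per_page)
    return pages, mismatch_cnt
```

Under these assumptions a reported count of 16 could correspond to as few as 2 sectors that actually differ.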
- Each active md device may also have attributes specific to the
- personality module that manages it.
- These are specific to the implementation of the module and could
- change substantially if the implementation changes.
- These currently include
- stripe_cache_size (currently raid5 only)
- number of entries in the stripe cache. This is writable, but
- there are upper and lower limits (32768, 16). Default is 128.
- stripe_cache_active (currently raid5 only)
- number of active entries in the stripe cache
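A value intended for stripe_cache_size can be checked against the stated limits before it is written. This is a minimal sketch: the constant and function names are hypothetical, and treating the bounds as inclusive is an assumption, since the text does not say whether 16 and 32768 themselves are accepted.

```python
STRIPE_CACHE_MIN = 16       # lower limit given in the text
STRIPE_CACHE_MAX = 32768    # upper limit given in the text
STRIPE_CACHE_DEFAULT = 128  # default given in the text


def validate_stripe_cache_size(entries):
    """Return entries if it lies within the documented limits, else raise.

    Assumption: both limits are treated as inclusive here.
    """
    if not STRIPE_CACHE_MIN <= entries <= STRIPE_CACHE_MAX:
        raise ValueError(
            "stripe_cache_size must be between %d and %d"
            % (STRIPE_CACHE_MIN, STRIPE_CACHE_MAX)
        )
    return entries
```

The validated number would then be written as text to the stripe_cache_size file of a raid5 array.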