Read and Write Performance
Hard disks perform two distinct functions: writing data, and then reading it back. In
most ways, the electronic and mechanical processes involved in these two operations are
very similar. However, even within a single hard disk, read and write performance are
often different in small but important ways; this is discussed in more detail elsewhere in this guide. When it comes to RAID, the differences
between read and write performance are magnified. Because of the different ways that disks
can be arranged in arrays, and the different ways data can be stored, in some cases there
can be large discrepancies in how "method A" compares to "method B"
for read performance, as opposed to write performance.
The fundamental difference between reading and writing under RAID is this: when you
write data in a redundant environment, you must access every place where that data is
stored; when you read the data back, you only need to read the minimum amount of data
necessary to retrieve the actual data--the redundant information does not need to be
accessed on a read. OK, this isn't as complicated as I probably just made it sound. :^)
Let's see how various storage techniques used in RAID differ in this regard:
- Mirroring: Read performance under mirroring is far superior to write
performance. Let's suppose you are mirroring two drives under RAID 1. Every piece of data
is duplicated, stored on both drives. This means that every byte of data stored must be
written to both drives, making write performance under RAID 1 actually a bit slower
than just using a single disk; even if it were as fast as a single disk, both drives are
tied up during the write. But when you go to read back the data? There's absolutely no
reason to access both drives; the controller, if intelligently programmed, will only ask
one of the drives for the data--the other drive can be used to satisfy a different
request. This makes RAID 1 significantly faster than a single drive for reads, under most
conditions.
- Striping Without Parity: A RAID 0 array has about equal read and write
performance (or more accurately, roughly the same ratio of read to write performance that
a single hard disk would have). The reason is that the "chopping up" of the data
without parity calculation means you must access the same number of drives for reads as
you do for writes.
- Striping With Parity: As with mirroring, write performance when
striping with parity (RAID levels 3 through 6) is worse than read performance, but unlike
mirroring, the "hit" taken on a write when doing striping with parity is much
more significant. Here's how the different accesses fare:
- For reads, striping with parity can actually be faster than striping
without parity. The parity information is not needed on reads, and this makes the array
behave during reads in a way similar to a RAID 0 array, except that the data is spread
across one extra drive, slightly improving parallelism.
- For sequential writes, there is the dual overhead of parity calculations as well as
having to write to an additional disk to store the parity information. This makes
sequential writes slower than striping without parity.
- The biggest discrepancy under this technique is between random reads and random writes.
Random reads that only require parts of a stripe from one or two disks can be processed in
parallel with other random reads that only need parts of stripes on different disks. In
theory, random writes would be the same, except for one problem: every time you change any
block in a stripe, you have to recalculate the parity for that stripe, which requires two
writes plus reading back all the other pieces of the stripe! Consider a RAID 5
array made from five disks, and a particular stripe across those disks that happens to
have data on drives #3, #4, #5 and #1, and its parity block on drive #2. You want to do a
small "random write" that changes just the block in this stripe on drive #3.
Without the parity, the controller could just write to drive #3 and it would be done. With
parity though, the change to drive #3 affects the parity information for the entire
stripe. So this single write turns into a read of drives #4, #5 and #1, a parity
calculation, and then a write to drive #3 (the data) and drive #2 (the newly-recalculated
parity information). This is why striping with parity stinks for random write performance.
(This is also why RAID 5 implementations in software are not recommended if you are
interested in performance.)
- Another hit to write performance comes from the dedicated parity drive used in certain
striping with parity implementations (in particular, RAID levels 3 and 4). Since only one
drive contains parity information, every write must write to this drive, turning
it into a performance bottleneck. Under implementations with distributed parity, like RAID
5, all drives contain data and parity information, so there is no single bottleneck drive;
the overheads mentioned just above still apply though.
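The five-disk example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular controller works: the block contents, the 4-byte block size and the function names are made up, and parity is assumed to be a plain byte-wise XOR of the data blocks. The first function follows the sequence described in the text (read the other data blocks, recompute parity, write data and parity). The second shows a commonly used alternative that the text does not describe, where the old data and old parity are read and XORed with the new data; it reaches the same parity with a different set of reads.

```python
from functools import reduce

PARITY_DRIVE = 2  # in this stripe, parity lives on drive #2 (as in the text)

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: data on drives #3, #4, #5 and #1 (hypothetical contents).
stripe = {
    1: bytes([0x11] * 4),
    3: bytes([0x33] * 4),
    4: bytes([0x44] * 4),
    5: bytes([0x55] * 4),
}
stripe[PARITY_DRIVE] = xor_blocks(stripe[1], stripe[3], stripe[4], stripe[5])

def reconstruct_write(stripe, target, new_data):
    """Sequence from the text: read the other data blocks, recompute parity."""
    others = sorted(d for d in stripe if d not in (target, PARITY_DRIVE))
    new_parity = xor_blocks(new_data, *(stripe[d] for d in others))
    return new_parity, others, sorted([target, PARITY_DRIVE])

def read_modify_write(stripe, target, new_data):
    """Alternative (assumption, not in the text): use old data and old parity."""
    reads = sorted([target, PARITY_DRIVE])
    new_parity = xor_blocks(stripe[PARITY_DRIVE], stripe[target], new_data)
    return new_parity, reads, sorted([target, PARITY_DRIVE])

new_block = bytes([0xAA] * 4)  # the small "random write" to drive #3
p1, reads1, writes1 = reconstruct_write(stripe, 3, new_block)
p2, reads2, writes2 = read_modify_write(stripe, 3, new_block)

print("reconstruct:       reads", reads1, "writes", writes1)
print("read-modify-write: reads", reads2, "writes", writes2)
print("same parity:", p1 == p2)
```

Either way, one logical write to drive #3 turns into several physical disk accesses, which is the overhead the text is describing.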
Note: As if the
performance hit for writes under striping with parity weren't bad enough, there is even
one more piece of overhead! The controller has to make sure that when it changes data and
its associated parity, all the changes happen simultaneously; if the process were
interrupted in the middle, say, after the data were changed and not the parity, the
integrity of the array would be compromised. To prevent this, a special process must be
used, sometimes called a two-phase commit. This is similar to the techniques used
in database operations, for example, to make sure that when you transfer money from your
checking account to your savings account, it doesn't get subtracted from one without being
certain that it was added to the other (or vice-versa). More overhead, more performance
slowdown.
The bottom line that results from the difference between read and write performance is
that for many RAID levels, especially ones involving striping with parity, the net
performance improvement depends heavily on the ratio of reads to writes in the intended
application. Some applications have a relatively low number of writes as a percentage of
total accesses; for example, a web server. For these applications, the very popular RAID 5 solution may be an ideal choice. Other
applications have a much higher percentage of writes; for example, an interactive database
or development environment. These applications may be better off with a RAID 01 or 10 solution, even if it does cost a bit
more to set up.
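To make the read/write ratio argument concrete, here is a rough back-of-the-envelope model in Python. Everything in it is an assumption for illustration: reads are counted as one disk access, a mirrored write as two, and a small RAID 5 write as four (two reads plus two writes; the reconstruct sequence in the five-disk example earlier would cost five). The 95% and 50% read fractions are invented, not measurements, and the model ignores caching, sequential access and parallelism across drives.

```python
# Assumed physical disk accesses per small random write (illustrative only).
WRITE_COST = {"RAID 0": 1, "RAID 1/10": 2, "RAID 5": 4}

def disk_ios_per_request(read_fraction: float, write_cost: int) -> float:
    """Average physical disk accesses per logical request; a read costs 1."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    return read_fraction * 1 + (1.0 - read_fraction) * write_cost

def compare(read_fraction: float) -> dict:
    """Cost per logical request for each RAID level in WRITE_COST."""
    return {
        level: disk_ios_per_request(read_fraction, cost)
        for level, cost in WRITE_COST.items()
    }

workloads = [
    ("mostly reads (assumed 95%)", 0.95),
    ("write-heavy (assumed 50%)", 0.50),
]
for label, read_fraction in workloads:
    costs = compare(read_fraction)
    summary = ", ".join(f"{level}: {cost:.2f}" for level, cost in costs.items())
    print(f"{label}: {summary}")
```

Under these assumptions the gap between RAID 5 and mirroring is small when reads dominate and grows quickly as the share of writes rises, which is the trade-off described above.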
Note: Some controllers
employ write caching to improve performance during writes; this advanced feature is covered elsewhere in this guide.