Stripe Width and Stripe Size

RAID arrays that use striping improve performance by splitting files into small pieces and distributing them across multiple hard disks. Most striping implementations give the creator of the array control over two critical parameters that define how the data is broken into chunks and sent to the various disks. Each of these factors has an important impact on the performance of a striped array.

The first key parameter is the stripe width of the array. Stripe width refers to the number of parallel stripes that can be written to or read from simultaneously. This is of course equal to the number of disks in the array, so a four-disk striped array has a stripe width of four. Read and write performance of a striped array increases as stripe width increases, all else being equal, because adding more drives increases the parallelism of the array, allowing more drives to be accessed simultaneously. You will generally get superior transfer performance from an array of eight 18 GB drives than from an array of four 36 GB drives of the same drive family, all else being equal. Of course, eight 18 GB drives cost more than four 36 GB drives, and there are other concerns, such as power supply, to be dealt with.

The second important parameter is the stripe size of the array, sometimes also referred to by terms such as block size, chunk size, stripe length or granularity. This term refers to the size of the stripes written to each disk. RAID arrays that stripe in blocks typically allow the selection of block sizes ranging from 2 kiB to 512 kiB (or even higher) in powers of two (meaning 2 kiB, 4 kiB, 8 kiB and so on). Byte-level striping (as in RAID 3) uses a stripe size of one byte, or perhaps a small number such as 512, usually not selectable by the user. The short sketch below shows how these two parameters together determine where each piece of data lands.
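To make these two parameters concrete, here is a minimal Python sketch of how a simple striped array (RAID 0 style, with no parity) might map a logical byte address onto a member disk. The function name and the rotating layout are illustrative assumptions, not any particular controller's algorithm:

    # Minimal sketch of block-level striping (RAID 0 style, no parity).
    # stripe_width = number of disks; stripe_size = bytes per chunk.

    def locate_block(logical_byte, stripe_size, stripe_width):
        """Return (disk_index, byte_offset_on_disk) for a logical address."""
        chunk = logical_byte // stripe_size    # which chunk of the volume
        disk = chunk % stripe_width            # chunks rotate across the disks
        row = chunk // stripe_width            # complete rows of chunks so far
        offset = row * stripe_size + (logical_byte % stripe_size)
        return disk, offset

    # Example: a four-disk array (stripe width 4) with 64 kiB stripes.
    for addr in (0, 64 * 1024, 128 * 1024, 300 * 1024):
        print(addr, locate_block(addr, 64 * 1024, 4))

Notice that consecutive 64 kiB chunks land on consecutive disks, which is exactly why a wider array can keep more spindles busy on one large transfer.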
The impact of stripe size upon performance is more difficult to quantify than the effect of stripe width. Decreasing the stripe size breaks each file into smaller pieces spread across more drives: this improves transfer performance for a single file, because more drives work on it in parallel, but hurts positioning performance, because more drives must be engaged to satisfy each request. Increasing the stripe size does the opposite: more files fit entirely within a single stripe, so independent requests can be serviced by different drives simultaneously, at the cost of single-file transfer rate. The sketch below makes this tradeoff visible by counting how many disks a single request touches at various stripe sizes.
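Here is a second small Python sketch, again purely illustrative and assuming the rotating layout from the sketch above, that counts how many member disks one I/O request touches:

    # Count how many member disks one request touches, given its offset,
    # its length, and the array's stripe size and stripe width.

    def disks_touched(offset, length, stripe_size, stripe_width):
        first_chunk = offset // stripe_size
        last_chunk = (offset + length - 1) // stripe_size
        chunks = last_chunk - first_chunk + 1
        return min(chunks, stripe_width)  # can't touch more disks than exist

    # A large (256 kiB) and a small (4 kiB) read against a four-disk array:
    for stripe_kib in (4, 64, 256):
        big = disks_touched(0, 256 * 1024, stripe_kib * 1024, 4)
        small = disks_touched(0, 4 * 1024, stripe_kib * 1024, 4)
        print(f"{stripe_kib:3d} kiB stripes: 256 kiB read uses {big} disk(s), "
              f"4 kiB read uses {small} disk(s)")

Note how the large read engages all four disks until the stripe size grows past the request size, while the small read always fits on one disk: that is the transfer-versus-positioning tradeoff in miniature.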
Obviously, there is no "optimal stripe size" for everyone; it depends on your performance needs, the types of applications you run, and in fact, even the characteristics of your drives to some extent. (That's why controller manufacturers reserve it as a user-definable value!) There are many "rules of thumb" that are thrown around to tell people how they should choose stripe size, but unfortunately they are all, at best, oversimplified. For example, some say to match the stripe size to the cluster size of FAT file system logical volumes. The theory is that by doing this you can fit an entire cluster in one stripe. Nice theory, but there's no practical way to ensure that each stripe contains exactly one cluster. Even if you could, this optimization only makes sense if you value positioning performance over transfer performance; many people do striping specifically for transfer performance.
So what should you use for a stripe size? The best way to find out is to try different values: empirical evidence is the best guide for this particular problem (a rough timing sketch follows this paragraph). Also, as with most "performance-optimizing endeavors", don't overestimate the difference in performance between different stripe sizes; it can be significant, particularly when contrasting values from opposite ends of the spectrum like 4 kiB and 256 kiB, but the difference often isn't all that large between similar values. And if you must have a rule of thumb, I'd say this: transactional environments where you have large numbers of small reads and writes are probably better off with larger stripe sizes (but only to a point); applications where smaller numbers of larger files need to be read quickly will likely prefer smaller stripes. Obviously, if you need to balance these requirements, choose something in the middle. :^)
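If you want a starting point for such experiments, here is a rough Python timing sketch. The file path and sizes are placeholders, and a serious test must account for the operating system's cache (which this naive version does not), so treat the numbers as relative at best:

    # Crude timing harness: rebuild the array with a candidate stripe size,
    # put a large test file on it, then compare small and large random reads.
    # PATH and FILE_SIZE are placeholders; results are skewed by the OS cache.
    import os, random, time

    PATH = "/mnt/array/testfile"      # hypothetical file on the array
    FILE_SIZE = 512 * 1024 * 1024     # the test file must be at least this big

    def random_reads(path, io_size, count):
        fd = os.open(path, os.O_RDONLY)
        start = time.perf_counter()
        for _ in range(count):
            os.pread(fd, io_size, random.randrange(0, FILE_SIZE - io_size))
        os.close(fd)
        return time.perf_counter() - start

    print("1000 small (4 kiB) reads:", random_reads(PATH, 4 * 1024, 1000), "s")
    print("100 large (1 MiB) reads :", random_reads(PATH, 1 << 20, 100), "s")

Run the same pair of tests after rebuilding the array with each candidate stripe size and compare the timings against your actual workload's mix of small and large requests.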