Benchmark Limitations


The Quake example shows the limitations of benchmarks that report only a single numerical value as the result. While such a number can be useful, it leaves too many questions unanswered, making a complete analysis impossible. Most benchmarks provide a single value as a final score, but most also provide the scores of the individual tests that make up the benchmark.
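As a minimal sketch of why the individual test scores matter, the Python fragment below averages two sets of sub-test scores into one composite number. The system names, the sub-test names, the scores, and the use of a plain arithmetic mean are all invented for illustration; they are not taken from any real benchmark suite or measurement.

```python
# Hypothetical sub-test scores (higher is better). These are made-up
# numbers for illustration, not measurements of any real system.
system_a = {"cpu": 100, "memory": 100, "disk": 100, "video": 100}
system_b = {"cpu": 160, "memory": 100, "disk": 40, "video": 100}


def composite(scores):
    """Collapse sub-test scores into one number with an arithmetic mean.

    Real suites use their own weighting schemes; the plain mean here is
    an assumption made to keep the example short.
    """
    return sum(scores.values()) / len(scores)


for name, scores in (("A", system_a), ("B", system_b)):
    print(f"System {name}: composite = {composite(scores):.1f}, detail = {scores}")
```

Both hypothetical systems receive the same composite score, yet one has a much faster processor and a much slower disk than the other. Only the individual scores reveal that difference.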

Most of the popular review sites have chosen one benchmark suite for Windows 95, one for Windows NT, and one or two specific applications for game performance, typically the most popular and most easily obtained ones. This is likely because of the time and expense involved in more exhaustive testing. When they do report their results, they almost always report only the average score, which they then use to draw sweeping conclusions about the capabilities of each component tested. In addition, these sites regularly compare Slot 1 and Socket 7 processors, even though there is virtually no way to isolate the processors from the rest of the platform using the benchmarks being employed.

To illustrate this point, the following excerpts are taken from Intel's iCOMP Index 2.0 report:

"...Intel's goal was to derive a single number for each processor that would describe the processor's highest possible performance on the desktop, and which would still be as independent as possible of system features"

"It is also important to understand that the iCOMP index is a tool for making comparisons between different Intel processors, not systems... Thus, although two systems with a given processor will have exactly the same iCOMP 2.0 rating, this does not mean that all systems with the same processor perform the same--differences in system design and configuration will affect performance considerably."

The important point here is that differences in L1 and L2 cache size, L2 cache speed, memory bus architecture, and I/O bus architecture will greatly affect the results of these tests. Since most of these differences relate to the motherboard, and it is impossible to test a Slot 1 processor and a Socket 7 processor on the same motherboard, these benchmarks are not really comparing processors. At best, they are comparing the two platforms; at worst, they are making no valid comparison at all.

It could be argued that because the motherboard and processor are so closely tied together, this is a valid comparison. This would be true if the intent were to compare overall system performance without making any judgements about how the processors compare. Unfortunately, too many reviewers and users are using these results to claim that Intel's processors are better (or worse) than AMD's processors. Given the circumstances, the major differences may well lie in the motherboard and cache more than in the processors themselves, but in most cases there is not sufficient data provided to make this determination.

Sometimes even differences between two processors that run on the same motherboard can affect results. For example, ZDBOp's CPUMark32 shows the Celeron as a much slower processor than the equivalent Pentium II. As it turns out, this is because of the difference in cache size: CPUMark32 was written to fit within a 256K L2 cache, but it overflows the 128K cache of the Celeron. CPUMark99 was apparently released recently to address this problem. SPEC98 shows the opposite situation: it intentionally overflows an L2 cache of any size, apparently to prevent the relatively large Alpha cache sizes (up to 8MB) from dominating the benchmark's top scores.
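The effect of a benchmark overflowing the cache can be sketched with a toy simulation. The Python model below is an assumption-laden illustration, not a description of how CPUMark32 or any real processor behaves: it assumes a direct-mapped cache (real L2 caches of this era are set-associative), a 32-byte line size, a purely sequential access pattern, and a hypothetical 192K working set chosen only because it falls between the two cache sizes mentioned above.

```python
LINE_BYTES = 32   # assumed cache line size
KB = 1024


def hit_rate(cache_bytes, working_set_bytes, passes=4):
    """Return the cache hit rate for repeated sequential sweeps.

    Simplified direct-mapped model: each memory block maps to exactly
    one cache line. The first pass only warms the cache and is not
    counted in the result.
    """
    num_lines = cache_bytes // LINE_BYTES
    tags = [None] * num_lines      # memory block currently held by each line
    hits = 0
    accesses = 0
    for sweep in range(passes):
        for addr in range(0, working_set_bytes, LINE_BYTES):
            block = addr // LINE_BYTES
            index = block % num_lines
            if sweep > 0:          # skip the cold first pass
                accesses += 1
                if tags[index] == block:
                    hits += 1
            tags[index] = block    # load (or keep) the block in its line
    return hits / accesses


working_set = 192 * KB             # hypothetical benchmark footprint
for cache_kb in (128, 256):
    rate = hit_rate(cache_kb * KB, working_set)
    print(f"{cache_kb}K cache: hit rate {rate:.2f}")
```

Under these assumptions the working set fits entirely in the 256K cache, so every access after the warm-up pass is a hit, while in the 128K cache the overlapping portions of the working set keep evicting each other and only about a third of the accesses hit. The same code running on the same kind of processor core would therefore see very different memory performance depending only on the cache size.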
