Benchmark Limitations

The Quake example shows the limitations of benchmarks that provide only a single numerical value for the result. While such a number can be useful, it leaves too many questions unanswered, making a complete analysis impossible. Most benchmarks provide a single value as a final score, but most also provide the scores of the individual tests that make up the benchmark.

Most of the popular review sites have chosen one benchmark suite for Windows 95, one for Windows NT, and one or two specific applications for games performance, typically the most popular and easily obtained. This is likely because of the time and expense involved in more exhaustive testing. When they report their results, they almost always report only the average score, which they then use to draw sweeping conclusions about the capabilities of each component tested. In addition, these sites regularly compare Slot 1 and Socket 7 processors, even though there is virtually no way to isolate the processors using the benchmarks being employed. To illustrate this point, the following excerpts are taken from Intel's iCOMP Index 2.0 report:
The important point here is that differences in L1 and L2 cache size, L2 cache speed, memory bus architecture, and I/O bus architecture will greatly affect the results of these tests. Since most of these differences relate to the motherboards, and it is impossible to test a Slot 1 and a Socket 7 processor on the same motherboard, these tests are not really comparing processors. At best, they are comparing the two platforms, and at worst they are making no valid comparisons at all.

It could be argued that, since the motherboard and processor are so closely tied together, this is a valid comparison. This would be true if the intent were to compare overall system performance without making any judgements about how the processors compare. Unfortunately, too many reviewers and users are using these results to claim that Intel's processors are better (or worse) than AMD's processors. Given the circumstances, the major differences may well lie in the motherboard and cache more than in the processors themselves, but in most cases there is not sufficient data provided to make this determination.

Sometimes, even differences between two processors that run on the same motherboard can affect results. For example, ZDBOp's CPUMark32 shows the Celeron as a much slower processor than the equivalent Pentium II. As it turns out, this is because of the difference in cache size. CPUMark32 was written to fit within a 256K L2 cache, but it overflows the 128K cache of the Celeron. CPUMark99 was apparently released recently to address this problem. SPEC98 takes the opposite approach: it intentionally overflows an L2 cache of any size, apparently to prevent the relatively large Alpha cache sizes (up to 8MB) from dominating the benchmark's top scores.
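The cache-size effect described above can be illustrated with a rough sketch. It is a toy model, not a description of how CPUMark32 or any real benchmark works. The 256 KiB working set is an assumed upper bound taken from the "fits within a 256K L2 cache" statement, the 10 ns and 60 ns access costs are invented purely for illustration, and the function names are hypothetical. The model also assumes uniformly random accesses, which real benchmark code rarely produces.

```python
KIB = 1024


def fits_in_cache(working_set_bytes: int, cache_bytes: int) -> bool:
    """Return True when the whole working set can stay resident in the cache."""
    return working_set_bytes <= cache_bytes


def estimated_ns_per_access(working_set_bytes: int, cache_bytes: int,
                            hit_ns: float, miss_ns: float) -> float:
    """Crude average access cost under uniformly random access.

    Assumption: the fraction of the working set that fits in the cache is
    served at hit_ns, and everything else goes to main memory at miss_ns.
    """
    if working_set_bytes <= 0:
        raise ValueError("working_set_bytes must be positive")
    hit_fraction = min(1.0, cache_bytes / working_set_bytes)
    return hit_fraction * hit_ns + (1.0 - hit_fraction) * miss_ns


if __name__ == "__main__":
    working_set = 256 * KIB        # assumed benchmark footprint (hypothetical)
    hit_ns, miss_ns = 10.0, 60.0   # illustrative costs, not measured values

    for name, cache in (("256K L2", 256 * KIB), ("128K L2", 128 * KIB)):
        cost = estimated_ns_per_access(working_set, cache, hit_ns, miss_ns)
        print(f"{name}: fits={fits_in_cache(working_set, cache)}, "
              f"about {cost:.1f} ns per access")
```

With these assumed numbers, the same processor core looks several times slower on the smaller cache, even though nothing about its execution units has changed. That is the sense in which a benchmark sized to one cache can end up measuring the cache rather than the processor.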