The (Invalid) Benchmarking Phenomenon

If you have spent any amount of time on the hardware-related Usenet newsgroups, you will have seen numerous benchmarks posted by users, usually to find out whether their system compares favorably to others running similar configurations. Most of these users have little or no knowledge of what the programs are really testing, or of how to evaluate the results. This has spread a lot of confusion and misinformation. Users regularly ask why their system-level benchmark score was lower than someone else's when both had the same CPU. They are usually completely confused when told that their video card or hard drive (or other component) could be affecting the result.

Computer hardware sites are becoming more prevalent, with many recommending which component is the "best of class", usually based on the results of some limited benchmark testing. Most run a commonly used benchmark simply for the purpose of having a benchmark result to display. Usually these sites have not clearly identified exactly what they are trying to test, nor do they provide all of the pertinent details about the system being tested. This is like asking which car is faster, then running them for a quarter mile and choosing a winner. If you wanted to know which one was faster on mountain roads, you could not come to a reasonable conclusion, because you have not run the right test to answer that question.

There are even sites whose entire purpose is to gather user benchmark results, ostensibly so users can determine if their system is running optimally. Some of these sites do no verification of results, and require only a limited amount of information about the system that was tested.
When you then consider that users may have differing driver versions and widely varying BIOS settings, you can easily see why comparisons drawn from these results can be misleading. The more professional benchmarking sites, such as SPEC, require detailed hardware/software information and have a validation process in an attempt to prevent fraudulent scores from being submitted.

The most prevalent misuse of a benchmark score is the fascination so many have with Quake frame rate results. The Quake benchmark is the absolute best benchmark for testing the Quake performance of a system. It is also useful for gauging how a given machine will perform in other games that use the Quake engine. The problem is that the benchmark is also being used as a measure of floating point performance (which it was not designed to measure), and even of the overall "value" of a processor or system.

It is also interesting to note that for 3D games, an important metric is the slowest frame rate, because that determines the performance during the most graphics-intensive parts of the application. The Quake benchmark does not provide this information; it gives only the average. Two systems could clearly have the same average yet have different ranges of frame rates. All else being equal, the system with the smaller variance between frame rates would be the one that performs better in actual game play.
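
To make the point about averages concrete, here is a minimal Python sketch. The frame rate samples are invented for illustration and do not come from any real Quake run, and the names `summarize`, `system_a` and `system_b` are hypothetical. It simply shows that two sets of per-interval frame rates can share an average while differing sharply in their minimum and variance.

```python
from statistics import mean, pvariance


def summarize(frame_rates):
    """Return the average, minimum, and population variance of a run."""
    return {
        "average": mean(frame_rates),
        "minimum": min(frame_rates),
        "variance": pvariance(frame_rates),
    }


# Hypothetical frames-per-second samples, one per measurement interval.
# These numbers are made up; they are not benchmark results.
system_a = [28, 30, 32, 30, 30]   # steady frame rate
system_b = [10, 50, 15, 45, 30]   # same average, large swings

summary_a = summarize(system_a)
summary_b = summarize(system_b)

if __name__ == "__main__":
    for name, summary in (("A", summary_a), ("B", summary_b)):
        print(name, summary)
```

A benchmark that reports only the average would rate these two hypothetical systems as equal, even though system B drops to a third of that figure in its worst interval.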