Types of Runs
The next step in defining the rules is to describe how the benchmark is to be run. The usual approach is to run one program or query at a time and then add processors to check scalability. That is fine, but does it really simulate your compute environment? Does it make sense if each system uses only one or two processors and you plan on purchasing eight, 24, 72, or 102 processors on which many programs run in production? A throughput test may give better answers to your basic questions.
Throughput benchmarks vary depending on your compute environment. Get creative, but make it realistic. Is running the exact same query 20 times simultaneously really how your compute environment is used? You need to set a goal, determine the components, and state how the components are started.
First, define a goal for the throughput test. Is the winner the one who completes the most runs within a specified time limit? Or is it the one who completes the suite of runs fastest?
Next, determine what the components of the throughput run are. How many processors should each component use? How many times should each component be run? If one of the components has a run time that is much longer than all of the others, the elapsed time of the throughput run will simply equal the time of that long run. That does not demonstrate much about the computer under a load, because for most of the test only one component is running.
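The effect of one long component can be checked with simple arithmetic before the test is handed to vendors. The following Python sketch is illustrative only: it assumes every component starts at the same moment and never waits for a processor, and the function names and run times are hypothetical, not taken from any real benchmark.

```python
def throughput_elapsed(run_times):
    """Elapsed time of the throughput run, assuming all components
    start together and none has to wait for a processor."""
    return max(run_times)


def tail_fraction(run_times):
    """Fraction of the elapsed time during which only the longest
    component is still running. Needs at least two components."""
    ordered = sorted(run_times)
    longest, second_longest = ordered[-1], ordered[-2]
    return (longest - second_longest) / longest


# Hypothetical stand-alone run times, in minutes, for five components.
component_minutes = [12, 15, 14, 13, 90]

print("elapsed minutes:", throughput_elapsed(component_minutes))
print("fraction of test with one component running:",
      round(tail_fraction(component_minutes), 2))
```

If the second number is large, the test mostly measures a single program rather than the system under load. Splitting the long component, or running the shorter ones several times, gives a more balanced mix.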
Now, how do you want the components started? Should they be started all at once or staggered? Do you want to use a resource management system like Sun ONE™ Grid Engine software or Veridian's Portable Batch System, or just let the operating system manage the scheduling? With a resource management system, the computer vendor will configure it so that the processors are not oversubscribed. With the operating system doing the scheduling, the processors will be oversubscribed whenever the components need more processors than are available. How is this managed in your production environment?
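To see how the start-up policy changes the result, it can help to model the run on paper first. The sketch below is a deliberately simplified model, not a description of how Grid Engine or the Portable Batch System actually schedule work: it assumes strict first-in, first-out ordering, a fixed processor count per component, and no oversubscription. The function name `simulate_fifo` and all values are hypothetical.

```python
import heapq


def simulate_fifo(components, total_cpus):
    """Return (makespan, start_times) for components started in list
    order, with a component held back until enough CPUs are free.

    components: list of (cpus_needed, run_time) tuples (hypothetical).
    """
    running = []      # min-heap of (finish_time, cpus_held)
    free = total_cpus
    now = 0.0
    starts = []
    for cpus, run_time in components:
        if cpus > total_cpus:
            raise ValueError("component needs more CPUs than the system has")
        # Wait for earlier components to finish until enough CPUs are free.
        while free < cpus:
            finish, released = heapq.heappop(running)
            now = max(now, finish)
            free += released
        starts.append(now)
        free -= cpus
        heapq.heappush(running, (now + run_time, cpus))
    # The run ends when the last component still running finishes.
    makespan = max((finish for finish, _ in running), default=now)
    return makespan, starts


# Three hypothetical 4-CPU components of 10 minutes each.
jobs = [(4, 10), (4, 10), (4, 10)]
print(simulate_fifo(jobs, total_cpus=8))    # third component must wait
print(simulate_fifo(jobs, total_cpus=12))   # all three start together
```

Under operating system scheduling all three components would start at once on the eight-processor system and compete for processors, so the elapsed times would depend on how the system behaves when oversubscribed. That behavior is what the benchmark has to measure; this model does not predict it.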
Do you want the elapsed time for each component recorded? Are you interested in the system-wide CPU utilization, disk usage, and so forth for the duration of the throughput test? State what information you want so you get the same information from all the computer vendors.
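One way to get identical information from every vendor is to supply the driver that launches the components and records the timings. The following is a minimal sketch, assuming each component can be started as a command line and that all components start at once under operating system scheduling. The names `timed_run` and `run_throughput` and the placeholder commands are hypothetical. System-wide CPU and disk statistics would still come from the platform's own monitoring tools, which are not shown here.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(name, argv):
    """Run one component and record its elapsed wall-clock time."""
    start = time.perf_counter()
    result = subprocess.run(argv, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return {"name": name,
            "elapsed_s": time.perf_counter() - start,
            "returncode": result.returncode}


def run_throughput(components):
    """Start all components together; return per-component records
    and the elapsed time of the whole throughput run."""
    wall_start = time.perf_counter()
    # One thread per component, so each timing stops when that
    # component exits rather than when an earlier one does.
    with ThreadPoolExecutor(max_workers=len(components)) as pool:
        futures = [pool.submit(timed_run, name, argv)
                   for name, argv in components]
        records = [f.result() for f in futures]
    return records, time.perf_counter() - wall_start


if __name__ == "__main__":
    # Placeholder commands standing in for real benchmark components.
    demo = [("short", [sys.executable, "-c", "pass"]),
            ("long", [sys.executable, "-c", "import time; time.sleep(1)"])]
    records, total = run_throughput(demo)
    for rec in records:
        print(rec)
    print("throughput run elapsed_s:", total)
```

Asking every vendor to return this output unmodified makes the per-component times directly comparable.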
What about vendors extrapolating to 'future' systems?