This is a CPU rate determination program.  It is NOT DESIGNED OR
INTENDED TO BE USED AS A "BENCHMARK".  It is rather designed to measure
and return some very fundamental rates at which a CPU (being directed by
a particular compiler and set of libraries, of course) accomplishes
particular tasks.  The rates it currently measures (however crudely)
are aggregate double- and single-precision floating point rates and a
composite transcendental function rate (the "savage" rate).

WARNING: This tool can return flaky results with certain versions of
gcc (and probably other compilers and libraries as well).  The
gettimeofday() call isn't terribly reliable for timing small/fast
operations except in aggregate.  Also, gcc has had a legendary problem
with variable alignment that can lead to factors of two or more
variation in measured "speed" depending on crazy things (like the
length of a command line argument).

WARNING WARNING:  Some hardware hyperoptimizes certain operations (like
division by a power of two) and will return extraordinary -- and "false"
-- results if the test operand x is set to one of those values.  Intel
and AMD both do this as of the time of this writing.  The default value
for x (PI) should return "reasonable" general purpose timings.
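One way to keep such fast paths out of a division measurement is to
make the divisor a run-time parameter and default it to a "nothing
special" value like PI.  A sketch only; divide_loop() and its feedback
term are inventions here, not this program's code:

```c
/* Repeatedly divide by a run-time divisor.  Because 'divisor' is a
 * function parameter, the compiler cannot replace the division with a
 * shift or a reciprocal multiply at compile time; pass PI (the
 * program's default for x) rather than a power of two to sidestep
 * hardware fast paths.  The "+ x" term keeps the accumulator from
 * collapsing toward zero, and the volatile qualifier keeps the loop
 * from being optimized away. */
double divide_loop(double x, double divisor, long iters)
{
    volatile double acc = x;
    long i;

    for (i = 0; i < iters; i++)
        acc = acc / divisor + x;
    return acc;
}
```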

The PURPOSE of this program is to inform a variety of decisions with a
set of sound and reproducible measurements.  Some of these decisions are
human decisions (Which system should I buy to use as a beowulf node?
Which system makes more sense as a desktop?).  Some of them might well
be automated software decisions -- a spanning set of CPU (and other)
rates, stored in a publicly accessible file on all systems in a
cluster or beowulf, allows software to be written that reads and parses
that file to obtain the rates and uses them to automatically
>>tune itself to the system in question<<.
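To make the automated use concrete, here is a minimal lookup routine
for a hypothetical whitespace-separated "name value" rates file.  Both
the file format and the rate names below are assumptions for
illustration, not anything this program currently emits:

```c
#include <stdio.h>
#include <string.h>

/* Look up a named rate in a file of "name value" lines, e.g.
 *
 *     double_add  512.3
 *     savage       21.7
 *
 * (Hypothetical format.)  Returns 1 and fills *rate on success,
 * 0 if the file can't be opened or the name isn't present.  A
 * program could call this at startup and size its working set or
 * choose an algorithm according to the rates it finds. */
int lookup_rate(const char *path, const char *name, double *rate)
{
    char key[64];
    double val;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return 0;
    while (fscanf(fp, "%63s %lf", key, &val) == 2) {
        if (strcmp(key, name) == 0) {
            fclose(fp);
            *rate = val;
            return 1;
        }
    }
    fclose(fp);
    return 0;
}
```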

This latter use, in particular, has great promise in enabling
intelligent software design.  On a beowulf or compute cluster:

   a) Waiting nodes are wasted time.
   b) In time, all clusters become heterogeneous (if they aren't from
the beginning).  Lacking precise information on node speeds, it is
difficult to design parallel programs that scale each node's share of
the work in such a way that no node has to wait (as it would with the
usual equal partitioning of the problem).
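Given per-node rates from such a file, the remedy is to partition the
work in proportion to speed rather than equally, so that every node
finishes at roughly the same time.  A minimal sketch; the function name
and the remainder policy are assumptions made here:

```c
/* Split 'total' work units across 'n' nodes in proportion to each
 * node's measured rate.  Integer truncation can leave a few units
 * unassigned, so the remainder is handed out one unit at a time in
 * round-robin order.  On return, share[0..n-1] sums to 'total'. */
void partition_by_rate(const double *rate, int n, long total, long *share)
{
    double sum = 0.0;
    long assigned = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += rate[i];
    for (i = 0; i < n; i++) {
        share[i] = (long)((double)total * (rate[i] / sum));
        assigned += share[i];
    }
    for (i = 0; assigned < total; i = (i + 1) % n) {
        share[i]++;
        assigned++;
    }
}
```

A node that benchmarks twice as fast gets twice the work: rates of
{2, 1, 1} and 100 work units yield shares of 50, 25, and 25.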

On a given node or standalone workstation:

   a) Operations like linear algebra can be optimized (in both algorithm
and stride) for the sizes and relative speeds of the CPU and the various
memory layers.  This has been beautifully demonstrated by the ATLAS
project.  However, doing a customized optimized build of ATLAS is quite
difficult and, given accurate measurements of certain microscopic rates
and latencies and bandwidths, should not be necessary.

   b) Operations like sort can similarly be tuned. Indeed, >>many<<
large and complex operations that are common elements of many programs
can be automatically optimized given direct access to accurate
measurements of various system rates and parameters.

This tool is intended to evolve into a means of providing those rates.
It is not finished -- it can never be finished while CPUs and systems
themselves continue to develop.  It can also almost certainly be
improved.  If you are a systems programming expert, please feel free to
improve it or criticize it.

The tool DOES need a certain degree of stability in order to be useful.
As code fragments are created (and integrated with the timing harness)
that return a "useful" parametric measure, those fragments will be
"frozen" unless someone can demonstrate that they are failing in their
purpose.  Although the code is GPL (so you can hack it all you want, or
embed it in other programs, and all that, provided you follow the GPL
rules) the code will obviously cease to be useful if someone
significantly hacks the source that measures, say, sin() execution and
cos() execution separately and replaces it with code that calls sincos()
and (obviously) speeds up "sine and cosine execution".  Or something more
subtle -- for example reordering operations by hand to take advantage of
"accidental" speedups on a given CPU.

I therefore say "please don't do this".  Since my saying it has no legal
force, I >>also<< say unto all humans everywhere -- DON'T BELIEVE
PUBLISHED NUMBERS GENERATED FROM THIS PROGRAM unless you trust the
source.  In a published paper written by a university computer
scientist, they are likely trustworthy.  On a website run by Your
Favorite Computer Company, they are likely not.  The safest thing to do
is to build and run the program yourself from trustworthy sources or
RPMs, which is always possible to do with free software and the web.

You can get the current copy of this program (at this point in time,
anyway) at http://www.phy.duke.edu/brahma (follow the links).
Eventually I'll likely give it a page of its own, if there is any
interest in it having one.


   Good Luck,

         rgb

Robert G. Brown	
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  
Fax: 919-660-2525  
Email: rgb@phy.duke.edu
URL: http://www.phy.duke.edu/~rgb
