This is a system timer that can be used for timing "microscopic"
operation combinations.  It is NOT DESIGNED OR INTENDED TO BE USED AS A
COMPARATIVE "BENCHMARK".  It is rather designed to measure and return
some very fundamental rates at which a CPU (being directed by a
particular compiler and set of libraries, of course) accomplishes
particular tasks.  It is unique in that it is moderately extensible --
it is fairly easy to add your own code fragments to be timed, as follows:

  a) Copy e.g. null_time.c and null_time.h or multiply.c and multiply.h
into suitably named files for your test, e.g. mytest.c and mytest.h

  b) Edit the Makefile and add mytest.c to the list of source modules as
shown

  c) Edit tests.h and add a capitalized MYTEST to the enumerated list,
and add #include "mytest.h" to the list of includes

  d) Edit benchmaster_startup.c and add mytest_init(&test[MYTEST]); to the
end of the list of initialization calls.

  e) Edit mytest.c and add your code fragment to the /* Full */ branch
of the conditional in the test routine.  Change all instances of the
word null (or whatever module you copied) to mytest.  If you want to
normalize the time in any way (look at other modules for examples) add
suitable code to mytest_results().  Edit the lines in mytest_init()
that name the test and that describe the test.

  f) Edit mytest.h and change all instances of null to mytest.

  g) That's it!  Remake it (enter "make") and run "benchmaster -l" to list
all tests and mytest should appear at the end of the list with a number,
such as 16.

Running

   benchmaster -t 16

will then time your fragment!

Note that xtest and ytest are provided as arithmetic operands, and
results is provided to hold results.  Be warned!  The compiler optimizes away naive
combinations of simple arithmetic!  It is very tricky to time very short
arithmetic combinations (on a nanosecond scale) reliably and
reproducibly -- it generally works much better to execute something many
times even inside the core empty and full loops (that are themselves
executed many times to get reliable statistics).

WARNING: This tool can return flaky results with certain versions of gcc
(and probably other compilers/libraries as well).  The gettimeofday()
call (a system/libc facility, not part of gcc itself) isn't terribly
reliable for small/fast operations except in aggregate.
Also, gcc has had a legendary problem with variable alignment that can
lead to factors of two or more variation in measured "speed" depending
on crazy things (like the length of the command line argument or your
particular compile).

WARNING WARNING:  Some hardware hyperoptimizes certain operations (like
division by a power of two) and will return extraordinary -- and "false"
-- results if x is set to those values.  Intel and AMD both do this as
of the time of this writing.  The default value for x (PI) should return
"reasonable" general purpose timings.

WARNING WARNING WARNING:  The results returned by this program are BOGUS
and may mean nothing whatsoever.  However, they may be useful just the
same...

The PURPOSE of this program is to inform a variety of decisions with a
set of sound and reproducible measurements.  Some of these decisions are
human decisions (Which system should I buy to use as a beowulf node?
Which system makes more sense as a desktop?).  Some of them might well
be automated software decisions -- a spanning set of CPU (and other)
rates, stored in a publicly accessible file on all systems in a
cluster or beowulf, allows software to be written that reads and parses
that file to obtain the rates, and uses the rates to automatically
>>tune itself to the system in question<<.

This latter use, in particular, has great promise in enabling
intelligent software design.  On a beowulf or compute cluster:

   a) Waiting nodes are wasted time.
   b) In time, all clusters become heterogeneous (if they aren't from
the beginning).  Lacking precise information on node speeds, it is
difficult to design parallel programs that scale their size in such a
way that nodes don't have to wait (as they would with the usual equal
partitioning of the problem).

On a given node or standalone workstation:

   a) Operations like linear algebra can be optimized (in both algorithm
and stride) for the sizes and relative speeds of the CPU and the various
memory layers.  This has been beautifully demonstrated by the ATLAS
project.  However, doing a customized optimized build of ATLAS is quite
difficult and, given accurate measurements of certain microscopic rates
and latencies and bandwidths, should not be necessary.

   b) Operations like sort can similarly be tuned. Indeed, >>many<<
large and complex operations that are common elements of many programs
can be automatically optimized given direct access to accurate
measurements of various system rates and parameters.

This tool is intended to evolve into a means of providing those rates.
It is not finished -- it can never be finished while CPUs and systems
themselves continue to develop.  It can also almost certainly be
improved.  If you are a systems programming expert, please feel free to
improve it or criticize it.

The tool DOES need a certain degree of stability in order to be useful.
As code fragments are created (and integrated with the timing harness)
that return a "useful" parametric measure those fragments will be
"frozen" unless someone can demonstrate that they are failing of their
purpose.  Although the code is GPL (so you can hack it all you want, or
embed it in other programs, and all that, provided you follow the GPL
rules) the code will obviously cease to be useful if someone
significantly hacks the source that measures, say, sin() execution and
cos() execution separately and replaces it with code that calls sincos()
and (obviously) speeds up "sine and cosine execution".  Or something more
subtle -- for example reordering operations by hand to take advantage of
"accidental" speedups on a given CPU.

I therefore say "please don't do this".  Since my saying it has no legal
force, I >>also<< say unto all humans everywhere -- DON'T BELIEVE
PUBLISHED NUMBERS GENERATED FROM THIS PROGRAM unless you trust the
source.  In a published paper written by a university computer
scientist, they are likely trustworthy.  On a website run by Your
Favorite Computer Company, they are likely not.  The safest thing to do
is to run the tests yourself, built from trustworthy sources or RPMs,
which is always possible to do with free software and the web.

You can get the current copy of this program (at this point in time,
anyway) on http://www.phy.duke.edu/brahma (follow the links).
Eventually I'll likely give it a page of its own, if there is any
interest in it having one.


   Good Luck,

         rgb

Robert G. Brown	
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  
Fax: 919-660-2525  
Email: rgb@phy.duke.edu
URL: http://www.phy.duke.edu/~rgb
