Some notes (past deeds and future plans)

  Fixed MAJOR bugs associated with the timer and with Intel's bedamned
  divide-by-powers-of-two optimization, which gave a completely false
  picture of the relative strength of its processors.  The same goes for
  AMD.  Many thanks to Sebastien Cabaniols of the COMPAQ European HPTC
  Solution Center for pointing out this problem.

  Added a transcendental test, which is a variant of the Savage benchmark.

  Next, we should consider abstracting the timer even more and then
  adding more tests.

OK, it is time (08/02/01) to proceed with some of the stuff below.
However, plans have changed a bit.  

For one thing I'm going to add some stream-like tests, as I don't see
how testing/benchmarking vector multiply/add instructions can be
considered a proprietary idea even in the most polite sense of the term.
I was doing it literally a decade ago myself anyway.
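
A minimal sketch of what such a stream-like test might look like,
assuming a "triad" kernel (a[i] = b[i] + scalar*c[i]) in the style of
the STREAM benchmark.  The names triad() and time_triad() are
hypothetical, and clock() merely stands in for whatever timer cpu-rate
actually uses.

```c
#include <stddef.h>
#include <time.h>

/* STREAM-style "triad" kernel: one multiply and one add per element. */
void triad(double *a, const double *b, const double *c,
           double scalar, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}

/* Run the kernel `reps` times and return the elapsed CPU seconds.
 * clock() is a placeholder timer chosen for portability. */
double time_triad(double *a, const double *b, const double *c,
                  double scalar, size_t n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        triad(a, b, c, scalar, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Each call to triad() performs 2*n floating point operations, so
2*n*reps divided by the elapsed time gives a rate.  A real test would
also have to make sure the compiler does not optimize the repeated loop
away.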

Second, I'm going to completely swallow "memtest" into cpu-rate.  In
particular, I'm going to modularize and break down the parts of memtest
and provide a facility for inserting any of a collection of specific
combinations of float instructions.

I'm especially interested in two numbers:

   a) Floating point speeds in the regime where cache is useless --
where the random memory access algorithm pulls the next number to be
worked on from all over the malloc'd memory space.

   b) Floating point speeds in the regime where more arithmetic is
gradually added inside the core memory access loop.  I'm particularly
interested in being able to identify the "knee" in the curve that should
occur when floating point operations unbind the memory bus, for single
and dual operations.  There should be a fairly well-defined "number of
floats per memory access" that unbinds the memory bus on both singles
and duals and restores one to something much closer to theoretical peak
float rates.
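
Both measurements might come out of one kernel, sketched here under
assumptions: the data is visited in a shuffled order so that cache is
of little help, and an adjustable number of multiply-add pairs is done
after each load.  The names shuffle_indices() and
random_access_kernel(), and the `work` parameter, are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fill idx[0..n-1] with a random permutation (Fisher-Yates shuffle).
 * rand() is used for brevity; where RAND_MAX is small, a large n would
 * need a better generator to scatter accesses over the whole array. */
void shuffle_indices(size_t *idx, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
        idx[i] = i;
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;
        size_t tmp = idx[i - 1];
        idx[i - 1] = idx[j];
        idx[j] = tmp;
    }
}

/* Visit data[] in shuffled order; after each load do `work`
 * multiply-add pairs.  The sum is returned so the compiler cannot
 * discard the loop. */
double random_access_kernel(const double *data, const size_t *idx,
                            size_t n, int work)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double x = data[idx[i]];
        for (int k = 0; k < work; k++)
            x = x * 0.5 + 1.0;
        sum += x;
    }
    return sum;
}
```

Timing the kernel with work = 0 gives number a).  Sweeping work upward
and plotting time per access against work should show the knee of b):
the curve stays roughly flat while memory is the bottleneck and starts
to climb once the arithmetic dominates.  The multiply-adds here form a
dependent chain, so a real test would probably want independent
accumulators to get near peak float rates.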

These additions may require some moderate rewriting of the entire
program (again).  For one thing, we are going to get to the point where
it is pretty difficult to control/select benchmarks using command line
flags.  We can try it, but I'm going to guess that I'll end up with too
many control flags to remember or easily document.  It may be time to
use an input file to control what is done and start it with e.g.

  cpu-rate -f benchfile

to configure and execute a specific benchmark. Alternatively we can try
to use flags but simultaneously "document" them (and control execution)
with a perl-gtk GUI.
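
As a sketch of the -f idea, assuming POSIX getopt() is available.
struct options and parse_options() are hypothetical names, and the
format of the benchfile itself is left open here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/* Hypothetical settings; only the -f flag itself comes from the notes. */
struct options {
    const char *benchfile;   /* path given with -f, or NULL */
};

/* Returns 0 on success, -1 on an unknown flag or a missing argument. */
int parse_options(int argc, char **argv, struct options *opt)
{
    int c;

    opt->benchfile = NULL;
    optind = 1;   /* restart scanning so repeated calls work */
    while ((c = getopt(argc, argv, "f:")) != -1) {
        switch (c) {
        case 'f':
            opt->benchfile = optarg;
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```

With a single -f flag the option string stays at "f:" however many
benchmarks are added, since everything else would live in the input
file.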

This latter solution is moderately attractive for two reasons.  One is
that it autodocuments everything.  The other is that a number of
"independent" runs must be launched FROM THE SHELL to get good
statistics.  Shelling the binary benchmark from a script and doing the
statistics at the shell level is a good way of managing this.

(Old stuff below)

  In particular, we should add fractionated/broken-down tests for e.g.

   multiply
   add
   subtract
   divide

  as separate entities, and tests for

   sin()
   cos()
   tan()
   asin()
   acos()
   atan()
   pow()
   sqrt()
   exp()
   log()   (natural log)

  also as separate entities.

  It would be good to merge in the old memtest program so that we could
  mix'n'match "raw" float speed tests and "raw" memory I/O speed tests.


