This is a new and improved outline for this paper.

I) Rough introduction, Amdahl's Law, improved versions. Skim past discussion of communication patterns and algorithms (refer to online references).

II) Discuss/Define Primary Bottlenecks: Latency vs. Bandwidth. Indicate how individual programs may be rate limited by any or all of these. Also mention superlinear speedup, when one redesigns a program to live within some significant boundary.

III) Then in order: CPU. Memory (L1/L2/main). Network (IPC channel) -- raw (socket) and cooked (in PVM/MPI application). Disk. Very short. Just define and indicate the primary rate defining features.

IV) Engineering tools to determine key latencies/bottlenecks. lmbench (under development, but a "complete" suite). Tools in "beobench". Homemade tools.

V) Profiling an application and then parallelizing it (predicting overall performance).

VI) Conclusion: The >>best<< benchmark is always your application. Complexity matters, and only rarely can one completely predict performance on the basis of benchmark (micro or macro) numbers.

This MAY be too much. Then again, it may not. It's worth trying.
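
As a placeholder for item I, here is a minimal sketch of the basic form of Amdahl's Law, speedup = 1 / (s + (1 - s)/N), where s is the fraction of single-processor run time that cannot be parallelized and N is the number of processors. The function name and the 10% serial fraction in the demo are illustrative assumptions, not numbers from the paper, and the formula deliberately ignores communication costs.

```python
def amdahl_speedup(serial_fraction, workers):
    """Ideal speedup on `workers` processors under basic Amdahl's Law.

    serial_fraction: share of the single-processor run time that cannot
    be parallelized (0.0 to 1.0). Communication overhead is NOT modeled.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # Serial work takes the same time; parallel work is split evenly.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical program with 10% serial work (an assumed figure).
    for nodes in (1, 2, 4, 8, 16, 64):
        print(f"{nodes:3d} nodes: speedup {amdahl_speedup(0.1, nodes):.2f}")
```

With a fixed serial fraction s the speedup can never exceed 1/s, no matter how many nodes are added, which is why the bottleneck discussion in items II and III matters.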