The first step in building a beowulf is studying the task you wish to use the beowulf to speed up [2.2]. It really shouldn't be that surprising that the intended function dictates the optimal design, but newbies joining the beowulf list almost invariably get it wrong and begin by asking ``What hardware should I buy?'', a question that more or less answers itself as the last step of this protocol. The One True Secret to building a successful beowulf, as recited over and over again on the beowulf list by virtually every ``expert'' on the list [2.3], is to study your problem and code long and hard before shopping for hardware and putting together a plan for your beowulf.
This ``secret'' is not intended to minimize the importance of understanding the node and network hardware. Indeed, a large fraction of this book is devoted to helping you understand hardware performance issues so you can make sane, informed, cost-beneficial choices. However, it is impossible to estimate how hardware will perform on your code without studying your code, preferably by running it on the hardware you are considering for your nodes. In later chapters, concrete examples of code will be given that run at very different relative speeds on a selection of the currently available hardware [2.4].
The word ``study'' is used quite deliberately. It means that, if at all possible, you should rely on measurements and prototyping more than on back-of-the-envelope estimates [2.5]. Measurements are far more valuable than any theoretical estimate, however well-informed. A small prototype can save you from all sorts of terrible mistakes, and when a ``successful'' prototype is finally built, it can often be scaled up to the final size required [2.6].
The ``study your code'' formula above brings to mind a vision of a pocket-protector-loaded geek poring over line after line of program text on green and white lineprinter paper in a dark smoky room with a can of Jolt cola in one hand and a programmer's reference in the other. At least to Old Guys like me. However, this is not at all what I meant. I actually meant one to visualize a pocket-protector-loaded geek poring over line after line of program text in a smoke-free modern linux programming environment with minimally X (and a whole bunch of window panels and desktops), gcc and friends, a debugger or two, emacsoid editors, and/or a ddd-like integrated program environment, with a can of Jolt cola in one hand and the keyboard in the other. Real programmer geeks don't need a hardcopy language reference. That was what should have given it away.
The point is that to quantitatively study your code you have to get serious with some of the software development tools that you may well have largely ignored before. I should also point out that even beyond just studying your code, you have to study your task. Even if you have implemented your task in a perfectly straightforward piece of code that a lobotomized lunatic could read and understand, it may be poorly organized to run in a parallel environment. On the other hand, some horribly convoluted rearrangement of the code that you'd never in a million years write in a single-threaded environment (and that a non-lobotomized certified genius might have difficulty understanding) may be just great in a parallel computing environment.
I will now and henceforth assume that you know nothing about parallel code design or parallel task execution. Since I (truly) don't know that much more than nothing, I'm going to try to teach you what little I know, and where to learn more. Accept the fact that if you have a ``big'' project in mind, you will have to learn more. I mean it. Real Parallel Algorithms are the purview of Real Computer Scientists (where I am a ``Sears'' computer scientist at best [2.7]) and you'll need to find a book by a real computer scientist or two to learn about them. A number of such books are listed in the Bibliography and indicated in the text in context. Alternatively, you can hire a real computer scientist, if you can get approval from your fire marshal and the local board of health [2.8].
Once you have a linux workstation set up to do the requisite study, you can either design a program from scratch to be parallel (a great idea when possible) or, more likely, take an existing serial program and start to parallelize it. To parallelize the program and to inform the beowulf design process, you must begin by identifying how much time the serial version of the task spends doing work that could be done in parallel and how much time it spends doing work that must be done serially. If the linux workstation you are working on is at all ``like'' what you think you might need for a node (after reading through this whole book), so much the better.
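As a rough illustration of what ``identifying how much time'' can mean in practice, here is a minimal sketch in Python that brackets the serial and the potentially parallel sections of a toy task with wall-clock timers. Everything in it is an assumption made up for the example: `read_input`, `process_item` and `measure_fractions` are hypothetical names, and the toy workload stands in for your real code. A real profiler will give you a far more detailed picture than hand-placed timers.

```python
import time


def read_input(n):
    # Hypothetical serial step: setup that has to happen in order.
    return [float(i) for i in range(n)]


def process_item(x):
    # Hypothetical independent step: no item depends on any other item,
    # so in principle each one could be handed to a different node.
    total = 0.0
    for k in range(200):
        total += (x + k) ** 0.5
    return total


def measure_fractions(n=2000):
    """Return (serial_fraction, parallel_fraction) of the measured wall time."""
    t0 = time.perf_counter()
    data = read_input(n)                        # must be done serially
    t1 = time.perf_counter()
    results = [process_item(x) for x in data]   # could be done in parallel
    t2 = time.perf_counter()
    checksum = sum(results)                     # serial again: combine the results
    t3 = time.perf_counter()

    serial = (t1 - t0) + (t3 - t2)
    parallel = t2 - t1
    total = serial + parallel
    return serial / total, parallel / total


if __name__ == "__main__":
    s, p = measure_fractions()
    print(f"serial fraction:         {s:.3f}")
    print(f"parallelizable fraction: {p:.3f}")
```

The two fractions will vary from run to run and from machine to machine, which is exactly why it pays to take them on hardware that resembles your candidate nodes.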
In all likelihood you have no idea how long it takes for your (or any) computer to do any of the work in your task. Neither do I. So we must find out. This is accomplished by profiling your task. The way to profile a simple serial task using GNU tools (gcc and gprof) is illustrated in detail in a chapter below.
I'M WRITING RIGHT HERE - THE REST OF THIS CHAPTER IS IN TOTAL FLUX...
Task profiling is covered in a chapter below.
Use Amdahl's Law (covered in a chapter of its own) to determine whether or not there is any point in proceeding further. If not, quit. Your task runs optimally on a single processor, and all you get to choose is which of the various single processors to buy. This book can still help (although that isn't its primary purpose) - check out the chapter on ``node hardware'' to learn about benchmarking numerical performance.
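To make the ``is there any point?'' test concrete, here is a small sketch of the common textbook form of Amdahl's Law, speedup = 1 / ((1 - p) + p/N), where p is the fraction of the serial run time that could be done in parallel and N is the number of processors. The symbols and the function names `amdahl_speedup` and `max_speedup` are assumptions chosen for this example and may not match the notation of the Amdahl's Law chapter. The formula also ignores communication and other parallel overhead, so treat its result as an optimistic upper bound.

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Idealized speedup on n_procs processors (textbook Amdahl's Law).

    parallel_fraction: fraction of serial run time that can be parallelized (0..1)
    n_procs:           number of processors (>= 1)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie between 0 and 1")
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


def max_speedup(parallel_fraction):
    """Limit of the speedup as the number of processors grows without bound."""
    if parallel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    # Hypothetical measured fraction; substitute your own profiling result.
    p = 0.9
    for n in (1, 2, 4, 8, 16, 64):
        print(f"N = {n:3d}  speedup = {amdahl_speedup(p, n):.2f}")
    print(f"upper bound for p = {p}: {max_speedup(p):.2f}")
```

If the upper bound for your measured fraction is not much more than one, no number of nodes will rescue you, and that is the signal to quit and shop for a single fast processor instead.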