Email: rgb@phy.duke.edu
URL: http://www.phy.duke.edu/rgb/
Phones:
work: 919-660-2567
cell: 919-280-8443
24 hr. Fax: 919-660-2525
Brown has been a pre-major advisor for the Trinity College of Arts and Sciences since 1995.
Many of these columns have been republished and are available online
at:
http://www.clustermonkey.net/content/category/5/14/32/
The beauty of cluster computing is that it requires little more than a generic workstation LAN. We begin to explore cluster computing with just that: a "Network of Workstations" (NOW) that you may well already have!
Last month we started out by learning how to use more or less an arbitrary Linux LAN as the simplest sort of parallel compute cluster. This month we continue our hands-on approach to learning about clusters and play with our archetypical parallel task on our starter cluster to learn when it runs efficiently and, just as important, when it runs inefficiently.
Clustering seems almost too good to be true. If you have work that needs to be done in a hurry, buy ten systems and get done in a tenth of the time. If only it worked with kids and the dishes. Alas, whether it's kids and dishes or cluster nodes and tasks, linear speedup on a divvied-up task is too good to be true, according to Amdahl's Law, which strictly limits the speedup your cluster can hope to achieve.
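A minimal sketch of the limit Amdahl's Law imposes. The parallel fraction p = 0.9 below is an illustrative assumption, not a figure from the column; the formula itself is the standard one.

```python
# Amdahl's Law: if a fraction p of a task parallelizes perfectly and
# the rest is serial, the speedup on N nodes is
#     S(N) = 1 / ((1 - p) + p / N)
# which saturates at 1 / (1 - p) no matter how many nodes you buy.

def amdahl_speedup(p: float, n: int) -> float:
    """Predicted speedup on n nodes when fraction p of the work parallelizes."""
    return 1.0 / ((1.0 - p) + p / n)

if __name__ == "__main__":
    # Even with 90% of the work parallelizable (an assumed figure),
    # ten nodes fall well short of a tenfold speedup:
    for n in (1, 10, 100, 1000):
        print(f"N={n:4d}  speedup={amdahl_speedup(0.9, n):.2f}")
```

With p = 0.9 the speedup can never exceed 10, and ten nodes deliver only about 5.3x, which is exactly the "too good to be true" gap the column explores.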
The idea of a homemade parallel supercomputer predates the actual Beowulf project by years if not decades. In this column (and the next), we explore "the" message passing library that began it all and learn some important lessons that extend our knowledge of parallelism and scaling.
In this column we continue our exploration of PVM, the parallel computing subroutine library that more or less enabled the current explosion of high-performance parallel compute clusters to happen.
In this column we write and run a very simple PVM application to "get started" with PVM.
Location, location, location. Clusters need space, power, cooling, and network access.
We discuss some very basic principles for how to go about picking the best hardware for your cluster.
From inexpensive 100BASE-T Ethernet to expensive custom networks, the network is the glue that makes a cluster.
Wrap your data in TCP, pop it into an IP datagram, and insert it into an Ethernet envelope...
Top 500 or Gordon Bell? Cost-benefit, not raw performance, is what cluster computing is all about.
Wrap your data in TCP, pop it into an IP datagram, and insert it into an Ethernet envelope... continued, and with UDP thrown in for good measure.
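The layering these two columns describe can be sketched in a few lines. This is an illustrative toy, not wire-accurate networking code: the header field values (ports, addresses, MACs, zeroed checksums) are placeholders I have assumed for the example, and only the nesting of TCP inside IP inside Ethernet is the point.

```python
import struct

# Application data to be "enveloped", innermost layer first.
payload = b"hello, cluster"

# Minimal 20-byte TCP header: src port, dst port, seq, ack,
# offset/flags, window, checksum, urgent pointer (checksum left zero).
tcp = struct.pack("!HHIIHHHH", 5000, 80, 1, 0, 0x5002, 8192, 0, 0) + payload

# Minimal 20-byte IPv4 header in front of the TCP segment:
# version/IHL, TOS, total length, id, flags/frag, TTL, protocol 6 (TCP),
# checksum (zeroed), source and destination addresses.
ip = struct.pack("!BBHHHBBH4s4s",
                 0x45, 0, 20 + len(tcp), 0, 0, 64, 6, 0,
                 bytes([192, 168, 1, 1]), bytes([192, 168, 1, 2])) + tcp

# 14-byte Ethernet header: dst MAC, src MAC, EtherType 0x0800 (IPv4).
frame = struct.pack("!6s6sH", b"\xff" * 6, b"\x00" * 6, 0x0800) + ip

print(len(payload), len(tcp), len(ip), len(frame))
```

Each layer adds its fixed overhead (20 bytes for TCP, 20 for IPv4, 14 for Ethernet), which is one reason small messages are so much less efficient on a cluster network than large ones.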
The critical component of a beowulf cluster is the network. How can we compare network performance across a dazzling array of choices?
Now that we understand a bit about networking, we return to cluster design.
Random thoughts about clusters, column names, and Linux versus Windows in my home cluster/LAN.
When the only work being ranked is driving nails, the only tool that is valued is the hammer. Too bad if your work involves driving screws...
Competition is good, but a single measure of performance in one dimension is not terribly useful for optimizing in a multidimensional space. We can do better.
So you've built that new cluster, for fun or eventual profit, but had no specific task in mind. You want to test it out. But how?
What if we made a benchmark daemon a built-in component of standard Linux? Tools with a library interface could optimize in many useful ways, and automagic, resource-aware cluster schedulers would finally become possible...
The following are selected web-only publications by Robert G. Brown on
the website:
http://www.phy.duke.edu/rgb
This website gets over 6 million hits a year from users downloading 66 gigabytes of online content authored by Brown ranging from free physics lecture notes and online textbooks to computing information and poetry.
Dieharder is a fully GPL random number generator tester, under development by Brown. It currently incorporates all of the tests from George Marsaglia's Diehard tester, several tests from the NIST Statistical Test Suite (with more on the way), and a number of tests devised by Brown.
Dieharder is in active use by a growing number of research groups because it subjects random number generators to far more strenuous tests than previous test suites, with user-adjustable parameters that permit the user to control the power of each test. A community is developing that contributes ideas and code and helps debug the tool. Dieharder is available as a linkable library and has been incorporated directly into the R statistical suite by Dirk Eddelbuettel. By virtue of its power, Dieharder even serves as a test of its own code: it has revealed possible weaknesses in two of the original Diehard routines.
Dieharder is available from:
http://www.phy.duke.edu/~rgb/General/dieharder.php
Wulfware is a collection of tools (xmlsysd, libwulf, wulfstat, wulflogger) designed to support the monitoring of clusters and grids. xmlsysd is a lightweight daemon that provides XML-wrapped system statistics and other information extracted from /proc and various system calls. wulfstat and wulflogger are ncurses and plain-ASCII (respectively) tools that connect to the xmlsysd daemons running on an entire cluster. wulfstat presents the data with a user-selectable refresh delay in a tty (xterm) window; wulflogger prints it in a simple column format to standard output, where it can easily be fed to a log file for eventual plotting or to other tools (e.g. a builder of a web view of the data). This is of obvious and immediate use for monitoring cluster status, tracking particular jobs, and determining resource utilization for gridware schedulers or policy engines.
Wulfware is available from:
http://www.phy.duke.edu/~rgb/General/wulfware.php
Benchmaster is a microbenchmark program designed to time and test system performance at a low level. It will eventually be added to the Wulfware suite as a component of xmlbenchd, a new project that provides a daemon interface to XML-wrapped drop-in benchmark programs. With it, applications can be built that automatically tune their algorithms to the particular hardware they run on, and grid tools can be built that dynamically determine the resources available on an anonymous grid node.
Benchmaster is available from:
http://www.phy.duke.edu/~rgb/General/benchmaster.php
Flashcard is a program for presenting simple flashcards to students in a standard terminal (e.g. xterm) window. Special features include an XML encoding of flashcard problems and the ability to present auditory cues (e.g. spelling words out loud) from compressed sound files.
Flashcard is available from:
http://www.phy.duke.edu/~rgb/General/flashcard.php
http://www.phy.duke.edu/brahma/
R. G. Brown, together with Dave Rahul of the University of Pennsylvania, organized the Extreme Linux section of the 1999 Linux Expo, which focused considerable attention on the beowulf effort and the possibilities of COTS parallel supercomputing. He served on the organizing committee of the "IEEE International Symposium on Cluster Computing and the Grid" (CCGrid 2001), held in Brisbane, Australia in May 2001, and on the program committee of "The 2005 International Conference on Parallel Processing (ICPP-05)", held at the University of Oslo, Norway, June 14-17, 2005.