
Cluster nodes

Workstations tend to be extremely heterogeneous in their hardware configuration. The incredibly wide variety of over-the-counter motherboards, cases, CPUs, memory options, sound options, video options, storage options, and even keyboard and mouse options is wonderful from the consumer point of view and keeps prices low, but is a real headache from the systems support point of view. At this point, linux supports most PC hardware fully automatically in a kickstart install, but certain components (video cards, for example) do require options to be specified or tweaking to be done on a per-system basis for a workstation.

Heterogeneity thus adds to LAN management costs. Sometimes (when a particularly troublesome piece of hardware is encountered) this cost is significant. LAN managers encourage users to purchase systems with ``approved'' hardware (hardware known to be both supported and relatively headache-free) to control this cost. As long as hardware from this list is used, a single kickstart file will typically suffice for any LAN workstation and costs are minimized.
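As a concrete (and purely illustrative) sketch, a complete kickstart file for a workstation built from approved hardware can be quite short. The server name, password hash, and package group names below are placeholders and will vary with the LAN and the distribution release:

   # Hypothetical workstation kickstart; names and paths are placeholders.
   install
   nfs --server installhost.example.edu --dir /export/linux/redhat
   lang en_US
   keyboard us
   network --bootproto dhcp
   rootpw --iscrypted <crypted-password-hash>
   auth --useshadow --enablemd5
   timezone US/Eastern
   bootloader --location=mbr
   clearpart --all --initlabel
   part /    --fstype ext3 --size 4096 --grow
   part swap --size 512
   reboot

   %packages
   @ Base
   @ X Window System
   @ GNOME Desktop Environment

   %post
   # LAN-specific post-install hooks (e.g. automated update configuration)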

The same is true for cluster nodes, which can be viewed as specialized, particularly simple workstations. As long as cluster nodes are engineered so that all components are well supported (ideally in a tested and trusted configuration), a single kickstart file can suffice to support all the cluster nodes in a LAN. Even as nodes with new hardware are added over a period of years, it is simple to make small changes in copies of the basic kickstart installation and support several generations of nodes at once.
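Continuing the illustration, a cluster node kickstart might differ from the workstation sketch above only in installing in text mode, skipping X, and carrying a much smaller package list; a copy made a year or two later for a new generation of nodes might then change nothing but a line or two. The fragment below is again hypothetical and shows only the sort of lines involved:

   # Cluster node variant: only lines that differ from the workstation
   # sketch above are shown.
   text
   skipx

   # A copy for newer-generation nodes might change only, e.g., the
   # partition sizes, or add a driverdisk line for a new controller.
   part /    --fstype ext3 --size 8192 --grow
   part swap --size 1024

   %packages
   @ Base
   # plus whatever the cluster's parallel environment requires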

The basic cluster node configuration (and hence kickstart file) is unlikely to change significantly between clusters or departmental LANs. A basic cluster kickstart file (such as the one used in physics for cluster nodes) can thus be shared at the institutional level, with obvious LAN-specific modifications. Cluster nodes can therefore be installed for a very low average cost as long as hardware and configurational heterogeneity are avoided. Cluster hardware can be prototyped at the institutional level to ensure that nodes are scalably installable and maintainable.
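In practice this sharing can be as simple as agreeing that the first few lines of the institutional kickstart file are the only ones a departmental LAN edits, with everything below that marker identical everywhere. A hypothetical layout:

   # --- LAN-specific section (the only part a department edits) ---
   nfs --server installhost.dept.example.edu --dir /export/linux/redhat
   timezone US/Eastern
   rootpw --iscrypted <departmental-crypted-hash>
   # --- shared institutional section (identical on every LAN) ---
   # ...partitioning, %packages, %post as in the node sketch above...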

From this we see that cluster nodes can be installed, and their software configuration maintained, at the institutional level for very low cost per node. If one utilizes acpub's institution-wide LAN configuration as the LAN basis, sticks to ``proven'' node hardware configurations, and uses a standardized, widely shared ``cluster node'' kickstart configuration, one pays a fixed cost for integrating these elements for the first cluster, a small fixed cost for additional cluster-specific servers or hardware, and then a marginal cost of a few minutes per cluster node per year for installation and software maintenance. The only uncontrolled costs remaining are user support (which can be highly variable, depending on the skill level of the user and the complexity of their task) and hardware support.

Because the scaled (per node) costs can be so precisely estimated, it is possible to create a simple model for cost recovery on cluster node hardware, installation, and basic software maintenance that is likely to pass muster with the vast majority of granting agencies. It is left as an open question whether or not to attempt to recover physical infrastructure costs (the estimated $1 per year per watt) for running cluster nodes. It is recommended that the University not attempt to recover costs for the basic LAN services and user support required, and instead view them in exactly the same way that acpub services are viewed now (and indeed, offer them as an extension of acpub's basic services).
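To give the infrastructure figure a sense of scale: assuming (purely for illustration) that a typical node draws on the order of 100 to 150 watts, the estimated $1 per year per watt corresponds to roughly $100 to $150 per node per year in power and cooling.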

It is my personal belief that this model will prove to be satisfactory and even attractive to virtually any researcher interested in doing cluster computing who is in an environment that simply will not support the local management model (which remains cheaper and more desirable). The model itself is described next.

