
Local management-local site

In a location that already contains an operational LAN whose users include the owner/operators of a proposed cluster, almost all of the costs of installing and managing the cluster are already incorporated in the irreducible costs of setting up and running the LAN itself and supporting the cluster owner/operators as LAN users. Those users already have accounts, servers, desktop clients, managed filespace, security, authentication, software support, and routine maintenance provided by the local systems staff. If that staff is competent, most of those costs will not change much when some network components are compute nodes instead of desktop LAN clients.

The exceptions (items that ARE additional costs of running compute nodes) are the cost of installing the compute nodes themselves, the cost of installing and maintaining any cluster-specific software, and the additional cost of supporting users with cluster-specific problems. There is also a difficult-to-estimate cost associated with "FTE boundaries" that kicks in nonlinearly whenever the local manager(s) are pushed past their capacity by additional nodes OR workstation clients OR servers OR users. If they are already working at absolutely full capacity without the cluster nodes, adding the cluster nodes will cost more than one would expect from the same additions made within their capacity. In addition, there are the more or less standard infrastructure costs of roughly $1 per watt per year, plus the more esoteric costs associated with providing the space and networking required by the nodes.
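The two effects in this paragraph can be illustrated with a small sketch. It assumes the $1 per watt per year figure quoted above; the 100 W node, the 2000 working hours per FTE per year, and the function names are hypothetical values chosen only for illustration, and counting whole FTEs is a deliberately crude stand-in for the nonlinear boundary cost.

```python
import math

def infrastructure_cost(node_watts, n_nodes, dollars_per_watt_year=1.0):
    """Yearly infrastructure cost in dollars at a flat rate per watt."""
    return node_watts * n_nodes * dollars_per_watt_year

def ftes_required(workload_hours, fte_capacity_hours=2000.0):
    """Whole FTEs needed to cover a yearly workload (assumed capacity)."""
    return math.ceil(workload_hours / fte_capacity_hours)

# Hypothetical 16-node cluster drawing 100 W per node.
print(infrastructure_cost(100, 16))   # 1600.0

# Hypothetical staff already loaded to 1990 of 2000 hours per year.
# Adding 16 node-hours of work crosses the capacity boundary.
print(ftes_required(1990))            # 1
print(ftes_required(1990 + 16))       # 2
```

The jump from one FTE to two for a 16-hour increase is the step-like behavior the text calls an FTE boundary; the same 16 hours added to a lightly loaded staff would change nothing.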

Of these, the infrastructure costs are for all practical purposes the same regardless of where and how the nodes are situated. There is generally no reason to expect a priori that space, power, and cooling will be more expensive in a nearby location (inside the same department) than far away. In specific cases they may be more expensive; in other cases they may be cheaper. We will therefore ignore the infrastructure costs altogether as being roughly equal for any physical siting of nodes, remembering that for any specific proposed space we will need to reexamine the situation and see whether that space is anomalously expensive or inexpensive relative to the alternatives.

Thus, in any environment where the cluster nodes themselves don't push the managers across a capacity boundary, the additional per-node costs for a locally managed cluster come down to the twenty or thirty minutes it takes to install the node plus the twenty or thirty minutes it takes to do all per-node management in a typical year (including all hardware repair, software installation, and software updates). An FTE hour is a fairly safe upper bound on the yearly cost, per node, of running a standard cluster fully integrated with an existing LAN where the cluster owner/operators already have accounts and access to all the LAN resources.

There are some additional one-time costs associated with running any given cluster. Perhaps a cluster of 16 nodes requires a special compiler that takes eight FTE hours to purchase and install (spread over several weeks) and later to configure and support at the user end of things. Perhaps a cluster of 16 nodes has its own server, requiring an extra hour or two of setup time and a few minutes a week of attention to a backup device. It is reasonable to expect a cluster to take an FTE day or two, outside the per-node costs, to take care of this sort of thing. Altogether, however, a node will typically cost less, per CPU, than a LAN desktop, even excluding the support of the humans that might use the desktop.
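As a rough worked example of the estimates so far, the sketch below adds the per-node upper bound of one FTE hour per year to the one-time cluster overhead of "an FTE day or two". It assumes an FTE day is 8 hours, which the text does not state, and the function name is hypothetical.

```python
def first_year_hours(n_nodes, per_node_hours=1.0, cluster_overhead_hours=16.0):
    """Upper-bound management hours for a cluster in its first year.

    per_node_hours: install plus yearly upkeep, bounded by one FTE hour.
    cluster_overhead_hours: one-time extras (compiler, server, backups),
        taken here as two assumed 8-hour FTE days.
    """
    return n_nodes * per_node_hours + cluster_overhead_hours

# The 16-node cluster used as the example in the text.
print(first_year_hours(16))                               # 32.0
# With only one FTE day of cluster-wide overhead.
print(first_year_hours(16, cluster_overhead_hours=8.0))   # 24.0
```

Under these assumptions a 16-node cluster costs somewhere around three or four working days of management time in its first year.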

In the local management, local site model, absolutely maximal economy of scale is obtained except near FTE capacity boundaries. The system managers are on hand on the premises to take care of cluster nodes, so working on the nodes doesn't significantly reduce their responsiveness to departmental LAN problems. It takes a few minutes to walk to the cluster/server room to perform physical operations and maintenance, rather than the hour it can take to reach a more distant site. Many physical operations can often be bundled into a single trip. Managers have maximal flexibility in choosing when to work on nodes and when to work on the LAN in general. The LAN management aspects of running the cluster are largely inherited from the LAN they are already running and are not cluster specific.

It is hard to beat this model, which is why it is so popular and the obvious cluster model of choice where it works. The marginal cost per node for management is on the order of 1-2 FTE hours per year, or a real dollar cost of less than $100 per year even for fairly skilled and well-paid managers. Better yet, as long as the cluster doesn't cross the FTE capacity boundary for the department, this 1-2 FTE hours per year per node is free: an opportunity cost absorbed into the irreducible cost of the systems managers already working for you. You just use a larger fraction of the capacity you are already paying them for.
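The dollar figure can be checked with a minimal sketch. The $45 per hour loaded rate and the 100 spare staff hours are hypothetical numbers, picked only so that 2 FTE hours stays under the $100 bound quoted above; the function names are likewise illustrative.

```python
def per_node_dollars(hours_per_node, hourly_rate):
    """Nominal yearly management cost of one node in dollars."""
    return hours_per_node * hourly_rate

def is_absorbed(n_nodes, hours_per_node, spare_hours):
    """True if the cluster's yearly hours fit in the staff's spare capacity."""
    return n_nodes * hours_per_node <= spare_hours

# Hypothetical loaded rate of $45 per hour, 2 FTE hours per node per year.
print(per_node_dollars(2, 45.0))        # 90.0

# Hypothetical 100 spare staff hours per year.
print(is_absorbed(16, 2, 100))          # True: 32 hours fit
print(is_absorbed(64, 2, 100))          # False: 128 hours do not
```

When `is_absorbed` is true, the nominal per-node figure is not an extra outlay; it is capacity the department is already paying for.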


Robert G. Brown 2003-06-02