
User Support

If the University adopts the acpub general LAN configuration as suggested above, the only non-recovered costs for managing public clusters will be: a very limited amount of one-time work developing cluster images consistent with the acpub LAN; some work on support-level infrastructure for the managed clusters (building and managing cluster servers and so forth, which will likely not always be paid for explicitly by node purchasers); and the extra human work required to support the cluster users.

Suppose the University adopts a mixed model for user support: the acpub help desk plus mailing lists and websites that range from campus-specific resources such as dulug and dbug (the Duke Beowulf User's Group) to Internet-wide resources such as the beowulf list, the many beowulf websites and resource centers, and the many linux and gnu lists and websites, with internal or external paid consultants (paid by the user) for particularly thorny problems. Using this mixed model and leveraging existing resources, even the highly variable cost of providing user support can largely be controlled. Support requests typically come in two forms (which is what motivates the mixed model). The first kind consists of requests for routine LAN services such as setting up accounts, arranging for resource access, or getting help with more or less routine linux software, which the existing acpub help desk can manage at little to no additional marginal cost (presuming that the existing acpub staff is augmented by 1-3 cluster specialists working semi-independently on managing and running the public compute clusters).

The other kind of support needed is cluster-specific support for how to use a compute cluster: how to write parallel code, how to distribute embarrassingly parallel jobs efficiently, how to collect results locally at the nodes and retrieve them to the user's home LAN environment as needed, and how to enable secure ssh access to node/cluster resources across the campus WAN. This latter kind of support is much better provided at the user-group level, where all the cluster users and cluster administrators on campus pool their collective expertise and help out the less-experienced users, and where one expert can offer up complex problems for input from other experts.

This distributed model of support is ``the'' model for virtually all of linux and open source software, and has proven to be shockingly effective at facilitating learning and development. It forms the critical information-exchange step in an ongoing process of genetic optimization that is responsible for the birth of the Internet itself and which continues to drive a staggering range of technological development today. It is also very nearly a model for ``free'' support. Most of the actual support (again, based on years of experience participating actively in many of these lists) comes from people who are heavily involved in cluster or systems management anyway; the service is provided out of opportunity cost labor by employees and other members of the University community who are in some measure already being paid to provide this sort of service, or who are at least professionally engaged in some measure with cluster construction or operation.

It is important to remember that this sort of distributed support is not really free and that it is a highly valuable contribution to the University community, wherever it comes from. Although it is not generally necessary to pay explicitly for this kind of support with a line item charge, top-level IT managers across the University need to be aware of the time being contributed to cluster support, both to be able to recognize the contribution when evaluating employees' performance and advancement and to be sensitive to the need to augment staff when participation pushes any given employee or employee group to an FTE boundary (where they no longer have ``spare time'' in which to answer cluster-related questions).

Since some of this distributed support will be provided by faculty, postdocs, and other non-staff employees, their contributions also need to be recognized and rewarded somehow. Cluster experts throughout the University are an important community resource and enable research to be done and grants to be obtained far beyond the boundaries of their particular departmental milieu.


Robert G. Brown 2003-06-02