Layout for beowulf node kickstart file


This is basically a lightly annotated review of the associated beowulf node kickstart file. It is far from being a definitive review of kickstart. For proper kickstart documentation, look at the Kickstart HOWTO or the Official Red Hat Customization Guide available in the doc directory of any Red Hat distribution.

Most of the kickstart file linked above is pretty much self-explanatory. It installs a node in text mode, with US keyboard support, no mouse, and a DHCP-configured network. The encrypted root password given is installed. The url (or ftp or nfs) path to the installation distribution is indicated (in the example case, the institutional (wide area net) RH installation server wanserver.mydomain.edu has an Apache webserver running and an image of Red Hat Linux 7.3 for the i386 architecture exported at the given path point). The timezone is set, no X11 is installed, and a few parameters telling it how to manage passwords and authentication are given. It will install its bootloader into the master boot record, trashing anything it finds there.
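
For orientation, the top of such a kickstart file looks something like the following sketch (the installation URL, timezone, and password hash are placeholders here, not the actual values from the file):

    install
    text
    lang en_US
    langsupport en_US
    keyboard us
    mouse none
    network --bootproto dhcp
    url --url http://wanserver.mydomain.edu/7.3/i386
    rootpw --iscrypted <md5-crypted-password-hash>
    timezone US/Eastern
    auth --useshadow --enablemd5
    skipx
    bootloader --location=mbr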

The next few fragments mostly dictate the disk layout (the firewall is, however, disabled in between, and any ipchains or iptables configuration will need to be set according to your site's needs in the %post scripts later). It will clear all partitions -- this does a total reinstall (and hence should NOT be run this way on a node containing data you wish to preserve). It then defines a 3 GB root ext3 partition, a swap space of the "recommended" size for the amount of memory it finds on the node, an ext3 partition called /disk1 that is 2 GB in size as "spare space", and then an ext3 partition called /xtmp that takes up basically the rest of the disk, usable as scratch space by all users (presuming permissions are later set appropriately by %post).
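
In kickstart syntax this amounts to a handful of directives, roughly as follows (sizes are in MB and should of course be adjusted to your own disks; this is a sketch, not the literal file):

    firewall --disabled
    clearpart --all
    part /      --fstype ext3 --size 3072
    part swap   --recommended
    part /disk1 --fstype ext3 --size 2048
    part /xtmp  --fstype ext3 --size 1 --grow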

This layout is "reasonable". The beowulf image only uses about half the root partition and is pretty fat as is for a system that won't run X. Swap is there, but of course you should try to avoid running code so big that swap is ever used. Disk sizes on nodes are highly variable, but likely to be bigger than the 6-8 GB this layout might minimally occupy. However, the node disk could be tens of GB in size, so the /xtmp (user scratch) partition is told to grow to fill the rest of the disk. A current 30-40 GB disk will therefore have 23-33 GB of pure user scratch space.

Following that is a list of packages, both grouped and individually. Package groups are defined in the comps file that typically resides in the RedHat/base subdirectory of your distribution archive. Groups allow one to easily assemble a collection of RPM packaged components and get them "all at once" with a single command.

The kickstart file thus begins its package list with the Dulug Beowulf group (defined in the comps file linked above). This group includes the X Window System group, the Network Support group, the Utilities group, the Emacs group, the Software Development group, and two other locally defined groups, Dulug Stat/Math and Dulug Beowulf Misc. The former contains a variety of scientific, mathematical and statistical libraries (gsl, blas, lapack, R). The latter is the core that makes this a cluster node instead of "just" a workstation -- pvm, lam (mpi), xmlsysd/wulfstat.

In our LAN we make local adjustments on top of the Dulug Beowulf group; these include adding (for example) backup utilities, editors preferred in the local environment, and xinetd, and removing a variety of packages used primarily on servers or on workstations. Obviously the basic Dulug Beowulf group can be customized directly for your local environment, as can the list of RPM additions and deletions.
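
In %packages syntax all of this reduces to the group reference plus whatever individual packages you add; for example (the individual package names here are merely illustrative):

    %packages
    @ Dulug Beowulf
    # illustrative local additions
    xinetd
    amanda-client
    jed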

At the end of the kickstart file comes the %post section, which is where your cluster node (or workstation) is given all of its "local identity" that isn't passed by dhcp or prewrapped in locally built RPMs. wget is used to get a straightforward shell script, beowulf.sh, which is run out of /tmp on the new node, redirecting any errors it generates into /tmp/post-errors.log for retrospective analysis (if something fails).
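
The %post stanza itself can be quite short, along these lines (the server path is an assumption, patterned on the example server above):

    %post
    cd /tmp
    wget http://wanserver.mydomain.edu/install/beowulf.sh
    /bin/sh ./beowulf.sh > /tmp/post-errors.log 2>&1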

The particular script enclosed is intended only as an example -- it will have to be heavily modified to suit your local environment unless it happens to be just like ours. It starts by extracting or setting certain pieces of information: the "type" of system (beowulf), the IP number, the hostname, the RH version number, and the URL path to the directory that will provide this script and the other files required in the initialization process.
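
A hypothetical fragment near the top of beowulf.sh that gathers this information might read (the exact commands and the server URL are assumptions):

    TYPE=beowulf
    RHVERSION=7.3
    SERVERURL=http://wanserver.mydomain.edu/install
    HOSTNAME=`hostname`
    # pull the IP number assigned to eth0 out of the ifconfig output
    IPNUM=`/sbin/ifconfig eth0 | grep 'inet addr' | cut -d: -f2 | awk '{print $1}'`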

The script then uses chkconfig to set (or unset, which can be just as important) all of the daemons, cron entities, and root processes it controls. It sets up yum so that it can find any missing, strictly local RPMs. Yum can work with multiple archives, making it easy to layer a few local RPMs on top of a package set established from an institution-wide server. This makes it easy for local managers to customize their installations without needing the time, energy, or services of whoever maintains the global institutional installation server. Once this is done, local tools and configuration files can be unpacked and configured by just a yum install localpackagename.
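
For example (the service names and repository URLs below are illustrative, not the actual list from the script):

    # turn unwanted daemons off, make sure needed ones are on
    chkconfig sendmail off
    chkconfig apmd off
    chkconfig sshd on
    chkconfig xinetd on

    # point yum at both the institutional archive and a LAN-local one
    cat > /etc/yum.conf << EOF
    [main]
    cachedir=/var/cache/yum
    logfile=/var/log/yum.log

    [base]
    name=Red Hat Linux 7.3 base
    baseurl=http://wanserver.mydomain.edu/7.3/i386

    [local]
    name=LAN-local additions
    baseurl=http://localserver.mydomain.edu/yum/7.3
    EOF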

Local NFS mounts are added to the prebuilt /etc/fstab created by the kickstart install. This particular beowulf.sh is intended for use on Tyan 2466-based systems, which have a serial console that runs on ttyS0. The BIOS, of course, cannot manage this once the system is booted, but by adding the given lines to /etc/inittab a post-boot serial console is established for any cluster node.
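
In the script these are simple appends, along the lines of (the server and export names are placeholders):

    # add a LAN NFS mount to the fstab built by the installer
    cat >> /etc/fstab << EOF
    nfsserver.mydomain.edu:/export/home  /home  nfs  rw,hard,intr  0 0
    EOF

    # respawn a login on the serial port so the node can be reached
    # from a serial console after boot
    cat >> /etc/inittab << EOF
    S0:2345:respawn:/sbin/agetty -L ttyS0 9600 vt100
    EOF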

The VGA "splash" image used in grub is turned off (this presumes that your cluster nodes will boot only in text mode, quite possibly from a serial console; it won't hurt anything even if a graphics adapter is present and you are working with a keyboard and video monitor). The /etc/motd is blanked. Backup (via amanda) is set up. Some superfluous content is removed.
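
The first two of these are one-liners, roughly (a sketch, not the literal script):

    # comment out the grub splash image so the node boots in plain text
    perl -p -i -e 's/^splashimage/#splashimage/' /boot/grub/grub.conf

    # empty the message of the day
    cat /dev/null > /etc/motd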

An important step is making /xtmp into a true "tmp" filesystem, setting permissions appropriately. This permits any user of the node to write whatever they like there. We generally do NOT back up this space. It is available to "buffer" input and output of node tasks, but the cluster user must arrange to collect any permanent results from this space and move them to permanent, backed-up space (for example, an externally mounted project directory on a RAID attached to a cluster NFS server).
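
This amounts to giving /xtmp the same world-writable, sticky-bit mode that /tmp itself carries:

    # make /xtmp behave like /tmp: world writable, sticky bit set
    chmod 1777 /xtmp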

yum is used to install a few very local additions, including the promised RPM (phy-misc) that wraps up still ANOTHER layer of strictly LAN-local files and settings, gcc3 (to update the default RH gcc, which is still 2.96), and a postinstall script, run the first time the system is rebooted, that does all last-minute cleanup. A function is used to collect a bunch of locally defined files from the install server (or more likely, the LOCAL install server). Finally, a variety of small tasks (beowulf-jobs) are run to finish off the configuration. These aren't microscopically documented here as they are likely to differ in a different environment, but the script still illustrates how little cleanup jobs can be loaded and run.
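
A hedged sketch of this stage (the package names are the ones mentioned above; the helper function and the files it fetches are hypothetical):

    # install the LAN-local packages from the yum archives configured earlier
    yum -y install phy-misc gcc3

    # hypothetical helper: fetch a locally maintained file from the install server
    getfile () {
        wget -q -O $2 $SERVERURL/$1
    }
    getfile beowulf/ntp.conf /etc/ntp.conf
    getfile beowulf/hosts.allow /etc/hosts.allow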

The script concludes by getting the LAN-correct /etc/nsswitch.conf and installing it. This file is somewhat tricky and will have to be carefully built for your local environment. We use NIS for authentication and so forth, even on cluster nodes. For the embarrassingly parallel, sparse-I/O tasks we tend to run on our cluster nodes, this is not a significant performance hit. For others it is, or may be, and they might well opt to use e.g. rsync or some other remote copy method to mirror a "master" /etc/passwd, /etc/group, /etc/hosts, and so forth onto all the nodes. This would then require an appropriate /etc/nsswitch.conf to reflect that in its lookup order.
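
For an NIS-using LAN like ours the relevant lines look roughly like this (a setup based on mirrored flat files would simply drop the nis entries):

    passwd:     files nis
    shadow:     files nis
    group:      files nis
    hosts:      files nis dns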

As not infrequently occurs in beowulf-style cluster computing, there are tradeoffs to be considered in selecting the approach that is right for you. NIS is administratively centralized and fully automated, so all account and host management can be done on a single host and be "instantly" available on the entire LAN. It permits users to change their passwords (for example) "anywhere" and have the change appear "everywhere". However, it is relatively heavyweight, as NIS lookups occur basically every time a file is stat-ed and can significantly load the network and reduce performance if this is a common occurrence in your particular task mix. This issue is perennially discussed on the beowulf list, so a peek at the list archives should reveal insight and solutions should you face this dilemma.


This page was written and is maintained by Robert G. Brown rgb@phy.duke.edu