Sun PFS Software
The Sun PFS software is a parallel file system designed to support the parallel I/O found in HPC programs. It is also transparently supported by MPI I/O. Refer to the Sun HPC ClusterTools 4 documentation for more information.
Using the sudo Utility to Configure Non-Superuser Privileges
After a successful installation, the Sun HPC ClusterTools 4 software should need little administration. There are, however, a few best practices that ease the HPC administration task. Granting limited administrative privileges to selected non-superuser (non-root) users is one of these practices.
System administrators can allow HPC-knowledgeable users of the Sun HPC ClusterTools 4 software to perform administrative tasks without superuser privileges and without the risk of inadvertently affecting other systems in the network. The publicly available sudo software (see "References" on page 20) meets this requirement by allowing properly configured users to execute specified superuser-only commands on specific hosts.
The sudo software package has three different components that are of particular importance:
- sudo(8) command
- visudo(8) command
- sudoers(5) file
Note
The order of installation of these packages does not matter.
The manual pages for the above components provide further details. For the sudo command to allow non-superuser administrators to install and administer the Sun HPC ClusterTools 4 software, the /etc/sudoers file must be edited and configured using the visudo editor command (see "References" on page 20 for a link to the Sun GE software project site that contains a sample /etc/sudoers file). Do not modify this crucial file directly with vi(1) or another editor; always use visudo.
It is also important to know the path to each HPC-related command and the target of any link it expands to. For example, the mpadmin(1M) command is a symbolic link that points to a shell script (../isa.sh). The script takes the name of the symbolic link as an argument and runs a similarly named executable in an architecture-specific directory, such as the sparcv9 directory. The following list contains the specific HPC-related commands affected by the sudoers file:
- mpadmin
- mpinfo
- mpkill
- mpps
- pfsstart
- tm.mpmd
- tm.omd
- tm.rdb
- tm.spmd
- tm.watchd
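Because these commands may be symbolic links, it can help to see where each one actually resolves before writing sudoers entries. The following Python sketch is illustrative only and is not part of the Sun HPC ClusterTools software. The directory passed to expand_links() is an assumption that must be replaced with the actual installation path on the site.

```python
import os

# Command names taken from the list above.
HPC_COMMANDS = [
    "mpadmin", "mpinfo", "mpkill", "mpps", "pfsstart",
    "tm.mpmd", "tm.omd", "tm.rdb", "tm.spmd", "tm.watchd",
]


def expand_links(bin_dir, commands=HPC_COMMANDS):
    """Map each command name to (path, resolved_path).

    resolved_path is the link-free target of the command, or None
    when no file of that name exists in bin_dir.
    """
    result = {}
    for name in commands:
        path = os.path.join(bin_dir, name)
        if os.path.lexists(path):
            result[name] = (path, os.path.realpath(path))
        else:
            result[name] = (path, None)
    return result


if __name__ == "__main__":
    # "/opt/SUNWhpc/bin" is an assumed location; substitute the real one.
    for name, (path, target) in sorted(expand_links("/opt/SUNWhpc/bin").items()):
        print("%-10s %s -> %s" % (name, path, target or "(not found)"))
```

A command whose resolved path differs from its own path is a link. Both paths may need to be considered when the sudoers entries are written.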
See "Appendix" on page 20 for the main part of the /etc/sudoers file that needs to be modified to allow regular users to administer the Sun HPC ClusterTools 4 software on specified HPC hosts. After the sudoers file is modified, the changes take effect immediately; there is no need to reboot the system or take any other action. To test the changes, an eligible user needs only to run the sudo(8) command followed by the desired HPC command on the configured host(s). Use the latest version of the sudo software (version 1.5.9 or a subsequent release), because earlier versions do not accept the wildcard character (*) and therefore cannot support the setup defined in the /etc/sudoers file.
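To make the shape of such a configuration concrete, the following Python sketch assembles a small sudoers fragment from lists of users, hosts, and command paths. It is an illustration under stated assumptions and does not reproduce the Appendix. The user name, host names, alias names, and the /opt/SUNWhpc/bin paths are all hypothetical placeholders. Only the User_Alias, Host_Alias, and Cmnd_Alias keywords and the "user host = command" rule form come from the sudoers(5) syntax.

```python
def sudoers_fragment(users, hosts, command_paths,
                     user_alias="HPCADMINS",
                     host_alias="HPCHOSTS",
                     cmnd_alias="HPCCMDS"):
    """Build sudoers lines that let `users` run `command_paths` on `hosts`.

    The alias names are hypothetical defaults; sudoers requires alias
    names to be uppercase.
    """
    if not (users and hosts and command_paths):
        raise ValueError("users, hosts, and command_paths must be non-empty")
    for path in command_paths:
        # sudoers command entries must be absolute path names.
        if not path.startswith("/"):
            raise ValueError("command path is not absolute: %r" % path)
    return "\n".join([
        "User_Alias %s = %s" % (user_alias, ", ".join(users)),
        "Host_Alias %s = %s" % (host_alias, ", ".join(hosts)),
        "Cmnd_Alias %s = %s" % (cmnd_alias, ", ".join(command_paths)),
        # Rule: members of the user alias may run the commands on the hosts.
        "%s %s = %s" % (user_alias, host_alias, cmnd_alias),
    ])


if __name__ == "__main__":
    # Every value below is a placeholder, not a documented default.
    print(sudoers_fragment(
        users=["hpcadm1"],
        hosts=["node1", "node2"],
        command_paths=["/opt/SUNWhpc/bin/mpadmin", "/opt/SUNWhpc/bin/mpinfo"],
    ))
```

The generated text is meant to be reviewed and then entered through visudo, not written to /etc/sudoers directly. A user covered by the rule could then check it with a command such as sudo /opt/SUNWhpc/bin/mpinfo, again assuming that path.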
Running Multiple Releases of the Sun HPC ClusterTools Software
A site may need to keep a production HPC cluster on an earlier version of the Sun HPC ClusterTools software while it transitions to a newer release of the software. This configuration requires the NFS install option, with the NFS server located outside of both clusters. The software that serves each cluster must be installed in a separate file system so that the two releases cannot confuse each other's HPC-related configuration files.
FIGURE 3 shows a two-cluster configuration. One cluster is running the Sun HPC ClusterTools 3.1 software, and the other is running the Sun HPC ClusterTools 4 software. The /fs3.1 file system serves the 3.1 cluster, and the /fs4 file system serves the 4 cluster. The same NFS server can serve more than two clusters, as long as each cluster has its own separate file system.
FIGURE 3 Mixed Cluster Configuration
The Sun HPC ClusterTools 4 software now supports running on part of a larger cluster that is managed by other distributed resource management software. For example, a customer site could use Platform Computing's Load Sharing Facility (LSF) to manage a large cluster of nodes, with a portion of that cluster running the Sun HPC ClusterTools software together with LSF.