Configuration Guidelines
The scenarios in the previous section gave examples of how the various parameters in the Sun ONE GEEE software are used to implement a desired resource allocation scheme. This section summarizes a few principles to apply when customizing the policies for an arbitrary scenario.
Policy-Setting
This section contains guidelines for:
- Number of Tickets
- Ticket Sharing Settings
- Share Tree Policy
Number of Tickets
The relative number of tickets determines which policy sets the overall allocation and which one only fine-tunes it. To make one policy clearly dominate the other, ensure that the difference in the number of tickets between the two policies is large. If it is not, the two policies could contribute roughly the same number of tickets to jobs, and the final outcome would be hard to predict. This is particularly relevant when using the POLICY_HIERARCHY to specify the precedence of policies. The policy that comes earlier in the hierarchy should have fewer tickets than the policy that comes later.
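To get a feel for how far apart the ticket totals need to be, a rough sketch such as the following can help. It assumes a simplified model in which a policy's influence is proportional to its share of the combined ticket pool, which is not how the scheduler actually computes anything; the function name policy_share_pct and the sample totals are illustrative only.

```shell
#!/bin/sh
# Simplified model (an assumption, not the scheduler's actual algorithm):
# treat a policy's influence as its percentage of the combined ticket pool.

# policy_share_pct <tickets_of_policy> <tickets_of_other_policy>
# Prints the first policy's share of the total as an integer percentage
# (rounded down).
policy_share_pct() {
    total=$(( $1 + $2 ))
    if [ "$total" -eq 0 ]; then
        echo 0
        return
    fi
    echo $(( $1 * 100 / total ))
}

# Hypothetical totals: a clearly dominant split and an ambiguous one.
echo "dominant:  $(policy_share_pct 100000 1000)%"
echo "ambiguous: $(policy_share_pct 6000 5000)%"
```

With these sample numbers, the first split gives one policy 99 percent of the pool, while the second gives it only 54 percent, which is the "hard to predict" situation described above.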
Ticket Sharing Settings
Three SHARE_*_* parameters in schedd_params influence the overall behavior of the policies:
SHARE_FUNCTIONAL_SHARES determines whether you get strict or interleaved ordering. If it is set to FALSE, the share values set in the functional policy are interpreted strictly as an ordering. A setting of TRUE causes the share values to be interpreted as an allocation ratio, with jobs dispatched in an interleaved fashion so as to produce the specified ratio.
SHARE_OVERRIDE_TICKETS and SHARE_DEADLINE_TICKETS prevent these respective policies from "taking over" the scheduling system. If these are set to FALSE, the number of override or deadline tickets in the system increases with the number of jobs submitted. If these jobs are important, this is the desired behavior. However, if these parameters are set to TRUE, the number of tickets is fixed, and submitting more jobs dilutes the number of tickets per job assigned by these two policies. This can, for example, help prevent "abuse" by users who are granted deadline or override privileges. The choice depends on the amount of authority the users should have.
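The dilution effect described above can be illustrated with a small calculation. This is a minimal sketch under a simplifying assumption: a fixed grant is split equally among an object's jobs when sharing is TRUE, and every job receives the full grant when it is FALSE. The function names and the figure of 1000 tickets are hypothetical.

```shell
#!/bin/sh
# Simplified model of SHARE_OVERRIDE_TICKETS / SHARE_DEADLINE_TICKETS
# (an illustration of the dilution effect, not the scheduler's real code).

# tickets_per_job <granted_tickets> <number_of_jobs> <TRUE|FALSE>
#   TRUE  : the grant is a fixed pool divided among the jobs
#   FALSE : every job receives the full grant
tickets_per_job() {
    granted=$1 njobs=$2 share=$3
    if [ "$njobs" -le 0 ]; then
        echo 0
        return
    fi
    case $share in
        TRUE)  echo $(( granted / njobs )) ;;
        FALSE) echo "$granted" ;;
        *)     echo "share must be TRUE or FALSE" >&2; return 1 ;;
    esac
}

# total_tickets <granted_tickets> <number_of_jobs> <TRUE|FALSE>
# Tickets the policy contributes across all of the jobs together.
total_tickets() {
    per_job=$(tickets_per_job "$1" "$2" "$3") || return 1
    echo $(( per_job * $2 ))
}

# Hypothetical grant of 1000 tickets, for 1, 4, and 10 submitted jobs.
for n in 1 4 10; do
    echo "jobs=$n shared=$(tickets_per_job 1000 "$n" TRUE) unshared=$(tickets_per_job 1000 "$n" FALSE)"
done
```

In this model, the shared setting keeps the system-wide total constant no matter how many jobs a privileged user submits, whereas the unshared setting lets the total grow with every job.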
Share Tree Policy
If the share-tree policy is to be used and you wish to allocate and track usage on a per-user basis, every user must have a user object in the Sun ONE GEEE software. The reason is that the Sun ONE GEEE software must create a data structure in which to track and store the resource allocation usage for each user. The exception to this is if you set up the default user under each project. This action lets you set the resource allocation for generic users; you then only need to add users whose resource allocation differs from the default.
If, however, you want to allocate and track usage on a per-project basis only, there is no need to add every user as a Sun ONE GEEE user object. Users simply submit jobs using the -P project flag. Privileged projects can be restricted using project access lists.
To simplify the management of users within Sun ONE GEEE software, the configuration command qconf has a rich set of options that allow every operation in Sun ONE GEEE software to be scripted. The following script (CODE EXAMPLE 1) is an example of how you might use scripting to populate a share tree based upon an LDAP directory. For a complete list of possibilities, consult the man page and the -help option of qconf.
CODE EXAMPLE 1 Sample Share Tree Updating Automation Script
#!/bin/ksh
# example script to add users to an already-existing
# SGEEE share tree based on the enterprise's
# LDAP directory entries
# nf: a command which displays information from
# an LDAP directory
# NOTE: use an equivalent command for your site

usage () {
    echo "Usage: $0 <dept_code> <sharetree_nodename>"
}

add_sgeee_user() {
    TMP=/tmp/sgeee.$$
    sgeuser=$1
    echo "name $sgeuser" > $TMP
    echo "oticket 0" >> $TMP
    echo "fshare 0" >> $TMP
    echo "default_project NONE" >> $TMP
    qconf -Auser $TMP
    rm $TMP
}

if [ $# -ne 2 ] ; then
    usage
    exit 1
fi

DEPT=$1
NODE=$2

# below is a command which extracts usernames from
# the LDAP directory based upon department codes
# NOTE: strip the line which simply tells
# the number of entries found
USERS=`nf -D $DEPT -c u | grep -v "entries found"`

for user in $USERS; do
    add_sgeee_user $user
    qconf -astnode /$NODE/$user=50
done
Prototyping a Scenario
Configuring Sun ONE GEEE software is an iterative process. By this we mean that you should not try to achieve a given final result immediately (unless it is relatively simple). Instead, the approach should be to implement a trial configuration and, after testing it, refine it further and repeat this procedure as needed. The best way to start this iterative process is to create a prototype of your actual environment. This process would involve measures such as:
- Dedicating a small number of systems for the prototype (three or four systems are sufficient)
- Creating dummy jobs that emulate how the actual production jobs would behave (unless you can actually use your production applications for the prototyping)
- Creating dummy users, projects, departments, and so forth
After configuring a Sun ONE GEEE setup candidate, a quick way to see if it is behaving as expected is to suspend or disable all queues, and then submit jobs according to the expected usage pattern. Using the qstat -ext command, you can inspect the number of tickets assigned to each job and the contribution to the total that is coming from the individual policies. Since the overall total number of tickets determines the final job dispatch order, you can see, for example, if a certain policy is contributing too many or too few tickets to this total, and readjust the policy parameters accordingly.
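The check described above could be scripted along the following lines. This is only a sketch: the queue names, project names, job counts, and the dummy job script are hypothetical placeholders, and the script defaults to a dry run that prints each command instead of executing it, so it can be reviewed before being pointed at a real prototype cluster.

```shell
#!/bin/sh
# Sketch of a prototype test run. Queue, project, and job-script names are
# hypothetical placeholders; replace them with those of your test cluster.
# With DRY_RUN=1 (the default here) commands are only printed, not executed.

DRY_RUN=${DRY_RUN:-1}
QUEUES="proto1.q proto2.q"                  # hypothetical queue names
JOB_SCRIPT=${JOB_SCRIPT:-./dummy_job.sh}    # hypothetical dummy job

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. Disable the queues so that submitted jobs stay pending.
disable_queues() {
    for q in $QUEUES; do
        run qmod -d "$q"
    done
}

# 2. Submit jobs following the expected usage pattern:
#    submit_pattern <project> <count>
submit_pattern() {
    i=0
    while [ "$i" -lt "$2" ]; do
        run qsub -P "$1" "$JOB_SCRIPT"
        i=$(( i + 1 ))
    done
}

# 3. Inspect the per-policy ticket contributions of the pending jobs.
show_tickets() {
    run qstat -ext
}

disable_queues
submit_pattern projectA 2    # hypothetical projects and job counts
submit_pattern projectB 1
show_tickets
```

Setting DRY_RUN=0 in the environment makes the same script issue the commands for real; after adjusting the policy parameters, rerunning it gives a like-for-like comparison of the ticket distribution.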
Other Configuration Policies
Keep in mind that the Sun ONE GEEE software has other capabilities beyond the policy module that can help to create a configuration that suits a given scenario.
- User lists and departments can be used to control access rights to queues, hosts, and projects; for example, permit only certain jobs or users to utilize certain systems.
- Preemption using subordinate queues can provide the ability to run jobs that have immediate priority; for example, very important jobs or interactive jobs.
- Calendars for suspend or disable can be used to disable some queues selectively, while leaving others enabled; for example, low-priority projects can run at night, while higher priority projects can run any time.
- Queue sort method (load formula or sequence number) can be used to sort among eligible queues to determine the order in which resources (queues) are selected for jobs.
- Administrator-defined complexes and resources provide the ability to manage jobs based upon practically any characteristic or metric. Load sensors complete the picture by providing a way to input the current value of a metric into the system.
These features are present in the basic Sun ONE Grid Engine software, but taken together with the policies of Sun ONE Grid Engine, Enterprise Edition, the possibilities for adapting the software to suit a given environment's needs are indeed great.