Sun Grid Engine, Enterprise Edition—Configuration Use Cases and Guidelines
This article describes a set of use cases for configuring Sun Grid Engine, Enterprise Edition 5.3 (Sun ONE GEEE) software. It is meant to be a starting point from which intermediate to advanced Sun ONE GEEE software administrators can create a customized configuration for their particular environment. It is important to realize that each environment has unique requirements.
The greatest benefits of the Sun ONE GEEE software policy module are obtained by fine-tuning a configuration once the results of the initial configuration have been assessed. Moreover, as the environment evolves and the needs of the enterprise change, additional tuning on an ongoing basis will probably be appropriate.
This article assumes the reader has some familiarity with the Sun ONE GEEE features and parameters. For the details of the policies and the major parameters used to set up and influence the various policies, consult the Sun ONE Grid Engine, Enterprise Edition 5.3 Administration and User's Guide, Part IV, Chapter 9. For a complete list of parameters, consult the Sun ONE Grid Engine 5.3 and Sun ONE Grid Engine, Enterprise Edition 5.3 Reference Manual.
Policy Use Cases
This section covers situations in which Sun ONE GEEE software policies are implemented to achieve certain goals. The focus is on cases in which different policies are combined.
Job Types With Groups
In this scenario, there are two or more enterprise-wide groups (for example, departments, projects, user groups, and so forth). Each group must receive a specified share of the resources, averaged over time. In addition, there are two or more priority types of jobs that must be dispatched in strict priority order. High-priority jobs are always scheduled ahead of medium-priority jobs, which are scheduled ahead of low-priority jobs regardless of the group from which they originate.
For simplicity, this article describes the solutions with only two groups and two job types, and assumes that the desired resource allocation ratio between the groups is 75:25, and that the job types can be categorized as either normal or high, indicating relative priority. It should be straightforward to extend the concept to multiple groups and job types.
Two approaches can be used to implement this scenario.
Approach Number 1: Projects With Overloading
In this approach, every combination of {group, job-type} is assigned a unique project. For example, with two groups, Group A and Group B, and two types, normal and high, the projects would be:
- grpAhigh
- grpAnrml
- grpBhigh
- grpBnrml
When submitting a job, the user specifies which {group, job-type} combination to assign it to by using the -P option to qsub. Hence, the concept of project is overloaded with two characteristics: group and job type. Users would need to be trained to use the appropriate option to the submit command. For example, a user in Group A would run a high-priority job as follows:
qsub -P grpAhigh myjob.sh
To prevent users from accidentally (or deliberately) specifying the wrong group, the access lists for the projects can be set to explicitly include or exclude certain users or user groups. See man page access_list(5).
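For example, the grpAhigh project could be restricted to an access list containing only Group A's users. The following is a hedged sketch of a project(5) definition that could be loaded with qconf -Aprj; the access list name groupA_users is an assumption for illustration, and the oticket value anticipates the override ticket assignment described later.

```
# Sketch of a project(5) definition for grpAhigh, loadable with
# "qconf -Aprj <file>". The access list "groupA_users" is assumed to
# have been created already (see access_list(5)).
name    grpAhigh
oticket 10000
fshare  0
acl     groupA_users
xacl    NONE
```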
After creating SGEEE software projects as described previously, you must create a share tree as shown in FIGURE 1. See man page share_tree(5). The penultimate nodes in the tree correspond to the enterprise-wide groups, and the leaves are the different projects, grouped accordingly. The shares for each group node are set to match the desired allocation ratio. The shares assigned to the project leaves can all be set the same, because you are not interested in tracking the difference in cumulative utilization between high- and normal-priority jobs. Rather, you want to ensure that the higher-priority jobs are always given greater precedence.
FIGURE 1 Share Tree Policy Share Assignments for Approach Number 1
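A share tree like the one in FIGURE 1 can also be expressed in a flat file and loaded with qconf -Astree (or replaced with qconf -Mstree). The following is an illustrative sketch in the share_tree(5) node format; the node ids are arbitrary, and the type values distinguishing group nodes from project leaves should be verified against the man page.

```
# Illustrative share_tree(5) sketch for FIGURE 1 (75:25 group split,
# equal shares for the project leaves). Verify "type" semantics
# against share_tree(5).
id=0
name=Root
type=0
shares=1
childnodes=1,2
id=1
name=GroupA
type=0
shares=75
childnodes=3,4
id=2
name=GroupB
type=0
shares=25
childnodes=5,6
id=3
name=grpAhigh
type=1
shares=100
childnodes=NONE
id=4
name=grpAnrml
type=1
shares=100
childnodes=NONE
id=5
name=grpBhigh
type=1
shares=100
childnodes=NONE
id=6
name=grpBnrml
type=1
shares=100
childnodes=NONE
```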
The next step is to add override tickets to the projects according to their priority. In this example, the following assignments would be made:
TABLE 1 Assignment of Override Tickets to Projects
Project Name    No. of Override Tickets
grpAhigh        10,000
grpBhigh        10,000
grpAnrml        0
grpBnrml        0
Since there are only two job types, it is sufficient to give a certain number to the highest-priority job type, and zero to the lowest. If there were more job types, they would be allocated tickets such that the difference between the levels is always 10,000, for example, 0, 10,000, and 20,000. Essentially, a priority band is set for each distinct priority; the number of tickets gives the ranking of the bands. The number 10,000 has a significance that is explained in the following paragraphs.
In conjunction with setting the override tickets for each project, the scheduler parameter SHARE_OVERRIDE_TICKETS must be set to FALSE under schedd_params. See man page sge_conf(5). This setting ensures that the tickets do not get divided among the jobs of each project, but rather, each job will get the full 10,000 project override tickets that are necessary to implement the priority bands.
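This parameter lives on the schedd_params line of the cluster configuration, which can be edited with qconf -mconf. A minimal sketch of the relevant excerpt:

```
# Excerpt of the cluster configuration (sge_conf(5)), edited via
# "qconf -mconf":
schedd_params SHARE_OVERRIDE_TICKETS=FALSE
```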
The final step is to assign 9,000 tickets to the share tree policy. The reasoning behind this is as follows. The share tree policy allocates tickets to jobs according to the cumulative utilization of the individual projects, as compared with their share assignments. In the extreme case, in which one project's cumulative utilization is almost zero (and the compensation factor is set to one), a single job submitted into that project could be allocated all 9,000 tickets. Nevertheless, if a job from a high-priority job type is submitted, its 10,000 override tickets are sufficient to outweigh the 9,000 share policy tickets; because it has more tickets overall, it goes ahead in the pending list.
More generally, the number of share policy tickets should always be less than the "difference" between the numbers of tickets assigned to the levels of priority for the job types. This is why 10,000 tickets was chosen as the difference between job type levels, while 9,000 was chosen as the total allotment of the share policy.
Note that the actual numbers do not have any significance to the Sun Grid Engine scheduler. The figures 9,000 and 10,000 are simply easy to understand and manage.
Approach Number 2: Projects Map Job Types
An alternative way to configure this scenario is to use projects to map only the job types, and to put all of the group information into the share tree. The first step in this approach is to create a project for every job priority type. In this example, we would have two projects, with the number of override tickets again configured to give priority bands. As with the previous example, set the scheduler parameter SHARE_OVERRIDE_TICKETS to FALSE.
TABLE 2 Project and Override Ticket Assignments
Project Name    No. of Override Tickets
normal          0
high            10,000
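A hedged project(5) sketch for the high project, loadable with qconf -Aprj. Access lists are omitted here, since in this approach the group is not encoded in the project.

```
# Sketch of a project(5) definition for the "high" job type, loadable
# with "qconf -Aprj <file>":
name    high
oticket 10000
fshare  0
acl     NONE
xacl    NONE
```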
The next step is to create a share tree with the desired groups and share allocation, as shown in FIGURE 2.
FIGURE 2 Share Tree Policy Share Assignments For Approach Number 2
To Add Users
The critical part, and the one that requires the most attention, is to explicitly add every user to the appropriate location in the tree. This is a two-step procedure:
1. Create a Sun ONE Grid Engine user object for every user. See man page user(5).
2. Assign the Sun ONE Grid Engine user object to the proper place in the tree.
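Step 1 can be performed non-interactively by writing a user(5) file and loading it with qconf -Auser. The following is a hedged sketch; the user name and file path are illustrative assumptions, and qconf is only invoked if it is actually installed.

```shell
#!/bin/sh
# Sketch of step 1: create a user(5) object from a file. The name
# "jdoe" and the /tmp path are illustrative assumptions.
USER_NAME=jdoe
cat > /tmp/${USER_NAME}.user <<EOF
name            ${USER_NAME}
oticket         0
fshare          0
default_project NONE
EOF
if command -v qconf >/dev/null 2>&1; then
    qconf -Auser /tmp/${USER_NAME}.user   # register the user object
    # Step 2 (placing the user under the right group node) is done by
    # editing the share tree, for example with "qconf -mstree".
fi
```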
As before, a total of 9,000 tickets are assigned to the share tree policy; in other words, a number smaller than the difference between the number of override tickets for the different priority levels.
When submitting a job, users only need to specify the job type by using the -P option to qsub. Users do not need to be trained to specify a group. For example, a user in Group A would run a high-priority job as follows:
qsub -P high myjob.sh
Since the user was explicitly placed in the share tree under a particular group, the utilization by jobs from that user is automatically accounted correctly.
Comparison of Approaches
The advantage of approach number two is that it is simpler for users: they only need to specify the job type when submitting jobs, without worrying about specifying the proper group. The disadvantage is that it is more work for administrators to set up, because they must explicitly add every user to the share tree. For an environment with a large number of users, this is best achieved by scripting against an external user list. For example, there might be a Lightweight Directory Access Protocol (LDAP) directory that contains users organized into departments. You could write a script that reads each user's information from this directory, creates the Sun ONE GEEE user object, and then inserts the user object into the tree depending upon the department code. An example of such a script is given in CODE EXAMPLE 1. Such a procedure would need to be repeated any time a user is added to the environment.
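A minimal sketch of such a bulk procedure is shown below (this is not the article's CODE EXAMPLE 1). It assumes the LDAP query has already been reduced to simple user:department pairs, for example via ldapsearch piped through awk; the input format and file paths are illustrative assumptions.

```shell
#!/bin/sh
# Create a user(5) object per user and note the share-tree node each
# belongs under. The "user:department" input format stands in for
# real LDAP output and is an assumption.
printf 'alice:deptA\nbob:deptB\n' > /tmp/users.txt   # sample input

while IFS=: read -r user dept; do
    cat > /tmp/${user}.user <<EOF
name            ${user}
oticket         0
fshare          0
default_project NONE
EOF
    if command -v qconf >/dev/null 2>&1; then
        qconf -Auser /tmp/${user}.user
        # Inserting ${user} under the node for ${dept} would follow,
        # for example by regenerating the tree with "qconf -Mstree".
    fi
done < /tmp/users.txt
```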
Projects Span Groups
In this scenario, there are two or more enterprise-wide projects, that is, sets of jobs that are closely related, and two or more groups of people with different privileges working on both projects together. These groups could be, for example, from different departments, or there could be regular and power users, the latter having greater privileges. The desire is to allocate resources to the projects based on cumulative utilization, while simultaneously guaranteeing a certain priority or service level for the different groups. For example, power users' jobs go ahead of other users' jobs, or they receive a greater proportion of available resources.
Configuration
This example assumes two projects, Project 1 and Project 2, and two groups, Department A and Department B. See man page access_list(5). The goal is to give 20 percent of resources to Project 1 and 80 percent to Project 2. In addition, people in Department A should get 60 percent of the resources, regardless of which project they submit a job under, while Department B should get 40 percent. (Later, we will modify this for the case where Department A's jobs should always go ahead of Department B's jobs).
To Set Up the Share Tree Policy
The procedure is to create the two projects and then set up the share tree with the desired resource allocation ratio among the projects.
FIGURE 3 Share Tree Policy Share Assignments for Projects That Span Groups
Set up two Sun ONE Grid Engine departments (TABLE 3) in the Userset/Userlist configuration.
TABLE 3 Department Setup and Share Assignment
Department Name    No. of Functional Shares
Dept A             60
Dept B             40
In the functional policy configuration, assign shares to the two departments accordingly.
Set the number of tickets for the share tree policy to 1,000,000 and the number of tickets for the functional policy to 1,000.
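These ticket totals are held in the scheduler configuration, which can be edited with qconf -msconf. A hedged sketch of the relevant excerpt follows; verify the exact parameter names against sched_conf(5).

```
# Excerpt of the scheduler configuration (sched_conf(5)), edited via
# "qconf -msconf"; parameter names should be checked against the
# man page.
weight_tickets_share       1000000
weight_tickets_functional  1000
```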
Set the following parameters in the cluster configuration under schedd_params:
POLICY_HIERARCHY=FS,SHARE_FUNCTIONAL_SHARES=TRUE
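Both settings go on the schedd_params line of the cluster configuration (see sge_conf(5)), edited with qconf -mconf. A minimal sketch:

```
# Excerpt of the cluster configuration, edited via "qconf -mconf":
schedd_params POLICY_HIERARCHY=FS,SHARE_FUNCTIONAL_SHARES=TRUE
```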
To submit jobs the users would simply specify the project under which the job's utilization should be accounted:
qsub -P Project1 myjob.sh
You can restrict access to projects using project access lists.
This example shows how two policies can be combined to achieve a desired goal, and also illustrates one use of the POLICY_HIERARCHY parameter. With this setup, jobs are balanced between projects according to the specified resource allocation ratio but, within a project, jobs are dispatched according to functional (that is, department) ordering. Because the parameter SHARE_FUNCTIONAL_SHARES is set to TRUE, one department is prevented from excluding the other; instead, jobs are dispatched in an interleaved fashion among all of the departments.
Most crucially, the share policy is guaranteed to have highest precedence by two factors:
- The number of share policy tickets greatly exceeds the number of functional policy tickets.
- The POLICY_HIERARCHY is set to FS.
Having more share policy tickets than functional tickets means that the tickets allocated by the share policy will have the greatest impact on the overall number of tickets assigned to each job, which determines the final dispatch order. The number of tickets from the functional policy assigned to each job will be so small that, in most cases, it will be negligible in determining the total number of tickets assigned to each job.
The functional tickets only have an overriding influence in the extreme case where there is a very large number of pending jobs, or when the utilization of a project greatly exceeds its target. In this case, the number of share policy tickets assigned to a particular job might be very low, perhaps lower than the number of functional policy tickets assigned to the same job. Additionally, when a large mismatch between actual and target utilization for a project exists, the compensation factor can be used to limit the degree to which a project's ticket allocation is diminished, distributing the share policy tickets more evenly between projects.
Nevertheless, we want to ensure that, even though the number of functional tickets is small compared with the share policy, the functional policy should still have some impact. This is where the POLICY_HIERARCHY setting comes in. Setting FS instructs the scheduler to consider the functional tickets first, for sorting within a share tree node. Thus, instead of first-in first-out (FIFO) ordering, jobs within a share tree node are ordered according to the functional policy settings.