- Overview of a Global Compute Grid
- Brief Description of the Globus Toolkit 2.0
- Distributed Resource Managers
- Portal Software and Authentication
- Global Grid Example One
- Global Grid Example Two
- About the Authors
- Ordering Sun Documents
- Accessing Sun Documentation Online
Global Grid Example One
Introduction
In this section, the White Rose Grid (WRG) is discussed. The White Rose Grid, based in Yorkshire, UK, is a virtual organization comprising three universities: the Universities of Leeds, York, and Sheffield. There are four significant compute resources (cluster grids) each named after a white rose. Two cluster grids are sited at Leeds (Maxima and Snowdon) and one each at York (Pascali) and Sheffield (Titania).
The White Rose Grid is heterogeneous in terms of underlying hardware and operating platform. While Maxima, Pascali, and Titania are built from a combination of large symmetric-memory Sun servers and storage/backup, Snowdon comprises a Linux/Intel-based compute cluster interconnected with Myricom Myrinet.
The software architecture can be viewed as four independent Cluster Grids interconnected through global grid middleware, and accessible, optionally, through a portal interface. All the grid middleware implemented at White Rose is available in open source form.
FIGURE 2 shows the overall architecture. Each of the four WRG cluster grids has an installation of Sun™ ONE Grid Engine, Enterprise Edition. Globus Toolkit 2.0 provides the means to securely access each of the cluster grids through the portal.
FIGURE 2 White Rose Grid Overall Architecture
Grid Engine Enterprise Edition
Sun ONE Grid Engine, Enterprise Edition is installed at each of the four nodes, Maxima, Snowdon, Titania, and Pascali. The command line and GUI of Enterprise Edition are the main access points to each node for local users. The Enterprise Edition version of grid engine provides policy-driven resource management at the node level. There are four policy types that can be implemented:
Share tree policy: Enterprise Edition keeps track of how much usage users and projects have already received. At each scheduling interval, the scheduler adjusts all jobs' share of resources to ensure that users, groups, and projects come very close to their allocated share of the system over the accumulation period.
Functional policy: Functional scheduling, sometimes called priority scheduling, is a non-feedback scheme (that is, no account is taken of past usage) that determines a job's importance by its association with the submitting user, project, or department.
Deadline policy: Deadline scheduling ensures that a job is completed by a certain time by starting it early enough and giving it enough resources to finish on time.
Override policy: Override scheduling allows the Enterprise Edition operator to dynamically adjust the relative importance of an individual job, or of all the jobs associated with a user, department, or project.
At White Rose, the Share Tree policy is used to manage the resource share allocation at each node. Users across the three universities are of two types: local users, who have access only to the local facility, and White Rose Grid users, who are allowed access to any node in the WRG. Each White Rose Grid node administrator has allocated 25 percent of their node's compute resource to White Rose Grid users. The remaining 75 percent share can be allocated as required across the local academic groups and departments. The White Rose Grid administrators also agree on the half-life setting for Sun ONE Grid Engine, Enterprise Edition so that past usage of the resources is taken into account consistently across the White Rose Grid.
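As a purely illustrative sketch of this arrangement, the short script below emits a share tree definition reflecting the 25/75 split described above. The node and user set names are hypothetical, and the plain-text field layout follows the format accepted by the qconf share tree options (for example, qconf -Astree); the exact syntax should be checked against the Enterprise Edition documentation.

    # Illustrative sketch only: emits a share tree definition in the plain-text
    # format used by grid engine's share tree options (field names assumed from
    # the Enterprise Edition documentation; node names are hypothetical).

    NODES = [
        # (id, name, shares, child_ids)
        (0, "Root",         1, [1, 2]),
        (1, "wrg_users",   25, []),   # 25% share reserved for White Rose Grid users
        (2, "local_users", 75, []),   # 75% share divided among local groups/departments
    ]

    def share_tree(nodes):
        lines = []
        for node_id, name, shares, children in nodes:
            lines.append(f"id={node_id}")
            lines.append(f"name={name}")
            lines.append("type=0")
            lines.append(f"shares={shares}")
            lines.append("childnodes=" + (",".join(map(str, children)) or "NONE"))
        return "\n".join(lines)

    if __name__ == "__main__":
        # The resulting file could then be loaded with: qconf -Astree <file>
        print(share_tree(NODES))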
Globus
As shown in FIGURE 2, each White Rose Grid Cluster Grid hosts a Globus gatekeeper. The default job manager for each of these gatekeepers is set to grid engine using the existing scripts in the GT2.0 distribution. So that the Globus job manager can submit jobs to the local DRM, the Globus gatekeeper server must be registered as a submit host at the local grid engine master node.
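As a hedged illustration of these two steps, the sketch below registers a hypothetical gatekeeper host as a grid engine submit host and then submits a test job through the gatekeeper. The host name is a placeholder, and the exact jobmanager suffix depends on how the grid engine job manager scripts from the GT2.0 distribution were installed.

    # Illustrative sketch: register the Globus gatekeeper host as a grid engine
    # submit host, then submit a test job through the gatekeeper. The host name
    # and "jobmanager-..." suffix are placeholders, not the actual WRG settings.
    import subprocess

    GATEKEEPER_HOST = "gatekeeper.example.ac.uk"   # hypothetical gatekeeper node

    def register_submit_host(host):
        # "qconf -as <host>" adds <host> as a submit host on the grid engine master.
        subprocess.run(["qconf", "-as", host], check=True)

    def submit_via_globus(host, command):
        # globus-job-run contacts the gatekeeper; the jobmanager suffix selects
        # the grid engine job manager configured from the GT2.0 scripts.
        contact = f"{host}/jobmanager-grid_engine"
        subprocess.run(["globus-job-run", contact, command], check=True)

    if __name__ == "__main__":
        register_submit_host(GATEKEEPER_HOST)      # run on the grid engine master
        submit_via_globus(GATEKEEPER_HOST, "/bin/hostname")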
The Globus grid security file referenced by the gatekeeper servers includes the names of all WRG users. New users' grid identities must be distributed across the grid so that they can be successfully authenticated. Additionally, at each site all WRG users are added to the user set associated with the WRG share of the resource controlled by Enterprise Edition. This ensures that the combined usage by WRG users at any cluster grid does not exceed 25 percent.
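A minimal sketch of those two administrative tasks follows, assuming the Globus grid security file is the standard grid-mapfile and that the WRG share is backed by a grid engine access list. The subject DNs, local account names, and the wrg_users list name are hypothetical examples.

    # Illustrative sketch: append new WRG users to the Globus grid-mapfile and
    # add them to the grid engine access list backing the 25% WRG share. The
    # subject DNs, account names, and "wrg_users" list name are hypothetical.
    import subprocess

    GRID_MAPFILE = "/etc/grid-security/grid-mapfile"
    WRG_USERSET = "wrg_users"   # access list referenced by the WRG share tree node

    NEW_USERS = {
        # certificate subject DN                      -> local account
        "/C=UK/O=eScience/OU=Leeds/CN=Jane Bloggs":     "janeb",
        "/C=UK/O=eScience/OU=Sheffield/CN=Sam Smith":   "ssmith",
    }

    def add_users():
        with open(GRID_MAPFILE, "a") as mapfile:
            for dn, account in NEW_USERS.items():
                mapfile.write(f'"{dn}" {account}\n')   # grid-mapfile entry
                # "qconf -au <user> <list>" adds the account to the access list.
                subprocess.run(["qconf", "-au", account, WRG_USERSET], check=True)

    if __name__ == "__main__":
        add_users()   # repeat on each cluster grid so identities stay consistent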
Portal Interface
The portal technology used at White Rose has been implemented using the Grid Portal Development Kit (GPDK), which was designed as a web interface to Globus. GPDK uses JavaServer Pages (JSP) and JavaBeans and runs in Apache Tomcat, the open source web application server originally developed by Sun Microsystems. GPDK takes full advantage of the Java implementation of the Globus CoG toolkit.
The GPDK JavaBeans are responsible for the functionality of the portal and can be grouped into five categories: Security, User Profiles, Job Submission, File Transfer, and Information Services. For security, GPDK integrates with MyProxy: the portal server interacts with a MyProxy server to obtain delegated credentials, which allow it to authenticate on the user's behalf.
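As a hedged sketch of that credential step (not the actual GPDK internals), the snippet below shows how a portal host might retrieve a delegated proxy from a MyProxy server using the standard MyProxy client tools. The server name, username, proxy path, and lifetime are placeholder values, and the exact client command and flags depend on the MyProxy release in use.

    # Illustrative sketch: fetch a delegated credential from a MyProxy server
    # before acting on a user's behalf. Server name, username, and lifetime are
    # placeholders; the command shown is the standard MyProxy client, not GPDK.
    import subprocess

    MYPROXY_SERVER = "myproxy.example.ac.uk"   # hypothetical MyProxy server

    def fetch_delegated_proxy(username, proxy_path, lifetime_hours=2):
        # myproxy-logon prompts for the user's MyProxy passphrase and writes a
        # short-lived delegated proxy credential to proxy_path.
        subprocess.run(
            ["myproxy-logon",
             "-s", MYPROXY_SERVER,
             "-l", username,
             "-o", proxy_path,
             "-t", str(lifetime_hours)],
            check=True,
        )

    if __name__ == "__main__":
        fetch_delegated_proxy("janeb", "/tmp/x509up_portal_janeb")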
The following development work was done to port the publicly available GPDK to GT2.0:
GPDK was modified to work with the updated MDS in GT2.0.
Information providers were written to enable grid engine queue information to be passed to the MDS. Grid users can query MDS to establish the state of the DRMs at each cluster grid.
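Because MDS in GT2.0 is LDAP based, the resulting queue and resource information can be retrieved with a standard LDAP client. The sketch below is illustrative only: the GIIS host name is a placeholder, while port 2135 and the "mds-vo-name=..., o=grid" base DN follow the usual GT2 MDS conventions; the specific attributes published by the locally written grid engine information providers are not shown.

    # Illustrative sketch: query a GT2.0 GIIS (LDAP-based MDS) for resource
    # information. The host name is a placeholder; port 2135 and the base DN
    # follow the usual GT2 MDS conventions.
    import subprocess

    GIIS_HOST = "giis.example.ac.uk"          # hypothetical top-level GIIS
    BASE_DN = "mds-vo-name=local, o=grid"     # default GT2 MDS suffix

    def query_mds(search_filter="(objectclass=*)"):
        # ldapsearch performs an anonymous query against the GIIS and returns
        # the matching MDS entries in LDIF form.
        result = subprocess.run(
            ["ldapsearch", "-x", "-h", GIIS_HOST, "-p", "2135",
             "-b", BASE_DN, search_filter],
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    if __name__ == "__main__":
        print(query_mds())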
As with many current Portal projects, the White Rose Grid uses the MyProxy Toolkit as the basis for security. FIGURE 3 shows that prior to interacting with the WRG, a user must securely pass a delegated credential to the portal server so that the portal can subsequently act on that user's behalf. The MyProxy Toolkit enables this.
FIGURE 3 shows the event sequence up to job submission.
Events 1-4) When a user initially logs on, the MyProxy Toolkit is invoked so that the portal server can securely access a proxy credential for that user.
Events 5 and 6) The user can view the available resources and their dynamic properties through the portal. The Globus MDS provides the GIIS, an LDAP-based hierarchical database, which the portal server queries for this information.
Event 7) Once the user has determined the preferred resource, the job can be submitted. The job information is passed down to the selected cluster grid, where the local Globus gatekeeper authenticates the user and passes the job information to Sun ONE Grid Engine, Enterprise Edition.
Event 8) When the job begins, it might make further use of mechanisms implemented by the Globus Toolkit during the course of execution, such as retrieving up-to-date resource information, requesting and retrieving data sets from other campuses, and storing results in other locations. When the job finishes, the user can view the results through the Grid Portal at their desktop.
FIGURE 3 Event Sequence
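The data movements mentioned in Event 8 would typically be carried out with GridFTP in a GT2.0 environment. The following sketch is purely illustrative: it wraps the standard globus-url-copy client, and the host names and paths are placeholders rather than actual White Rose Grid locations.

    # Illustrative sketch: stage an input data set in from another campus and
    # copy results back out with GridFTP, as a running job (or the portal)
    # might do. Host names and paths are placeholders.
    import subprocess

    def gridftp_copy(source_url, dest_url):
        # globus-url-copy uses the caller's proxy credential to authenticate a
        # transfer between GridFTP servers (gsiftp://) and/or local files (file://).
        subprocess.run(["globus-url-copy", source_url, dest_url], check=True)

    if __name__ == "__main__":
        gridftp_copy("gsiftp://data.york.example.ac.uk/datasets/input.dat",
                     "file:///scratch/wrg/input.dat")
        gridftp_copy("file:///scratch/wrg/results.dat",
                     "gsiftp://store.leeds.example.ac.uk/results/results.dat")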