I/O Infrastructure Performance Improvement Methodology
This section describes a methodology for performance improvement when there is a serious I/O bottleneck. Effective utilization of the I/O infrastructure capacity is the key to improving the performance of the environment. In the "Resolving CPU and I/O Bottlenecks Through Modeling and Capacity Planning" section, a capacity planning model was used to verify the positive impact of a balanced I/O load distribution on performance. In the simulation, new controllers and disks were added to the model and the I/O load was forced to distribute across the new devices. The simulations confirmed the benefits of a better I/O distribution. Note that these benefits come from the distribution itself and should not be attributed to the addition of the new hardware.
Large database servers are dynamic, evolving environments. Be aware that most tuning efforts to optimize I/O performance may become obsolete when the application's I/O infrastructure utilization pattern changes. Therefore, performance tuning of such environments should be considered a continuous process rather than a specific one-time effort and action plan. The suggested action plan is presented as a process for I/O optimization, to be used every time there is a change in the application environment that impacts its I/O utilization pattern.
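As a rough illustration of why distribution alone can help, the sketch below compares a skewed and a balanced layout carrying the same total load. It is not the TeamQuest model used in this analysis: it assumes each device behaves like a simple M/M/1 queue with response time R = S / (1 - U), and the service time and utilization figures are hypothetical.

```python
def response_time(service_ms: float, utilization: float) -> float:
    """Approximate single-device response time as an M/M/1 queue (assumption)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)


def mean_response_time(service_ms: float, utilizations: list[float]) -> float:
    """I/O-weighted mean response time across devices.

    With identical service times, each device's share of the I/O load is
    proportional to its utilization, so utilization is used as the weight.
    """
    total = sum(utilizations)
    weighted = sum(u * response_time(service_ms, u) for u in utilizations)
    return weighted / total


# Hypothetical figures: four devices, 8 ms service time, same total load.
SERVICE_MS = 8.0
skewed = [0.90, 0.30, 0.20, 0.20]    # one hot spot above 80%
balanced = [0.40, 0.40, 0.40, 0.40]  # same total utilization, evenly spread

skewed_ms = mean_response_time(SERVICE_MS, skewed)
balanced_ms = mean_response_time(SERVICE_MS, balanced)
print(f"skewed:   {skewed_ms:.1f} ms per I/O")
print(f"balanced: {balanced_ms:.1f} ms per I/O")
```

Under these assumptions the skewed layout averages roughly 49.6 ms per I/O and the balanced layout roughly 13.3 ms, with no hardware added. Real devices and controllers will not follow this simple model exactly, so the numbers only indicate the direction of the effect.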
Infrastructure Optimization Plan
TABLE 1 contains the details of the infrastructure optimization process.
TABLE 1 Infrastructure Optimization Process Details
Action |
Expected Result |
Mitigation Effort |
Effort |
1. Evaluate and implement all possible database server optimization for improved I/O distribution. |
See Oracle and Application Optimization Suggestions |
See Oracle and Application Optimization Suggestions |
See Oracle and Application Optimization Suggestions |
2. Identify the high-utilization devices (hot spots) on the I/O infrastructure. Using standard Solaris OE tools or TeamQuest Viewer, identify those devices (LUNs) with a utilization greater than 80%. |
Medium to high impact. A 5 to 15% performance improvement can be expected for each device whose excessive utilization is resolved. There should be four to six devices currently eligible for optimization in the environment. |
Low risk. Application support and database administration staff are very familiar with operations involving relocation of logical volumes. |
Low for evaluation and medium for implementation. Two to four hours for device identification and mapping of database structures. Implementation time depends on the ability to execute the operation in a test and build environment or during maintenance windows on the production environment. |
3. Identify the logical volumes associated with the high-utilization devices. |
|
|
|
4. Identify the database structures associated with those logical volumes. |
|
|
|
5. Verify that the contents of each high-utilization logical volume can be distributed across less utilized and/or spare logical volumes. Execute the I/O distribution and verify the new utilization numbers for the affected device. |
Depends on the number of high utilization devices that can be optimized through relocation of the database files. |
|
|
6. If utilization rates for all devices are under 80%, document the new volume configuration to be reproduced in the next database build and re-run the TeamQuest capacity planning model, observing the model stretch factor and load growth projections for the new configuration. |
|
|
|
7. If device utilization is still high, verify whether those devices share a controller with other high-utilization devices. |
Low impact. Zero to 10% performance improvement, depending on the impact of controller distribution on the I/O device queues. |
Low risk. Same as the above (execution involves the same identification and logical volume relocation process). |
Low effort for evaluation and medium for implementation. |
8. Distribute the high-utilization devices evenly across the available controllers, for example by relocating logical volume contents. |
Controllers are not perceived today as bottlenecks on I/O performance. Note, however, that this step is recommended because the controllers may become a limiting resource as data starts moving faster, due to improved I/O distribution. |
|
Same as above (execution involves the same identification and logical volume relocation process). |
9. If utilization rates for all devices are under 80%, document the new mapping of volume contents to controllers, to be reproduced in the next database build. Re-run the TeamQuest capacity planning model, observing the model stretch factor and load growth projections for the new configuration. |
|
|
|
10. If logical volume contents cannot be relocated for better I/O distribution, or the controller where the high-utilization device is located is overloaded, consider striping the logical volume across more than one LUN. |
Medium impact. For each striped logical volume replacing a current highly utilized volume/device, a 5 to 15% performance improvement can be expected. |
Medium risk. Logical volumes and file systems will be removed and then re-created as striped logical volumes. The operation can possibly be executed on the build environment and then on a database about to be promoted to production, to minimize impact on the production environment. |
Medium effort. About 4 to 8 hours per logical volume for research on striping configuration and deployment. |
11. Evaluate the impact of volume striping on Operations. Operations permitting, create "high performance" volumes by striping a logical volume across two LUNs. Two non-striped logical volumes can be converted into two striped logical volumes on the respective two LUNs. Adding new devices to the environment may facilitate execution and make it viable. |
|
|
|
12. Relocate the identified high-utilization devices/database structures to the newly created striped volumes. Evaluate the utilization of the new devices. |
|
|
|
13. If utilization rates for all devices are under 80%, document new volume organization to be reproduced on the next database build. Re-run the TeamQuest capacity planning model, observing the model stretch factor and load growth projections for the new configuration. |
|
|
|
14. Depending on the impact of the striping on performance, consider further logical volume striping over a higher number of LUNs. |
|
|
|
15. If a given controller remains with more than three high-utilization devices after the operations listed above, consider adding a new HBA to the I/O infrastructure for better I/O distribution. |
Low impact. Zero to 10% performance improvement, depending on the impact of controller distribution on the I/O device queues. |
Low risk. Support staff is very familiar with HBA-related configurations. |
Medium effort. About 4 hours. The hardware upgrade on the production environment is to be executed during a maintenance window. |
16. Verify device utilization after the new controller is added. If improvement is verified, re-run the TeamQuest capacity planning model, observing the model stretch factor and load growth projections for the new configuration. |
|
|
|
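The identification steps in TABLE 1 (steps 2, 7, and 15) can be sketched as a small script. This is a minimal illustration, not part of the original analysis: it assumes utilization samples have already been collected (for example, from the %b column of `iostat -xn`), that device names follow the Solaris cXtYdZ convention, and the `DeviceSample` type, helper names, and sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

HOT_THRESHOLD_PCT = 80.0     # utilization threshold used in TABLE 1
MAX_HOT_PER_CONTROLLER = 3   # step 15: more than three suggests a new HBA


@dataclass(frozen=True)
class DeviceSample:
    device: str      # assumed Solaris-style name, e.g. "c1t0d0"
    busy_pct: float  # average utilization over the sampling interval


def controller_of(device: str) -> str:
    """Return the controller part of a cXtYdZ name, e.g. 'c1' for 'c1t0d0'."""
    return device.split("t", 1)[0]


def hot_devices(samples, threshold=HOT_THRESHOLD_PCT):
    """Step 2: devices above the threshold, busiest first."""
    hot = [s for s in samples if s.busy_pct > threshold]
    return sorted(hot, key=lambda s: s.busy_pct, reverse=True)


def hot_by_controller(samples, threshold=HOT_THRESHOLD_PCT):
    """Step 7: group hot devices by the controller they share."""
    groups = defaultdict(list)
    for s in hot_devices(samples, threshold):
        groups[controller_of(s.device)].append(s.device)
    return dict(groups)


def overloaded_controllers(samples, limit=MAX_HOT_PER_CONTROLLER):
    """Step 15: controllers left with more than `limit` hot devices."""
    groups = hot_by_controller(samples)
    return sorted(c for c, devs in groups.items() if len(devs) > limit)


# Hypothetical samples for illustration only.
samples = [
    DeviceSample("c1t0d0", 92.0),
    DeviceSample("c1t1d0", 85.5),
    DeviceSample("c1t2d0", 40.0),
    DeviceSample("c2t0d0", 81.0),
    DeviceSample("c2t1d0", 12.0),
]

for s in hot_devices(samples):
    print(f"{s.device}: {s.busy_pct:.1f}% busy")
print(hot_by_controller(samples))
print(overloaded_controllers(samples))
```

Mapping the flagged devices to logical volumes and database structures (steps 3 and 4) depends on the volume manager and database layout in use, so it is left out of this sketch.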
Oracle and Application Optimization Suggestions
Oracle-related optimization is beyond the scope of this analysis. However, TABLE 2 lists some ideas for performance enhancement, based on the I/O distribution improvement principle.
TABLE 2 Performance Enhancement Ideas
1. Implement the planned Oracle database segmentation changes for improved I/O distribution. |
High performance improvement. Well-balanced utilization of the I/O infrastructure has the potential to improve performance by a factor of two to three, according to simulations on the capacity planning model. |
Low risk |
Low risk. |
2. Explore the possibility of implementing I/O load balancing at the application level. For example, if a few queries are hot spots from the database standpoint and use a specific index file, that index file can potentially be duplicated on a read-only database and I/O load balancing implemented at the application/query level. Viability is to be determined by the application development and database management teams. |
High performance improvement, for the same reasons listed above. The potential for improvement is even higher because I/O distribution is now under direct application control. |
Medium risk, due to application code change. |
High, due to application code change. |
3. Review application architecture |
High performance improvement |
Low |
High, due to application re-architecting |
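The application-level load balancing idea in row 2 of TABLE 2 could look roughly like the sketch below. It is only an illustration under stated assumptions: the `ReadRouter` class, the database names, and the query names are hypothetical, and a simple round-robin policy stands in for whatever policy the application and database teams would actually choose.

```python
import itertools


class ReadRouter:
    """Rotate hot read-only queries across the primary and read-only copies."""

    def __init__(self, primary, read_only_copies, hot_queries):
        self.primary = primary
        self.hot_queries = frozenset(hot_queries)
        # Hot reads cycle over the primary and every read-only copy in turn.
        self._read_targets = itertools.cycle([primary, *read_only_copies])

    def target_for(self, query_name, is_write=False):
        """Return the database that should serve this query."""
        # Writes and queries not identified as hot always go to the primary.
        if is_write or query_name not in self.hot_queries:
            return self.primary
        return next(self._read_targets)


# Hypothetical names for illustration only.
router = ReadRouter("prod_db", ["prod_db_ro"], {"order_lookup"})
print(router.target_for("order_lookup"))                 # first hot read
print(router.target_for("order_lookup"))                 # second hot read
print(router.target_for("order_update", is_write=True))  # write
```

In this sketch the two hot reads alternate between the primary and the read-only copy, while the write stays on the primary. Keeping the read-only copy current with the primary is a separate concern that the sketch does not address.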