For All SQL Developers
- All SQL join statements should have the columns from each table noted with a Correlation ID when referenced in Select, Where, Group By, or Order By clauses. A Correlation ID should be something other than a letter of the alphabet. Use something descriptive so others can understand from which table each column is coming. This makes the join logic more clear and readable.
- Do not apply any SQL scalar functions against columns coded in the Where clause. This is especially important for columns that make up any index for a table. For example, coding Where Integer(CLM_ID) will automatically eliminate the use of the index for CLM_ID. As another example, the following:
WHERE YEAR(HIREDATE) = 2003
should be coded as:
WHERE HIREDATE BETWEEN '2003-01-01' and '2003-12-31'
to make it an indexable predicate.
- Check your queries with the DB2 Explain tool. A Plan_Table under your ID will need to be created from the DBAs, or use the Plan_Table defined for theDB2 subsystem you are operating under. For example:
Delete from Plan_Table ; Explain Plan Set Queryno = 11 for SELECT EMPNO, LASTNAME, FIRSTNME, WORKDEPT FROM EMP WHERE DEPTNO = ? ; Select * from Plan_Table Order by Queryno, Planno, Qblockno, Mixopseq ;
- Watch out for Order By and Group By statements in queries. Each of these may cause a sort, which requires resource utilization. Code them only if needed. The fewer the columns and rows in a sort, the faster the sort will run, so make sure only the columns needed are coded.
- When coding UNION statements in SQL, start with UNION ALL. By just coding UNION, a sort gets executed to eliminate duplicates, causing more resource utilization. Many times there are not duplicates, so UNION ALL should be the choice that prevents a sort from taking place. Avoid UNIONs if possible. Sometimes the logic can be rewritten using outer joins, case statements, etc.
- Watch out for DISTINCT. This also causes a sort, which requires more runtime. Only code this when absolutely necessary. Many times a rewrite of the statement that can get the same results without the DISTINCT may run more efficiently. (See tuning tip #4 in Chapter 1.)
- Be careful when using the CASE expression as part of the Select statement. This expression can have some considerable overhead during execution. If there are many rows being returned as part of the query, it may help to move that logic as part of your source code after each row is returned. This is especially true if your source is compiled code.
- Do not use Select Count(*) for existence checking. Use this only when you need a total number of rows. It is best to code a Select using the FETCH FIRST 1 ROW ONLY and then check for SQLCODE = 0 or +100.
- Always check the Performance Monitoring and Tuning guide for V9, and the Managing Performance guide for V10 for how to code (or how not to code) predicates to make them indexable and/or stage 1 versus stage 2. (See tuning tip #14 in Chapter 1.) The IBM Data Studio Visual Explain tool will also note any stage 2 predicates.
- Watch out for <> (not equal) predicates. These predicates are non-indexable, but they are stage 1.
- Make sure there is an understanding of inner vs. outer joins. Many times SQL is written with Table1 outer joined to Table2, and then inner joined to Table3. The inner join being coded last can offset the exceptions that took place in the outer join. Many times the three tables could all be coded with inner joins, which would run more efficiently. Outer joins are not inefficient, but if they bring in extra exception rows, and a subsequent inner join then gets rid of those extra rows, it was processing not needed.
Also, make sure that if outer joins are coded, the program is set up to handle nulls being returned from the table where the join is not met. The VALUE, COALESCE, or IFNULL function should be used to keep DB2 from trying to send a null indicator back to the program.
- Try to stay away from NOT logic in general. Try to keep predicates positive as much as possible. For example, the following predicate:
WHERE NOT HIREDATE > :WS-DATE
could be recoded as:
Where HIREDATE <= :WS-DATE
- When coding predicates, keep the logic away from the column to make it an indexable predicate. For example:
WHERE SALARY * 1.10 > 100000.00
is a non-indexable predicate and should be coded as:
WHERE SALARY> 100000.00 / 1.1
- When using date-labeled durations (adding or subtracting years/months/days) to a date, it is logically important in which order they are coded and executed. For example, when adding, the order should be years first, then months, then days:
SELECT CURRENT DATE + 2 YEARS + 3 MONTHS + 1 DAY
When subtracting, the order should be just the opposite: days first, then months, then years:
SELECT CURRENT DATE – 1 DAYS – 3 MONTHS – 2 YEARS
This is important because if they are coded in a different order, the results could be incorrect! Results can be different due to date adjustments on the months. For example, subtracting 1 month from March 31 will result in February 28 or 29.
- If you need to know the last day of a month, use the Last_Day SQL function to get it. For example:
SELECT LAST_DAY(CURRENT DATE) INTO :HV1--Where HV1 is some Host Variable FROM SYSIBM.SYSDUMMY1
- A more efficient way to get the same result as in #15 above is to use the Set statement. For example:
SET :HV1 = LAST_DAY(CURRENT DATE)
- Take advantage of the many date functions in SQL instead of programming code to provide the information needed:
Year/Month/Day returns only that portion of the date value.
DAYOFWEEK/DAYOFWEEK_ISO returns a number (1–7), depending on whether the week begins on Sunday or Monday. DAYOFWEEK_ISO states Monday as the first day of the week.
DAYOFMONTH/DAYOFYEAR returns the specific day number in a month (1–31) or year (1–366).
LAST_DAY returns the last day of the month for a specific date. If the date was 10/15/2005, the date returned would be 10/31/2005.
NEXT_DAY returns a timestamp representing the first weekday greater than the specified date. The function needs to have the weekday specified. For example:
NEXTDAY('01/31/2005', 'MON')
returns the date of the next Monday after the date ‘01/31/2005’.
DAYS is used to get the days difference between two dates. For example:
SELECT DAYS(HIREDATE) - DAYS(BIRTHDATE)
returns the number of days difference.
WEEK returns a number (1–54) that represents the week of the year. Week 1 is the first week that contains the first day of the year.
WEEK_ISO returns a number (1–53) that represents the week of the year. Week 1 is the first week of the year that contains a Thursday, which is equivalent to the first week that contains January 4.
CHAR is used to get a date column back in a specific format (USA, ISO or JIS, EUR).
Subtracting two dates from each other returns a decimal number that has the number of years, months, and days difference between the dates:
SELECT DATE('2010-01-01') - DATE('2007-10-15') FROM SYSIBM.SYSDUMMY1
returns 20217, which means 2 years, 2 months, 17 days. To get just the years difference, use:
SELECT YEAR(DATE('2010-01-01') - DATE('2007-10-15') )
- Not Between is non-indexable. For example, the following predicate:
WHERE SALARY NOT BETWEEN 50000.00 and 100000.00
is a non-indexable predicate and should be coded as follows:
WHERE SALARY < 50000.00 OR SALARY > 100000.00
- Watch out for the Like predicate. If the Like statement is a Begins With predicate, then that predicate is indexable. If the Like statement is a Contains or Ends With predicate, then it is non-indexable. For example:
WHERE LASTNAME LIKE 'A%'- Begins with logic WHERE LASTNAME LIKE '%A%'- Contains logic WHERE LASTNAME LIKE '%A'- End with logic
- Code only the columns needed in the Select. Extra columns can cause the optimizer to choose a different access path that may not be the best choice. Extra columns cause sorts to be more expensive and adds to transmission cost. Even one extra column (at times) can cause the optimizer to choose a different access path. Basically, the wider the result set, the more DB2 has to pull and ship.
- Queries and/or cursors that bring back multiple rows in the result set should have For Fetch Only at the end of the query. This tells DB2 that there is no intention of updating any of the rows being fetched. Because of this, DB2 will try to avoid locking the pages and will possibly block the data rows being returned. For Read Only also does the same.
- Code the most restrictive predicates first. This does not mean that this is the exact order in which DB2 will execute the queries. DB2 will always pick stage 1 indexable predicates first, no matter where they are coded. But within these, it is important to use the correct order.
- Rewrite > Any and > All subqueries. For example, recode this:
SELECT EMPNO, LASTNAME From Emp Where Salary > Any (Select Salary from Emp Where Workdept = 'C11') as follows: Select Empno, lastname From Emp Where Salary > (Select Min(Salary) from Emp Where Workdept = 'C11')