- Perform, #%&#@!
- Defining the Basic Problem
- Understanding the Optimizer and Associated Tools
- Managing the WHERE Clause
- Creating Covering Indexes
- Joining Columns
- Sorting with DISTINCT and UNION
- Choosing Between HAVING and WHERE
- Looking at Views
- Forcing Indexes
- Summary
Understanding the Optimizer and Associated Tools
To see how query variants affect performance, you need two things: information about indexes and a way to see the choices the optimizer made. Most systems provide tools for both these functions, but they vary widely.
Getting Information on Indexes
You'll always be able to find out what indexes are connected to a particular table, through the meta-data system catalogs if nothing else (see "Getting Meta-Data from System Catalogs" in Chapter 7). For example, in Adaptive Server Anywhere, a quick survey of the system catalogs reveals two likely suspects: sysindex and systable. A first try at a query to locate indexes for a particular table might look like this:
Adaptive Server Anywhere
    select sysindex.index_name, systable.table_name
    from sysindex, systable
    where sysindex.table_id = systable.table_id
      and table_name = 'product'

    index_name            table_name
    ===================== =================================
    prodix                product
    pricex                product
    [2 rows]
(See "Writing Queries Using System Catalogs" in Chapter 7 for more examples of generating this kind of code.)
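If you don't have Adaptive Server Anywhere installed, the same catalog-lookup idea can be tried with Python's built-in sqlite3 module. This is a minimal sketch, not the book's method: SQLite stands in for ASA, its catalog is sqlite_master rather than sysindex and systable, and the product table definition and the helper name indexes_for are assumptions made for illustration.

```python
import sqlite3

# Assumption: a small stand-in for the book's product table, built in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table product (
        prodnum     integer,
        type        text,
        price       real,
        description text
    );
    create unique index prodix on product (prodnum);
    create index pricex on product (price);
""")


def indexes_for(conn, table_name):
    """Return {index_name: [column, ...]} for one table (hypothetical helper)."""
    rows = conn.execute(
        "select name from sqlite_master "
        "where type = 'index' and tbl_name = ? "
        "order by name",
        (table_name,),
    ).fetchall()
    result = {}
    for (index_name,) in rows:
        # PRAGMA index_info returns (seqno, cid, column_name) in key order.
        cols = conn.execute(f"pragma index_info('{index_name}')").fetchall()
        result[index_name] = [col[2] for col in cols]
    return result


product_indexes = indexes_for(conn, "product")
for name, columns in product_indexes.items():
    print(name, columns)
```

The result gives the same two facts the catalog query above returns (index name and owning table) plus the columns in each index, which is roughly what sp_helpindex and the GUI tools report.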
In Adaptive Server Enterprise and MS SQL Server, an easy way to investigate indexes is the sp_helpindex stored procedure (the output from the two RDBMSs is not identical). It tells you the names of the indexes and the columns in each. ("Nonclustered" is a type of Transact-SQL index, with pointers from the bottom row of the index to the data; the index and data are not in the same order. This contrasts with a clustered index, in which the index and data are in the same order.)
Adaptive Server Enterprise
    exec sp_helpindex product

    index_name index_description                          index_keys  ix_max_rows_per_page
    ---------- ------------------------------------------ ----------- --------------------
    prodix     nonclustered, unique located on default    prodnum     0
    pricex     nonclustered located on default            price       0
    (2 rows affected)
In many systems, including the Adaptive Server Anywhere on this book's CD, the index information source of choice is a graphical user interface (GUI) tool. Since you may not be much of an ASA user but would like to follow along, here are basic instructions on using ASA Sybase Central for index information (see the help files for details). If you're using another system, the tools and commands will look different but will probably supply very similar information: the name of the index, the columns that make it up, the order in which the columns appear, and the index type.
Open Sybase Central and click Connect under the Tools button (Figure 6-2). Choose Adaptive Server Anywhere. Fill in the forms for the Login and Database tabs for msdpn (Figure 6-3), logging in as DBA with a password of SQL (both must be uppercase) and substituting your msdpn6.db address for C:/SQLBook on the Database file line.
Figure 6-2. Connecting on Sybase Central
Figure 6-3. Logging In
After you click OK, you'll see your server (here msdpn, with the terminal icon) listed in the Sybase Central window (Figure 6-4). Click it and then click the database-owner combination you want to use: msdp(DBA), with the disk icon. You'll see a list of objects (Figure 6-5). Click Tables. From the detailed list of tables and views, click the table name you want (say, product) and then Indexes (Figure 6-6). Pick the index you want. Use the tabs on the window to get details.
Figure 6-4. Choosing a Database
Figure 6-5. Listing Objects
Figure 6-6. Getting Index Details
There is information on the msdpn indexes in "Table Details" in Chapter 1.
Checking the Optimizer
To check your work, you need to know a little about your optimizer, the part of the SQL engine that decides how to process a query. A cost-based optimizer looks at processing options and chooses the one that is "cheapest" in terms of time. If your system is cost-based, you may need to run commands that make sure the statistics on data are current (see Table 6-2). In a rule-based system, the optimizer makes choices based on a set of ranked guidelines.
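As a rough illustration of what "keeping statistics current" means, here is a sketch using SQLite through Python's sqlite3 module. SQLite is an assumption here, not one of the systems the chapter covers; its ANALYZE command and sqlite_stat1 table play the role of the vendor-specific statistics commands, and the table contents are invented sample data.

```python
import sqlite3

# Assumption: SQLite stands in for a cost-based system; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table product (prodnum integer, type text, price real, description text);
    create unique index prodix on product (prodnum);
    create index pricex on product (price);
""")
conn.executemany(
    "insert into product values (?, ?, ?, ?)",
    [(1100 + n, "application", 10.0 + n, f"item {n}") for n in range(21)],
)

# ANALYZE gathers statistics that the optimizer consults when it costs a plan.
conn.execute("analyze")

# SQLite stores the results in sqlite_stat1, one row per analyzed index.
index_stats = conn.execute(
    "select idx, stat from sqlite_stat1 where tbl = 'product' order by idx"
).fetchall()
for idx, stat in index_stats:
    print(idx, stat)
```

If the data changes substantially after the statistics are gathered, the stored numbers no longer describe the table, which is why cost-based systems need the refresh step.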
Most systems support a command or a tool that shows you what the optimizer is doing. In Adaptive Server Anywhere, the software included on the disk, there is one of each.
The command is the PLAN function, which takes a query (in quotes) as its argument. For a list of related commands, see Table 6-1.
The tool is the Performance Monitor on Sybase Central (an option under Statistics).
If you're following along with the Adaptive Server Anywhere software included on the CD, you may need to send PLAN results to an output file in order to read more than the first line. To do this, end the query with a semicolon and follow it with an OUTPUT TO line naming a file in which to store the query results, and a FORMAT line prescribing the format of columns in the output file. The following examples illustrate this method.
Adaptive Server Anywhere
    select plan (
      'select prodnum, type, price, description from product where prodnum in (1104, 1105, 1106, 1107)'
    );
    output to out.txt
    format text
The out.txt file (it could have any name) is located in the directory that holds the ASA database file. When you open the out.txt file, you'll see information on how the query was processed.
Adaptive Server Anywhere
    Estimate 1 I/O operations (best of 2 plans considered)
    Scan product sequentially
    Estimate getting here 21 times
    For _value_1 in (1104,1105,1106,1107)
If you change the IN phrase to a BETWEEN, the output is different.
Adaptive Server Anywhere
    select plan (
      'select prodnum, type, price, description from product where prodnum between 1104 and 1107'
    );
    output to out.txt
    format text

    Estimate 5 I/O operations
    Scan product using unique index prodix for rows where prodnum is between 1104 and 1107
    Estimate getting here 4 times
Without knowing much about the PLAN messages, you can see that the IN query doesn't use an index; it does a table scan. The BETWEEN query uses the prodix index. The first one goes somewhere 21 times, while the second makes just four trips.
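The habit of comparing plans for two versions of a query carries over to other systems. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in for the PLAN function. It is an illustration under assumptions: SQLite's optimizer is not ASA's and will not necessarily make the same choice for IN versus BETWEEN, so the comparison here is between a filter on an indexed column and a filter on an unindexed one. The table contents and the helper name show_plan are invented for the example.

```python
import sqlite3

# Assumption: SQLite stands in for ASA; the 21 rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table product (prodnum integer, type text, price real, description text);
    create unique index prodix on product (prodnum);
    create index pricex on product (price);
""")
conn.executemany(
    "insert into product values (?, ?, ?, ?)",
    [(1100 + n, "application", 10.0 + n, f"item {n}") for n in range(21)],
)


def show_plan(conn, query):
    """Return the optimizer's plan for a query as text (hypothetical helper)."""
    rows = conn.execute("explain query plan " + query).fetchall()
    # The human-readable description is the last column of each plan row.
    return "\n".join(row[-1] for row in rows)


# Filter on the indexed prodnum column.
indexed_plan = show_plan(
    conn,
    "select prodnum, type, price, description from product "
    "where prodnum between 1104 and 1107",
)

# Filter on description, which has no index to help.
scan_plan = show_plan(
    conn,
    "select prodnum, type, price, description from product "
    "where description like '%4%'",
)

print(indexed_plan)
print(scan_plan)
```

As with the PLAN output, you don't need to understand every word of the messages: the first plan names the prodix index, while the second reports a scan of the whole table.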
You can get a shorter version of that information by looking at the ASA Interactive SQL Statistics window (Figures 6-7 and 6-8). Notice that the PLAN> line for the two queries is different. The first indicates a sequential scan of the table. The second shows index use. This is parallel to the PLAN results, and easier to read and generate. Figure 6-8 shows just the statistics part of the screen produced by the BETWEEN version of the query. Adaptive Server Anywhere provides more detailed performance information in the Sybase Central Performance Monitor.
Figure 6-7. Interactive SQL Window
Figure 6-8. Statistics Pane of the Interactive SQL Window
For a summary of commands that keep tabs on the optimizer, see Table 6-1. You'll need to do some research before you can get much information from the output of any of these commands. You may find additional tools that are used at your site: GUI-based, third party, or home grown.
Table 6-1 Monitoring Performance
| ANSI | ASA | ASE | MS SQL Server | Oracle | Informix |
|---|---|---|---|---|---|
| | PLAN ( 'query' ) | SET SHOWPLAN ON SET NOEXEC ON | SET SHOWPLAN_ALL ON | EXPLAIN PLAN | SET EXPLAIN ON |
SQL Conventions
Before diving into performance in SQL queries, consider how your code looks. Code is easier to read and understand if you present it consistently. In some systems, reusability of cached code may depend on the various copies being identical. Differences as small as a single space character may be relevant. In addition, training time for new employees is shorter if they can expect consistent patterns. For your sanity, develop coding guidelines. Here are some common suggestions.
Start each line with a SQL verb (SELECT, FROM).
    select prodnum, type, price
    from product
    where prodnum between 104 and 107
Indent continued lines.
    select prodnum, type, price
    from product
    where prodnum between 104 and 107
        and price > 50.00
Be consistent in naming tables and columns: don't make some table names singular and others plural, don't use case randomly, and don't call one column pubdate and a related column in another table pub_date.
If you use table aliases, stick to the same ones, and don't use nonmnemonic aliases such as a, b, and c for supplier, product, customer.
Put in lots of comments: the date, your name, what the query or script is about; in short, everything you'd want to know.
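The earlier remark that a single space can affect reuse of cached code is easier to picture with a toy model. The Python sketch below is purely hypothetical: it imagines a cache keyed on the exact text of each statement, and the names plan_cache, cache_stats, and get_plan are invented. Real systems differ in whether and how they normalize query text before looking it up.

```python
# Toy model: a plan cache keyed on the exact statement text (an assumption,
# not a description of any particular RDBMS).
plan_cache = {}
cache_stats = {"hits": 0, "misses": 0}


def get_plan(query_text):
    """Return a cached 'plan' for the text, compiling a new one on a miss."""
    if query_text in plan_cache:
        cache_stats["hits"] += 1
    else:
        cache_stats["misses"] += 1
        plan_cache[query_text] = f"plan #{len(plan_cache) + 1}"
    return plan_cache[query_text]


# First use: nothing cached yet, so this is a miss.
get_plan("select prodnum, type, price from product where prodnum between 104 and 107")

# Character-for-character identical text: the cached plan is reused.
get_plan("select prodnum, type, price from product where prodnum between 104 and 107")

# Same query with one extra space before "where": a different key, so a miss.
get_plan("select prodnum, type, price from product  where prodnum between 104 and 107")

print(cache_stats)
```

In this model the third call does the compile work again even though the query is logically the same, which is one practical argument for writing every copy of a query in the same layout.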