Practical SQL: Tuning Queries

May 31, 2002

␡

Perform, #%&#@!
Defining the Basic Problem
Understanding the Optimizer and Associated Tools
Managing the WHERE Clause
Creating Covering Indexes
Joining Columns
Sorting with DISTINCT and UNION
Choosing Between HAVING and WHERE
Looking at Views
Forcing Indexes
Summary

⎙ Print

< Back Page 4 of 11 Next >

Managing the WHERE Clause

The WHERE clause is the place to start your search for SQL you can change to get better performance. You already know what indexes you have. Now look at the optimizer plan for a problem query and see if the indexes are getting used or if the optimizer is searching the table row by row.

Why a Table Scan?

Just because you have an index on a column doesn't mean your optimizer will use it.

If the amount of data is trivial (as in the msdpn tables), a search using an index may not be faster than a table scan and the optimizer may choose not to use the index.
If the query includes all rows in the table (no WHERE clause), the optimizer does a table scan.
If you're retrieving a lot of data from the table, an index may give no advantage.
If the optimizer does not have accurate information on data distribution, it may pick the wrong index or no index at all.
If you include certain elements in the WHERE clause, you may make the index unavailable.
Of these elements, only the last two are under your control.

Data Distribution Statistics

First, check your system documents to see if the DBA needs periodically to run a command to keep the optimizer current (UPDATE STATISTICS in Transact-SQL and Informix, ANALYZE in Oracle). Microsoft SQL Server also supports a command that tells you when the command was last run (STAT_DATE—there's more information on it in "Using System Functions" in Chapter 7). See Table 6–2 for a list of commands associated with index statistics. The SQL Anywhere ESTIMATE command is included in the table, but it works differently than the UPDATE STATISTICS or ANALYZE commands, allowing the user to give the optimizer hints on data distribution.

Table 6–2. Index Statistics

ANSI	SQL Anywhere	SAE	MS SQL Server	Oracle	Informix
	ESTIMATE	UPDATE STATISTICS	UPDATE STATISTICS STATS_DATE	ANALYZE	UPDATE STATISTICS

ANSI

SQL Anywhere

SAE

MS SQL Server

Oracle

Informix

ESTIMATE

UPDATE STATISTICS

STATS_DATE

ANALYZE

UPDATE STATISTICS

Disabling an Index with a Bad WHERE

Second, don't disable a valid index by the way you construct your WHERE clause. The easiest elements to optimize are comparison operators (=, >, <, and variants) or operators that can be translated to comparison operators (BETWEEN and some LIKE clauses).

Anything else may make your indexes unavailable. Here are some suspicious areas to investigate. Since architecture and optimizers vary so much, you'll have to check your system documentation to find out just how these areas affect (or don't affect!) your queries.

Comparing columns in the same table
Choosing columns with low-selectivity indexes
Doing math on a column before comparing it to a constant
Applying a function to column data before comparing it to a constant
Finding ranges with BETWEEN
Matching with LIKE
Comparing to NULL
Negating with NOT
Converting values
Using OR
Finding sets of values with IN
Using multicolumn indexes

Comparing Columns in the Same Table

In many systems, comparing columns in the same table makes an index useless. For example, consider matching the empnum column to a string constant versus matching it to another column, bossnum. The two queries return the same results.

Adaptive Server Anywhere
select fname, lname, empnum, bossnum
from employee
where empnum = bossnum

select fname, lname, empnum, bossnum
from employee
where empnum = '443232366'
 
fname                lname                empnum    bossnum 
==================== ==================== ========= =========
Scorley              Blake-Pipps          443232366 443232366

[1 row]

With a nontrivial number of rows, the first query scans the table sequentially. The second uses the empix index. If you test this query on another system, you may not see this difference, because the employee table is so small. However, it's clear that an index on empid helps find the employee with a particular identification number and is less useful in finding one with the same number as the boss. Why? Because a constant (443232366) is constant. An index points to a known value, not an unknown value.

Using Nonselective Indexes

A unique index is 100% selective. Every index entry points to a single location in the data. A nonselective index is just the opposite—each index entry points to multiple data locations. You can think of selectivity as roughly the number of distinct index entries divided by the number of data rows. Optimizers don't get much advantage from nonselective indexes, and they often don't use them.

Returning to the book index comparison, imagine an index that listed every occurrence of the word "and." Because "and" is so common, you'd find an index reference to just about every page in the book. Using the index does not make finding "and" faster than paging through the book, because "and" has low selectivity.

Consider attributes such as gender, where there are only two choices. There's no point indexing this kind of column unless the data distribution is very skewed. If 90% of the employees are male and 10% are female, the optimizer might use the index where gender is female. It would not use it in searches for males.

Doing Math on a Column

Another WHERE clause element to watch out for is doing math on an indexed column before comparing it to a constant. The index can find the column value but not the column value * 2. To test this out on the msdpn database on the Adaptive Server Anywhere CD, run these two queries.

Adaptive Server Anywhere
select prodnum, price 
from product
where price * 2 > 200

select prodnum, price
from product
where price > 100

  prodnum   price
=========== ==========
       2111    119.99
       1106    149.00
       1794    400.00

[3 rows]

The first one does a table scan. The second takes advantage of the pricix index. This change is relatively easy to enforce, thanks to our good friend Ms. Algebra. Unadorned indexed columns on the left! Computations on the right! Get in the habit of changing WHERE clauses as in Figure 6–9.

Figure 6–9. Calculation in the WHERE Clause

Using Functions

Functions have the same effect on an indexed column in a WHERE clause as math does. The index points to a value but understands nothing about calculations or functions. You already know that a search for employees by number uses the empix index.

Adaptive Server Anywhere
select fname, lname, empnum, bossnum
from employee
where empnum = '443232366'

If you modify the column name (empnum) with a substring function, even one that does exactly the same match and produces the same results, you'll get a table scan.

Adaptive Server Anywhere
select fname, lname, empnum, bossnum
from employee
where substr(empnum, 1, 9) = '443232366'

fname                lname                empnum    bossnum 
==================== ==================== ========= =========
Scorley              Blake-Pipps          443232366 443232366

[1 row]

Here's another example. The first query does a table scan. The second uses the empid index.

Adaptive Server Anywhere
select fname, lname, empnum, bossnum
from employee
where empnum || ' ' || fname = '443232366 Scorley'

select fname, lname, empnum, bossnum
from employee
where empnum = '443232366' and fname = 'Scorley'

fname                lname                empnum    bossnum 
==================== ==================== ========= =========
Scorley              Blake-Pipps          443232366 443232366

[1 row]

Finding Ranges with BETWEEN

If there is an index on a column, BETWEEN will not disable it. BETWEEN is treated as a pair of comparison operators. The low value must precede the high value.

The first of the following queries uses the prodix index on prodnum. The second uses the index on price.

Adaptive Server Anywhere
select prodnum, name
from product
where prodnum between '1110' and '1357'

  prodnum     name        
=========== ====================
    1110      star systems
    1255      bug stories
    1357      nt guru

[3 rows]

select prodnum, name, price
from product
where price between 300 and 400

  prodnum   name                 price
=========== ==================== ==========
    1794    memory8              400.00

[1 row]

NOT BETWEEN is not as easy to resolve with an index—an index points to a specific value. In the following case, you get a table scan.

Adaptive Server Anywhere
select prodnum, name, price
from product
where price not between 10 and 100

  prodnum     name                price
=========== ==================== ==========
    2111      memory tripler      119.99
    1794      memory8             400.00
    1106      z_connector         149.00

[3 rows]

Matching with LIKE

Most systems can take advantage of an index with LIKE as long as you provide the first character in the pattern (see "Matching Patterns" in Chapter 2). To follow this example, start by creating an index on the name columns of customer, last name first.

Adaptive Server Anywhere
create unique index custnmix 
on customer (lname, fname)

select lname, fname
from customer
where lname like 'Pe%'

The SQL engine translates the LIKE into a range comparison and is able to take advantage of the index.

Adaptive Server Anywhere
select lname, fname
from customer
where lname >= 'Pe' and lname < 'Pf'

lname                fname        
==================== ====================
Peters               Pete

[1 row]

However, the picture is different if you start with a wildcard. The optimizer does not use an index—the index cannot point to an unknown value.

Adaptive Server Anywhere
select lname, fname
from customer
where lname like '%ete%'

You get the same answer, of course, but the method is a table scan.

Comparing to NULL

Adaptive Server Anywhere uses the index on product.price in both the following queries, but not all systems do. Check vendor documentation to find out how nulls and indexes relate. If your system takes the "a null is not equal to anything" dictum into the realm of indexes, you may want to campaign for defaults instead of nulls at design time.

Adaptive Server Anywhere
select prodnum, price
from product
where price = 49.99

select prodnum, price
from product
where price is null

Another issue to check out is whether or not null returns a value in an IN. The following query may return one or two rows, depending on how your system handles null values in this situation.

Adaptive Server Anywhere
select prodnum, price
from product
where price in (null, 400.00)

Negating with NOT

A related area is how NOT and negatives such as <> and != affect index use. Indexes, after all, point to entries. What do they know about nonentries? The two following queries return the same results here (they could return quite disparate results with different data). The first uses the custnmix on last and first names. The second does not.

Adaptive Server Anywhere
select lname, fname
from customer
where lname = 'WONG'

select lname, fname
from customer
where lname not between ' ' and 'W'

lname                  fname        
==================== ====================
WONG                   LI-REN
[1 row]

Get in the habit of converting NOT phrases when you can (Figure 6–10). In most cases, you'll be better off looking for rows with prices greater than zero rather than rows with prices not equal to zero. In addition to its effect on index uses, negative logic is sometimes hard to understand and therefore open to error.

Figure 6–10. Negatives in WHERE

Converting Values

Conversions, like math or functions, can disable an index, and this is true for autoconversions the system handles as well as conversions you perform. For starters, keep the conversion on the right side of the equation, if possible (Figure 6–11).

Figure 6–11. Conversions in WHERE

In ASE, character data defined to allow null is actually stored as VARCHAR data, so joining two columns that differ only in whether or not they allow nulls may cause a conversion. The optimizer cannot use an index on the converted column. In Oracle, some autoconversions deactivate an index, others don't. Check your vendor documentation for details.

You'll find information about convert functions in "Converting Dates (and Other Datatypes)" in Chapter 2.

Using OR

An OR clause returns results if any one of the conditions is true (A = 1 or B > A or C = 3). If the columns in an OR clause have indexes, the optimizer can use the relevant indexes or do a table scan. Using the indexes means dumping results from each clause into an intermediate table and then removing duplicates from the intermediate table (Figure 6–12).

Figure 6–12. OR Processing: Indexes

In a cost-based system, it's important that the table statistics be up to date. Otherwise, the optimizer may make the wrong choice.

If the ORed columns do not have indexes or the optimizer chooses not to use the indexes, you'll see a table scan (Figure 6–13).

Figure 6–13. OR Processing: No Indexes

Finding Sets of Values with IN

Many third-party front-end applications overuse IN, sometimes employing it to find a single value or to return values in a range (Figure 6–14). This can be a problem, as IN is a form of OR processing, and tends to be expensive. If you are lucky, an IN with multiple terms can be re-stated as a range. This works only when the elements in the IN are the only ones in the range. If there were a $200 price in the second Figure 6–14 query, the BETWEEN translation would not be equivalent to the IN version.

Figure 6–14. IN

Check IN clauses carefully. Could you do the same work with a simple equals comparison? With a range?

Using Multicolumn Indexes

In many cases, indexes consist of two or more columns (Figure 6–15). The ordprodix index in orderdetail is an example: it includes the ordnum and prodnum columns, in that order, and the order matters. The optimizer can use the index to find ordnum or combinations of ordnum and prodnum, but it can't follow pointers to go directly to prodnum values. Think of a phone book entry. It helps you find all the subscribers surnamed Smith and all the Smiths with Heather as a first name. It's no good for finding people with an unknown last name whose first name is Heather, though.

Figure 6–15. Multicolumn Indexes

When you construct WHERE clauses, keep in mind the order of the columns in the index. Order of columns in the SELECT list has no effect.

< Back Page 4 of 11 Next >

🔖 Save To Your Account

InformIT Promotional Mailings & Special Offers

I would like to receive exclusive offers and hear about products from InformIT and its family of brands. I can unsubscribe at any time.

Privacy Notice

Overview

Pearson Education, Inc., 221 River Street, Hoboken, New Jersey 07030, (Pearson) presents this site to provide information about products and services that can be purchased through this site.

This privacy notice provides an overview of our commitment to privacy and describes how we collect, protect, use and share personal information collected through this site. Please note that other Pearson websites and online products and services have their own separate privacy policies.

Collection and Use of Information

To conduct business and deliver products and services, Pearson collects and uses personal information in several ways in connection with this site, including:

Questions and Inquiries

For inquiries and questions, we collect the inquiry or question, together with name, contact details (email address, phone number and mailing address) and any other additional information voluntarily submitted to us through a Contact Us form or an email. We use this information to address the inquiry and respond to the question.

Online Store

For orders and purchases placed through our online store on this site, we collect order details, name, institution name and address (if applicable), email address, phone number, shipping and billing addresses, credit/debit card information, shipping options and any instructions. We use this information to complete transactions, fulfill orders, communicate with individuals placing orders or visiting the online store, and for related purposes.

Surveys

Pearson may offer opportunities to provide feedback or participate in surveys, including surveys evaluating Pearson products, services or sites. Participation is voluntary. Pearson collects information requested in the survey questions and uses the information to evaluate, support, maintain and improve products, services or sites, develop new products and services, conduct educational research and for other purposes specified in the survey.

Contests and Drawings

Occasionally, we may sponsor a contest or drawing. Participation is optional. Pearson collects name, contact information and other information specified on the entry form for the contest or drawing to conduct the contest or drawing. Pearson may collect additional personal information from the winners of a contest or drawing in order to award the prize and for tax reporting purposes, as required by law.

Newsletters

If you have elected to receive email newsletters or promotional mailings and special offers but want to unsubscribe, simply email information@informit.com.

Service Announcements

On rare occasions it is necessary to send out a strictly service related announcement. For instance, if our service is temporarily suspended for maintenance we might send users an email. Generally, users may not opt-out of these communications, though they can deactivate their account information. However, these communications are not promotional in nature.

Customer Service

We communicate with users on a regular basis to provide requested services and in regard to issues relating to their account we reply via email or phone in accordance with the users' wishes when a user submits their information through our Contact Us form.

Other Collection and Use of Information

Application and System Logs

Pearson automatically collects log data to help ensure the delivery, availability and security of this site. Log data may include technical information about how a user or visitor connected to this site, such as browser type, type of computer/device, operating system, internet service provider and IP address. We use this information for support purposes and to monitor the health of the site, identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents and appropriately scale computing resources.

Web Analytics

Pearson may use third party web trend analytical services, including Google Analytics, to collect visitor information, such as IP addresses, browser types, referring pages, pages visited and time spent on a particular site. While these analytical services collect and report information on an anonymous basis, they may use cookies to gather web trend information. The information gathered may enable Pearson (but not the third party web trend services) to link information with application and system log data. Pearson uses this information for system administration and to identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents, appropriately scale computing resources and otherwise support and deliver this site and its services.

Cookies and Related Technologies

This site uses cookies and similar technologies to personalize content, measure traffic patterns, control security, track use and access of information on this site, and provide interest-based messages and advertising. Users can manage and block the use of cookies through their browser. Disabling or blocking certain cookies may limit the functionality of this site.

Do Not Track

This site currently does not respond to Do Not Track signals.

Security

Pearson uses appropriate physical, administrative and technical security measures to protect personal information from unauthorized access, use and disclosure.

Children

This site is not directed to children under the age of 13.

Marketing

Pearson may send or direct marketing communications to users, provided that

Pearson will not use personal information collected or processed as a K-12 school service provider for the purpose of directed or targeted advertising.
Such marketing is consistent with applicable law and Pearson's legal obligations.
Pearson will not knowingly direct or send marketing communications to an individual who has expressed a preference not to receive marketing.
Where required by applicable law, express or implied consent to marketing exists and has not been withdrawn.

Pearson may provide personal information to a third party service provider on a restricted basis to provide marketing solely on behalf of Pearson or an affiliate or customer for whom Pearson is a service provider. Marketing preferences may be changed at any time.

Correcting/Updating Personal Information

If a user's personally identifiable information changes (such as your postal address or email address), we provide a way to correct or update that user's personal data provided to us. This can be done on the Account page. If a user no longer desires our service and desires to delete his or her account, please contact us at customer-service@informit.com and we will process the deletion of a user's account.

Choice/Opt-out

Users can always make an informed choice as to whether they should proceed with certain services offered by InformIT. If you choose to remove yourself from our mailing list(s) simply visit the following page and uncheck any communication you no longer want to receive: www.informit.com/u.aspx.

Sale of Personal Information

Pearson does not rent or sell personal information in exchange for any payment of money.

While Pearson does not sell personal information, as defined in Nevada law, Nevada residents may email a request for no sale of their personal information to NevadaDesignatedRequest@pearson.com.

Supplemental Privacy Statement for California Residents

California residents should read our Supplemental privacy statement for California residents in conjunction with this Privacy Notice. The Supplemental privacy statement for California residents explains Pearson's commitment to comply with California law and applies to personal information of California residents collected in connection with this site and the Services.

Sharing and Disclosure

Pearson may disclose personal information, as follows:

As required by law.
With the consent of the individual (or their parent, if the individual is a minor)
In response to a subpoena, court order or legal process, to the extent permitted or required by law
To protect the security and safety of individuals, data, assets and systems, consistent with applicable law
In connection the sale, joint venture or other transfer of some or all of its company or assets, subject to the provisions of this Privacy Notice
To investigate or address actual or suspected fraud or other illegal activities
To exercise its legal rights, including enforcement of the Terms of Use for this site or another contract
To affiliated Pearson companies and other companies and organizations who perform work for Pearson and are obligated to protect the privacy of personal information consistent with this Privacy Notice
To a school, organization, company or government agency, where Pearson collects or processes the personal information in a school setting or on behalf of such organization, company or government agency.

Links

This web site contains links to other sites. Please be aware that we are not responsible for the privacy practices of such other sites. We encourage our users to be aware when they leave our site and to read the privacy statements of each and every web site that collects Personal Information. This privacy statement applies solely to information collected by this web site.

Requests and Contact

Please contact us about this Privacy Notice or if you have any requests or questions relating to the privacy of your personal information.

Changes to this Privacy Notice

We may revise this Privacy Notice through an updated posting. We will identify the effective date of the revision in the posting. Often, updates are made to provide greater clarity or to comply with changes in regulatory requirements. If the updates involve material changes to the collection, protection, use or disclosure of Personal Information, Pearson will provide notice of the change through a conspicuous notice on this site or other appropriate way. Continued use of the site after the effective date of a posted revision evidences acceptance. Please contact us if you have questions or concerns about the Privacy Notice or any objection to any revisions.

Last Update: November 17, 2020

Email Address