Including Non-Key Columns in Non-Clustered Indexes
This is a great feature that is somehow not advertised well by Microsoft (surprise! surprise!!). In a non-clustered index, you can include other columns in addition to the index keys. Okaythat is a vague statement, so let's see what it means in practice.
NOTE
You will need the pubs database to try out these queries.
In the SQL Server Management Studio, open a new query window. Place the cursor in the window; using the Query menu, select the Include Actual Execution Plan option, as shown in the following figure.
Now run this query:
USE PUBS SELECT au_lname, au_fname FROM authors WHERE au_lname = 'White'
Select the execution plan tab and note that the engine performs an Index Seek operation (see the following figure) because the authors table has an index on the au_lname, au_fname columns called aunmind. So it can query and return all the data using the index without ever having to go to the table. Such an index is called a covered index (all the data that needs to be returned is covered in the index columns).
Now let's try returning an additional column, for example, the phone column. Let's run this query:
USE PUBS SELECT au_lname, au_fname, phone FROM authors WHERE au_lname = 'White'
Select the execution plan tab and note that the engine performs a clustered index scan (see the next figure). This is equivalent to a table scan.
By asking for the phone column in the output, we made the engine do extra work. It decided that a table scan would be more efficient in this case rather than using the non-clustered index aunmind. Note that if there are a large number of rows in the table, the engine may first locate row that match the au_lname in the non-clustered index and then use the row identifiers to get the data from the table. In this case the number of rows was small, so it decided to bypass the non-clustered index and directly did a clustered index scan to get the data. (The second example shows that the SQL engine took a different approach.) But in any case, the engine would have to do extra work to fetch the phone column.
If you frequently need the phone column in the output, you want to make the query more efficient. In previous versions of SQL Server, you would use an indexed view (a.k.a. materialized view or virtual table). You would create a shadow table with three columns: au_lname, au_fname, and phone. SQL Server would use the shadow table for the above query and would never have had to go to the authors table to get any data (because all required columns were available in the shadow table).
But indexed views are usually harder to plan, create, and maintain (or, in other words, they're easy to mess up). There are also a number of restrictions on creating an indexed viewthe most restrictive is that the index must be unique. So in the previous scenario, if the three columns au_lname, au_fname, and phone are not unique, you're out of luck.
SQL Server 2005 offers you an amazing alternative: You can simply include the phone column as a non-key column in the non-clustered index on the au_lname, au_fname columns. As a result, the index now becomes a covered index, and the engine does not have to go to the table to get any data. This can significantly speed up your queries, especially for large tables and for queries that return many rows.
Let's see how to do this. In the Object Explorer window, locate the aunmind index, right-click, and select Properties. Then select the Included Columns property and add the phone column (see the following figure). Click OK to exit the dialog box.
Now run the query again:
USE PUBS SELECT au_lname, au_fname, phone FROM authors WHERE au_lname = 'White'
Notice that the execution plan is back to the Index Seek operation! And wasn't it easy to do? (This makes it a good candidate for abusesee the next paragraph.)
It is very easy to abuse this feature by adding one gazillion columns as included columns, which will make your insert and update operations very inefficient. So don't tell your rookie developer about this feature. SQL Server 2005 allows up to 1023 columns to be included as non-key columns. The CREATE INDEX and ALTER INDEX columns now have an INCLUDE clause to let you add these columns.
Also note that to achieve full coverage of the index, you need to add all columns in the query's SELECT, WHERE, and ORDER BY clauses.
Here's another example of this feature: I work a lot with a patient database, in which one of the tables that holds patient data looks like this:
patient_id |
Unique per patient |
fname |
Patient first name |
lname |
Patient last name |
... |
Numerous other columns holding data on the patient: date of birth, gender, allergies, blood type, last hospital visit, and so on |
The table has about one-half million rows and will grow about 5 percent every month. Because it is frequently used, it is important that the queries be efficient.
Most of the queries on this table are focused on getting the patient's name by using the patient_id. So I simply create a non-clustered index on the patient_id column and add the fname and lname as the included columns. This ensures that all such queries are covered in the index itself. But wait, this gets interesting...I also have a large percentage of queries that are interested in finding patients on their lname and/or fname combination and returning their patient_id. So I simply create another non-clustered index on the lname, fname columns and add the patient_id as the included column!
Here are the execution plans before and after the columns are included for the first index. Query:
SELECT lname, fname FROM patients WHERE patient_id = 71119613
The following figure shows the execution plan without the included columns in the index having the patient_id as the key column (note that the Bookmark Lookup task is taking up a big chunk of work).
The following figure shows the execution plan after addition of the lname and fname as the included columns.
If the queries return a large number of rows and also have an ORDER BY clause, an indexed view might be a better choice because the indexed view will have the data physically sorted in the required order and can return the data more quickly.