Transact-SQL Improvements with SQL Server 2008, Part 2
- Working with Multiple Groupings of Data in a Single Command
- Passing Multiple Data Rows to a Stored Procedure with Table-Valued Parameters
In this article, we'll continue our brief look at new and changed Transact-SQL functionality in SQL Server 2008. In part 1 of this series, we considered initializing and incrementing variables, merging multiple data sets into one, separating date and time values with new data types, and some new functions for dealing with date and time. Now we'll take a look at functions for grouping, as well as how to use table-valued parameters to pass multiple data rows to a stored procedure.
Working with Multiple Groupings of Data in a Single Command
The SELECT statement's GROUP BY clause has been extended to allow for a new GROUPING SETS option, which enables you to have multiple groupings of data in a single command. For example, a quick look at the Person.Address table in the AdventureWorks sample database reveals that persons reside in different cities within multiple states. The following query shows the number of persons in each combination of city and state/province:
SELECT b.Name AS StateProvince, a.City, COUNT(*) AS NumberOfAddresses FROM Person.Address a INNER JOIN Person.StateProvince b ON a.StateProvinceID = b.StateProvinceID GROUP BY b.Name, a.City ORDER BY 3 DESC
The following table shows partial results of this query:
StateProvince |
City |
NumberOfAddresses |
England |
London |
434 |
Seine (Paris) |
Paris |
398 |
Washington |
Burien |
215 |
California |
Concord |
214 |
Washington |
Bellingham |
213 |
Oregon |
Beaverton |
213 |
California |
Chula Vista |
207 |
However, this result doesn't produce subtotals by each city and each province. I can easily modify this query to include subtotals for each city name, province name, and a combination of city and state, simply by adding the GROUPING SETS clause:
SELECT b.Name AS StateProvince, a.City, COUNT(*) AS NumberOfAddresses FROM Person.Address a INNER JOIN Person.StateProvince b ON a.StateProvinceID = b.StateProvinceID GROUP BY GROUPING SETS (b.Name, a.City, (a.City, b.Name)) ORDER BY 3 DESC
Partial results:
StateProvince |
City |
NumberOfAddresses |
California |
NULL |
4564 |
Washington |
NULL |
2636 |
England |
NULL |
1954 |
New South Wales |
NULL |
1588 |
British Columbia |
NULL |
1579 |
Oregon |
NULL |
1105 |
NULL |
London |
434 |
England |
London |
434 |
NULL |
Paris |
398 |
Seine (Paris) |
Paris |
398 |
To be fair, previous versions of SQL Server also allowed constructing a similar output, but doing so would require considerably more typing. The following query would return results identical to the one above:
SELECT b.Name AS StateProvince, NULL AS City, COUNT(*) AS NumberOfAddresses FROM Person.Address a INNER JOIN Person.StateProvince b ON a.StateProvinceID = b.StateProvinceID GROUP BY b.Name UNION ALL SELECT NULL AS StateProvince, City, COUNT(*) AS NumberOfAddresses FROM Person.Address a INNER JOIN Person.StateProvince b ON a.StateProvinceID = b.StateProvinceID GROUP BY City UNION ALL SELECT b.Name AS StateProvince, City, COUNT(*) AS NumberOfAddresses FROM Person.Address a INNER JOIN Person.StateProvince b ON a.StateProvinceID = b.StateProvinceID GROUP BY b.Name, City ORDER BY 3 DESC
In addition to saving the effort of typing a much longer statement, the GROUPING SETS clause provides considerably better performance than the equivalent statement using a UNION ALL construct. As the statistics report below shows, the SELECT command using the GROUPING SETS clause scans the Address table twice, whereas the statement using UNION ALL scans the same table three times:
Statistics report for GROUPING SETS: Table 'Address'. Scan count 2, logical reads 245, physical reads 6, read-ahead reads 240, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'StateProvince'. Scan count 2, logical reads 7, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Statistics report for UNION ALL: Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Address'. Scan count 3, logical reads 461, physical reads 9, read-ahead reads 240, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'StateProvince'. Scan count 2, logical reads 7, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Transact-SQL now also includes ANSI-standard ROLLUP and CUBE operators, which are shorthand alternatives to GROUPING SETS. For example, the following query will produce subtotals for each state/province and a grand total for all addresses:
SELECT b.Name AS StateProvince, a.City, COUNT(*) AS NumberOfAddresses FROM Person.Address a INNER JOIN Person.StateProvince b ON a.StateProvinceID = b.StateProvinceID GROUP BY ROLLUP (b.name, a.City) ORDER BY 1
Partial results:
StateProvince |
City |
NumberofAddresses |
NULL |
NULL |
19614 |
Alabama |
Birmingham |
2 |
Alabama |
Florence |
1 |
Alabama |
Huntsville |
2 |
Alabama |
Mobile |
1 |
Alabama |
Montgomery |
1 |
Alabama |
NULL |
7 |
Alberta |
Calgary |
23 |
Alberta |
Edmonton |
2 |
Alberta |
NULL |
25 |
Arizona |
Chandler |
4 |
Arizona |
Gilbert |
1 |
Arizona |
Lemon Grove |
2 |
Arizona |
Mesa |
1 |
Arizona |
Phoenix |
5 |
Arizona |
Scottsdale |
2 |
Arizona |
Surprise |
1 |
Arizona |
Tucson |
2 |
Arizona |
NULL |
18 |
The next query will produce subtotals for each city, subtotals for each state, and the grand total for all addresses:
SELECT b.Name AS StateProvince, a.City, COUNT(*) AS NumberOfAddresses FROM Person.Address a INNER JOIN Person.StateProvince b ON a.StateProvinceID = b.StateProvinceID GROUP BY CUBE (b.name, a.City) ORDER BY 1
The GROUPING_ID function allows us to identify the column(s) used to aggregate the results. To demonstrate, I'll limit the output to cities found in Alabama:
SELECT GROUPING_ID(b.Name, a.City) AS grouping_bits, b.Name AS StateProvince, a.City, COUNT(*) AS NumberOfAddresses FROM Person.Address a INNER JOIN Person.StateProvince b ON a.StateProvinceID = b.StateProvinceID WHERE b.Name = 'Alabama' GROUP BY CUBE (b.name, a.City) ORDER BY 1, 3
Results:
grouping_bits |
StateProvince |
City |
NumberofAddresses |
0 |
Alabama |
Birmingham |
2 |
0 |
Alabama |
Florence |
1 |
0 |
Alabama |
Huntsville |
2 |
0 |
Alabama |
Mobile |
1 |
0 |
Alabama |
Montgomery |
1 |
1 |
Alabama |
NULL |
7 |
2 |
NULL |
Birmingham |
2 |
2 |
NULL |
Florence |
1 |
2 |
NULL |
Huntsville |
2 |
2 |
NULL |
Mobile |
1 |
2 |
NULL |
Montgomery |
1 |
3 |
NULL |
NULL |
7 |
Notice that I passed the state column first to the GROUPING_ID function, followed by city. So the grouping bits column will have a value of 1 if results are aggregated by state, 2 if aggregated by city, and 3 (1+2) if aggregated by both. If addresses are displayed for a combination of city and state, we aren't aggregating by either column, so grouping bits will have a value of zero. Note also that the grouping bits column allowed for a very nice option for sorting the results (using the ORDER BY clause). Similarly, if the result set was large, we could materialize itpersist it as an indexed view or tableand build an index on the grouping bits column for faster data retrieval.