- 10.1 Introduction
- 10.2 Grouping on One Column
- 10.3 Grouping on Two or More Columns
- 10.4 Grouping on Expressions
- 10.5 Grouping of NULL Values
- 10.6 General Rules for the GROUP BY Clause
- 10.7 Complex Examples with GROUP BY
- 10.8 Grouping with WITH ROLLUP
- 10.9 Grouping with WITH CUBE
- 10.10 Grouping Sets
- 10.11 Grouping with ROLLUP and CUBE
- 10.12 Combining Grouping Sets
- 10.13 Answers
10.6 General Rules for the GROUP BY Clause
This section describes a number of important rules that relate to select blocks with a GROUP BY clause.
Rule 1: In Section 9.6, in Chapter 9, we gave several rules for the use of aggregation functions in the SELECT clause. We now add the following rule: If a select block does have a GROUP BY clause, any column specification specified in the SELECT clause must exclusively occur as a parameter of an aggregated function or in the list of columns given in the GROUP BY clause, or in both.
Therefore, the following statement is incorrect because the TOWN column appears in the SELECT clause, yet it is not the parameter of an aggregation function and does not occur in the list of columns by which the result is grouped.
SELECT TOWN, COUNT(*) FROM PLAYERS GROUP BY PLAYERNO
The reason for this restriction is as follows. The result of an aggregation function always consists of one value for each group. The result of a column specification on which grouping is performed also always consists of one value per group. These results are compatible. In contrast, the result of a column specification on which no grouping is performed consists of a set of values. This would not be compatible with the results of the other expressions in the SELECT clause.
Rule 2: In most examples, the expressions that are used to form groups also occur in the SELECT clause. However, that is not necessary. An expression that occurs in the GROUP BY clause can appear in the SELECT clause.
Rule 3: An expression that is used to form groups can also occur in the SELECT clause within a compound expression. We give an example next.
Example 10.11. Get the list with the different penalty amounts in cents.
SELECT CAST(AMOUNT * 100 AS SIGNED INTEGER) AS AMOUNT_IN_CENTS FROM PENALTIES GROUP BY AMOUNT
The result is:
AMOUNT_IN_CENTS --------------- 2500 3000 5000 7500 10000
Explanation: A grouping is performed on a simple expression consisting of a column name: AMOUNT. In the SELECT clause, that same AMOUNT column occurs within a compound expression. This is allowed.
This rule is followed by the fact that no matter how complex a compound expression is, if it occurs in a GROUP BY clause, it can be included in its entirety only in the SELECT clause. For example, if the compound expression PLAYERNO * 2 occurs in a GROUP BY clause, the expressions PLAYERNO * 2, (PLAYERNO * 2) - 100, and MOD(PLAYERNO * 2, 3) - 100 can occur in the SELECT clause. On the other hand, the expressions PLAYERNO, 2 * PLAYERNO, PLAYERNO * 100, and 8 * PLAYERNO * 2 are not allowed.
Rule 4: If an expression occurs twice or more in a GROUP BY clause, double expressions are simply removed. The GROUP BY clause GROUP BY TOWN, TOWN is converted to GROUP BY TOWN. Also, GROUP BY SUBSTR(TOWN,1,1), SEX, SUBSTR(TOWN,1,1) is converted to GROUP BY SUBSTR(TOWN,1,1), SEX. Therefore, it has no use for double expressions.
Rule 5: In Section 9.4, in Chapter 9, we described the cases in which the use of DISTINCT in the SELECT clause is superfluous. The rules given in that section apply to SELECT statements without a GROUP BY clause. We add a rule for SELECT statements with a GROUP BY clause: DISTINCT (if used outside an aggregation function) that is superfluous when the SELECT clause includes all the columns specified in the GROUP BY clause. The GROUP BY clause groups the rows in such a way that the columns on which they are grouped no longer contain duplicate values.
Exercise 10.12: Describe why the following statements are incorrect:
-
SELECT PLAYERNO, DIVISION FROM TEAMS GROUP BY PLAYERNO
-
SELECT SUBSTR(TOWN,1,1), NAME FROM PLAYERS GROUP BY TOWN, SUBSTR(NAME,1,1)
-
SELECT PLAYERNO * (AMOUNT + 100) FROM PENAL TIES GROUP BY AMOUNT + 100
Exercise 10.13: In which of the following statements is DISTINCT superfluous?
-
SELECT DISTINCT PLAYERNO FROM TEAMS GROUP BY PLAYERNO
-
SELECT DISTINCT COUNT(*) FROM MATCHES GROUP BY TEAMNO
-
SELECT DISTINCT COUNT(*) FROM MATCHES WHERE TEAMNO = 2 GROUP BY TEAMNO