- 10.1 Introduction
- 10.2 Grouping on One Column
- 10.3 Grouping on Two or More Columns
- 10.4 Grouping on Expressions
- 10.5 Grouping of Null Values
- 10.6 Grouping with Sorting
- 10.7 General Rules for the GROUP BY Clause
- 10.8 The GROUP_CONCAT Function
- 10.9 Complex Examples with GROUP BY
- 10.10 Grouping with WITH ROLLUP
- 10.11 Answers
10.7 General Rules for the GROUP BY Clause
This section describes a number of important rules for select blocks with a GROUP BY clause.
Rule 1. Section 9.7 gives several rules for the use of aggregation functions in the SELECT clause. For many SQL products, the following rule applies: If a select block has a GROUP BY clause, any column specification in the SELECT clause must exclusively occur as a parameter of an aggregation function, or in the list of columns given in the GROUP BY clause, or in both. Therefore, for most products, the following statement is incorrect because the TOWN column appears in the SELECT clause, yet it is not the parameter of an aggregation function and does not occur in the list of columns by which the result is grouped.
SELECT TOWN, COUNT(*) FROM PLAYERS GROUP BY SEX
This restriction is because the result of an aggregation function always consists of one value for each group. The result of a column specification on which grouping is performed also always consists of one value per group. These results are compatible. In contrast, the result of a column specification on which no grouping is performed consists of a set of values. This would not be compatible with the results of the other expressions in the SELECT clause.
This rule does not apply for MySQL. The previous query will return the following result:
TOWN COUNT(*) --------- -------- Stratford 9 Inglewood 5
The value of the second column is understandable—it is the number of players per sex. But the answer of the first column is unexpected. Why does it show Stratford in the first row and Inglewood in the second? This is strange because, for each sex, there can be multiple towns. The answer is that MySQL itself determines the values to be returned. Those values are selected almost randomly. We can enforce this rule if we add the setting ONLY_FULL_GROUP_BY to the SQL_MODE system variable.
Therefore, we strongly recommend that you do not formulate this type of SQL statement, and instead adhere to this rule that applies to most SQL products.
- Rule 2. In most examples, the expressions used to form groups also occur in the SELECT clause. However, that is not necessary. An expression that occurs in the GROUP BY clause can appear in the SELECT clause.
- Rule 3. An expression that is used to form groups can also occur in the SELECT clause within a compound expression. See the next example.
Example 10.12: Get the list with the different penalty amounts in cents.
SELECT CAST(AMOUNT * 100 AS SIGNED INTEGER) AS AMOUNT_IN_CENTS FROM PENALTIES GROUP BY AMOUNT
The result is:
AMOUNT_IN_CENTS --------------- 2500 3000 5000 7500 10000
Explanation: A grouping is performed on a simple expression consisting of the column name AMOUNT. In the SELECT clause, that same AMOUNT column occurs within a compound expression. This is allowed.
No matter how complex a compound expression is, if it occurs in a GROUP BY clause, it can be included in its entirety only in the SELECT clause. For example, if the compound expression PLAYERNO * 2 occurs in a GROUP BY clause, the expressions PLAYERNO * 2, (PLAYERNO * 2) – 100 and MOD(PLAYERNO * 2, 3) – 100 can occur in the SELECT clause. On the other hand, the expressions PLAYERNO, 2 * PLAYERNO, PLAYERNO * 100, and 8 * PLAYERNO * 2 are not allowed.
- Rule 4. If an expression occurs more than once in a GROUP BY clause, double expressions are simply removed. The GROUP BY clause GROUP BY TOWN, TOWN is converted to GROUP BY TOWN. Also GROUP BY SUBSTR(TOWN,1,1), SEX, SUBSTR(TOWN,1,1) is converted to GROUP BY SUBSTR(TOWN,1,1), SEX.
- Rule 5. Section 9.4 described the cases in which the use of DISTINCT in the SELECT clause is superfluous. The rules given in that section apply to SELECT statements without a GROUP BY clause. A different rule exists for SELECT statements with a GROUP BY clause: DISTINCT (if used outside an aggregation function) is superfluous when the SELECT clause includes all the columns specified in the GROUP BY clause. The GROUP BY clause groups the rows in such a way that the column(s) on which they are grouped no longer contain duplicate values.
Exercise 10.12: Describe why the following statements are incorrect:
SELECT PLAYERNO, DIVISION FROM TEAMS GROUP BY PLAYERNO
SELECT SUBSTR(TOWN,1,1), NAME FROM PLAYERS GROUP BY TOWN, SUBSTR(NAME,1,1)
SELECT PLAYERNO * (AMOUNT + 100) FROM PENALTIES GROUP BY AMOUNT + 100
Exercise 10.13: In which of the following statements is DISTINCT superfluous?
SELECT DISTINCT PLAYERNO FROM TEAMS GROUP BY PLAYERNO
SELECT DISTINCT COUNT(*) FROM MATCHES GROUP BY TEAMNO
SELECT DISTINCT COUNT(*) FROM MATCHES WHERE TEAMNO = 2 GROUP BY TEAMNO