SQL Queries: Summarizing Data Results from a Query in SQL
In this hour, you learn about SQL’s aggregate functions. Aggregate functions let you perform a variety of useful tasks, such as finding the highest sale total or counting the number of orders processed on a given day. The real power of aggregate functions is discussed in the next hour, when you tackle the GROUP BY clause.
Aggregate Functions
Functions are keywords in SQL used to manipulate values within columns for output purposes. A function is a command normally used with a column name or expression that processes the incoming data to produce a result. SQL contains several types of functions. This hour covers aggregate functions. An aggregate function provides summarization information for a SQL statement, such as counts, totals, and averages.
The basic set of aggregate functions discussed in this hour are
- COUNT
- SUM
- MAX
- MIN
- AVG
The following query lists the employee information from the EMPLOYEES table. Note that some of the employees do not have data assigned in some of the columns. We use this data for most of this hour’s examples.
SELECT TOP 10 EMPLOYEEID, LASTNAME, CITY, STATE, PAYRATE, SALARY
FROM EMPLOYEES;

EMPLOYEEID  LASTNAME   CITY        STATE  PAYRATE  SALARY
----------  ---------  ----------  -----  -------  --------
1           Iner       Red Dog            NULL     54000.00
2           Denty      Errol       NH     22.24    NULL
3           Sabbah     Errol       NH     15.29    NULL
4           Loock      Errol       NH     12.88    NULL
5           Sacks      Errol       NH     23.61    NULL
6           Arcoraci   Alexandria  LA     24.79    NULL
7           Astin      Espanola    NM     18.03    NULL
8           Contreraz  Espanola    NM     NULL     60000.00
9           Capito     Espanola    NM     NULL     52000.00
10          Ellamar    Espanola    NM     15.64    NULL

(10 row(s) affected)
COUNT
You use the COUNT function to count rows or the values of a column that do not contain a NULL value. When used within a query, the COUNT function returns a numeric value. You can also use the COUNT function with the DISTINCT keyword to count only the distinct values of a column. ALL (the opposite of DISTINCT) is the default, so it is not necessary to include ALL in the syntax; duplicate values are counted if DISTINCT is not specified. One other option is to use COUNT with an asterisk: COUNT(*) counts all the rows of a table, including duplicates, regardless of whether any column contains a NULL value.
The syntax for the COUNT function follows:
COUNT ( * | [ DISTINCT | ALL ] COLUMN NAME )
This example counts all employee IDs:
SELECT COUNT(EMPLOYEEID) FROM EMPLOYEES
This example counts only the distinct salary values:
SELECT COUNT(DISTINCT SALARY) FROM EMPLOYEES
This example counts all non-NULL SALARY values, including duplicates:
SELECT COUNT(ALL SALARY) FROM EMPLOYEES
This final example counts all rows of the EMPLOYEES table:
SELECT COUNT(*) FROM EMPLOYEES
COUNT(*) is used in the following example to get a count of all records in the EMPLOYEES table. There are 5,611 employees.
SELECT COUNT(*) FROM EMPLOYEES;

-----------
5611

(1 row(s) affected)
COUNT(EMPLOYEEID) is used in the next example to get a count of all the employee IDs that exist in the table. The returned count is the same as in the last query because every employee has an ID.
SELECT COUNT(EMPLOYEEID) FROM EMPLOYEES;

-----------
5611

(1 row(s) affected)
COUNT([STATE]) is used in the following example to get a count of all the employee records that have a state assigned. Compare this count with the previous one: the difference, 464, is the number of employees who have NULL in the STATE column.
SELECT COUNT([STATE]) FROM EMPLOYEES;

-----------
5147

Warning: Null value is eliminated by an aggregate or other SET operation.
(1 row(s) affected)
The following examples obtain a count of all salary amounts and then all the distinct salary amounts in the EMPLOYEES table.
SELECT COUNT(SALARY) FROM EMPLOYEES;

-----------
1359

Warning: Null value is eliminated by an aggregate or other SET operation.
(1 row(s) affected)

SELECT COUNT(DISTINCT SALARY) FROM EMPLOYEES;

-----------
45

Warning: Null value is eliminated by an aggregate or other SET operation.
(1 row(s) affected)
The SALARY column contains many repeated amounts, so counting only the DISTINCT values drops the count dramatically, from 1,359 to 45.
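The NULL-handling rules for COUNT can be checked on a small scale. The following is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database rather than the database that produced this hour's output; the five-row EMPLOYEES table and its values are made up for illustration.

```python
import sqlite3

# Hypothetical five-row stand-in for the EMPLOYEES table (illustrative values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (EMPLOYEEID INTEGER, STATE TEXT, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEES VALUES (?, ?, ?)",
    [
        (1, None, 54000),   # no state assigned
        (2, "NH", None),    # hourly employee, no salary
        (3, "NH", 60000),
        (4, "NM", 60000),   # duplicate salary amount
        (5, "NM", None),
    ],
)

row = conn.execute(
    """
    SELECT COUNT(*),                -- every row, NULLs included
           COUNT(STATE),            -- rows where STATE is not NULL
           COUNT(SALARY),           -- rows where SALARY is not NULL
           COUNT(DISTINCT SALARY)   -- distinct non-NULL salary amounts
    FROM EMPLOYEES
    """
).fetchone()

print(row)  # (5, 4, 3, 2)
```

Only COUNT(*) sees all five rows; each column-based count drops the rows where that column is NULL, and DISTINCT further collapses the two 60000 salaries into one.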
SUM
The SUM function returns the total of the values in a column for a group of rows. You can also use the SUM function with DISTINCT. When you use SUM with DISTINCT, only the distinct values are totaled, which is rarely useful: the total is not accurate because duplicate values are omitted.
The syntax for the SUM function follows:
SUM ([ DISTINCT ] COLUMN NAME)
This example totals the salaries:
SELECT SUM(SALARY) FROM EMPLOYEES
This example totals the distinct salaries:
SELECT SUM(DISTINCT SALARY) FROM EMPLOYEES
In the following query, the sum, or total amount, of all salary values is retrieved from the EMPLOYEES table:
SELECT SUM(SALARY) FROM EMPLOYEES;

------------------------------
70791000.00

Warning: Null value is eliminated by an aggregate or other SET operation.
(1 row(s) affected)
Observe the way the DISTINCT keyword in the following example skews the previous result by more than 68 million dollars. This is why it is rarely useful.
SELECT SUM(DISTINCT SALARY) FROM EMPLOYEES;

------------------------------
2340000.00

Warning: Null value is eliminated by an aggregate or other SET operation.
(1 row(s) affected)
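The effect of DISTINCT on a total is easy to reproduce with a handful of rows. This is a small sketch using Python's sqlite3 module and an in-memory table; the four salary values are invented for the illustration and are not taken from the EMPLOYEES data shown in this hour.

```python
import sqlite3

# Hypothetical data: two employees share the same salary, one has none.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (EMPLOYEEID INTEGER, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEES VALUES (?, ?)",
    [(1, 54000), (2, 60000), (3, 60000), (4, None)],
)

total, distinct_total = conn.execute(
    "SELECT SUM(SALARY), SUM(DISTINCT SALARY) FROM EMPLOYEES"
).fetchone()

print(total)           # 174000: the NULL is skipped, both 60000 rows are added
print(distinct_total)  # 114000: the repeated 60000 is added only once
```

The distinct total is short by exactly the duplicated amount, which is the same kind of distortion, on a much smaller scale, as the one in the preceding query.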
The following query demonstrates that although some aggregate functions require numeric data, the column itself does not have to be declared with a numeric type, provided its values can be converted to one. Here the ZIP column of the EMPLOYEES table is summed because the implicit conversion of the VARCHAR data to a numeric type is supported in Oracle:
SELECT SUM(ZIP) FROM EMPLOYEES;

SUM(ZIP)
-----------
280891448
If the data can be converted implicitly, for example, the string '12345' to an integer, then you can use the aggregate function. When you use data that cannot be implicitly converted to a numeric type, such as the POSITION column, the query results in an error, as in the following example (the message shown is the one SQL Server reports):
SELECT SUM(POSITION) FROM EMPLOYEES;

Msg 8117, Level 16, State 1, Line 1
Operand data type varchar is invalid for sum operator.
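Because implicit conversion differs from one implementation to another, one option is to make the conversion explicit with CAST. The sketch below uses Python's sqlite3 module with an in-memory table; the three ZIP values are made up, and SQLite applies its own conversion rules, so it does not reproduce the error message shown above.

```python
import sqlite3

# Hypothetical table in which ZIP codes are stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (EMPLOYEEID INTEGER, ZIP TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEES VALUES (?, ?)",
    [(1, "03579"), (2, "71301"), (3, "87532")],
)

# CAST turns each text value into an integer before SUM sees it.
(zip_total,) = conn.execute(
    "SELECT SUM(CAST(ZIP AS INTEGER)) FROM EMPLOYEES"
).fetchone()

print(zip_total)  # 162412 (3579 + 71301 + 87532)
```

Writing the CAST in the query documents the intended conversion instead of leaving it to the database to decide.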
AVG
The AVG function finds the average value for a given group of rows. When used with the DISTINCT keyword, the AVG function returns the average of the distinct values. The syntax for the AVG function follows:
AVG ([ DISTINCT ] COLUMN NAME)
The average value for all values in the EMPLOYEES table’s SALARY column is retrieved in the following example:
SELECT AVG(SALARY) FROM EMPLOYEES;

------------------------------
52090.507726

Warning: Null value is eliminated by an aggregate or other SET operation.
(1 row(s) affected)
This example returns the average of the distinct salaries:
SELECT AVG(DISTINCT SALARY) FROM EMPLOYEES;

------------------------------
52000.000000

Warning: Null value is eliminated by an aggregate or other SET operation.
(1 row(s) affected)
The next example uses two aggregate functions in the same query. Because some employees are paid hourly and others are on salary, you want to retrieve the average value for both PAYRATE and SALARY.
SELECT AVG(PAYRATE) AS AVG_PAYRATE, AVG(SALARY) AS AVG_SALARY
FROM EMPLOYEES;

AVG_PAYRATE    AVG_SALARY
-------------  -------------
18.473012      52090.507726

Warning: Null value is eliminated by an aggregate or other SET operation.
(1 row(s) affected)
Notice how the use of aliases makes the output more readable when a query returns multiple aggregate values. Also remember that an aggregate function can work on any numeric data, so you can perform calculations within the parentheses of the function as well. If you need the average hourly rate of salaried employees to compare with the average rate of hourly employees, you could divide each salary by 2040 inside the function:
SELECT AVG(PAYRATE) AS AVG_PAYRATE, AVG(SALARY/2040) AS AVG_SALARY_RATE
FROM EMPLOYEES;

AVG_PAYRATE    AVG_SALARY_RATE
-------------  ---------------
18.473012      25.5345625

Warning: Null value is eliminated by an aggregate or other SET operation.
(1 row(s) affected)
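The way AVG skips NULL values, and the use of an expression inside the function, can be sketched with a tiny table. The example uses Python's sqlite3 module and an in-memory database; the four rows are invented so that the arithmetic is easy to follow.

```python
import sqlite3

# Hypothetical data: two hourly employees (no SALARY), two salaried (no PAYRATE).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (EMPLOYEEID INTEGER, PAYRATE REAL, SALARY REAL)")
conn.executemany(
    "INSERT INTO EMPLOYEES VALUES (?, ?, ?)",
    [(1, 20.0, None), (2, 10.0, None), (3, None, 40800.0), (4, None, 61200.0)],
)

avg_payrate, avg_salary_rate, naive_avg = conn.execute(
    """
    SELECT AVG(PAYRATE)            AS AVG_PAYRATE,      -- averages the 2 non-NULL rates
           AVG(SALARY / 2040)      AS AVG_SALARY_RATE,  -- expression evaluated per row
           SUM(PAYRATE) / COUNT(*) AS NAIVE_AVG         -- divides by all 4 rows instead
    FROM EMPLOYEES
    """
).fetchone()

print(avg_payrate)      # 15.0
print(avg_salary_rate)  # 25.0 (average of 20.0 and 30.0)
print(naive_avg)        # 7.5
```

The NAIVE_AVG column is included only for contrast: dividing the total by COUNT(*) counts the rows with NULL pay rates, whereas AVG divides by the number of non-NULL values.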
MAX
The MAX function returns the maximum value from the values of a column in a group of rows. NULL values are ignored when using the MAX function. Using MAX with the DISTINCT keyword is an option. However, because the maximum of all the values is the same as the maximum of the distinct values, DISTINCT has no effect.
The syntax for the MAX function is
MAX([ DISTINCT ] COLUMN NAME)
The following example returns the highest SALARY in the EMPLOYEES table:
SELECT MAX(SALARY) FROM EMPLOYEES;

------------------------------
74000.00

Warning: Null value is eliminated by an aggregate or other SET operation.
(1 row(s) affected)
This example returns the highest distinct salary:
SELECT MAX(DISTINCT SALARY) FROM EMPLOYEES;

------------------------------
74000.00

Warning: Null value is eliminated by an aggregate or other SET operation.
(1 row(s) affected)
You can also use aggregate functions such as MAX and MIN (covered in the next section) on character data. With character values, the collation of your database comes into play again. Most commonly your database collation is set to a dictionary order, so the results are ranked according to that. For example, say you perform a MAX on the CITY column of the EMPLOYEES table:
SELECT MAX(CITY) AS MAX_CITY FROM EMPLOYEES;

MAX_CITY
------------------------------
Zwara

(1 row(s) affected)
In this instance, the function returned the largest value according to a dictionary ordering of the data in the column.
MIN
The MIN function returns the minimum value of a column for a group of rows. NULL values are ignored when using the MIN function. Using MIN with the DISTINCT keyword is an option. However, because the minimum of all the values is the same as the minimum of the distinct values, DISTINCT has no effect.
The syntax for the MIN function is
MIN([ DISTINCT ] COLUMN NAME)
The following example returns the lowest SALARY in the EMPLOYEES table:
SELECT MIN(SALARY) FROM EMPLOYEES;

------------------------------
30000.00

Warning: Null value is eliminated by an aggregate or other SET operation.
(1 row(s) affected)
This example returns the lowest distinct salary:
SELECT MIN(DISTINCT SALARY) FROM EMPLOYEES;

------------------------------
30000.00

Warning: Null value is eliminated by an aggregate or other SET operation.
(1 row(s) affected)
As with the MAX function, the MIN function can work against character data and returns the minimum value according to the dictionary ordering of the data.
SELECT MIN(CITY) AS MIN_CITY FROM EMPLOYEES;

MIN_CITY
------------------------------
AFB MunicipalCharleston SC

(1 row(s) affected)
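The behavior of MIN and MAX on numeric and character data can be tried out with a few rows. The following sketch uses Python's sqlite3 module and an in-memory table with made-up values. Note one assumption: SQLite compares text byte by byte by default rather than in a dictionary collation, so the city names here all use the same capitalization to keep the ordering unambiguous.

```python
import sqlite3

# Hypothetical rows; one employee has no salary and one has no city.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (CITY TEXT, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEES VALUES (?, ?)",
    [
        ("Errol", None),
        ("Alexandria", 30000),
        ("Zwara", 74000),
        ("Espanola", 52000),
        (None, 74000),
    ],
)

min_salary, max_salary, max_distinct_salary, min_city, max_city = conn.execute(
    """
    SELECT MIN(SALARY), MAX(SALARY),
           MAX(DISTINCT SALARY),     -- same answer as MAX(SALARY)
           MIN(CITY), MAX(CITY)      -- character data, NULL city ignored
    FROM EMPLOYEES
    """
).fetchone()

print(min_salary, max_salary, max_distinct_salary)  # 30000 74000 74000
print(min_city, max_city)                           # Alexandria Zwara
```

The NULL salary and the NULL city are both ignored, and adding DISTINCT to MAX returns the same value as the plain MAX.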