SelectFromData
AnalyticsLesson 4 of 9 · 25 min

Counting and Summing: GROUP BY and the Aggregate Functions

So far every query returned rows that already existed. Aggregation is different: it collapses many rows into a summary: a count, a total, an average. This is where SQL starts answering business questions like "how many?" and "how much per department?"

In one line

Aggregate functions (COUNT, SUM, AVG, MIN, MAX) reduce many rows to one number. Add GROUP BY to get one number per category instead of one for the whole table.

One number for the whole table

Without GROUP BY, an aggregate function summarises every row into a single value. Run it:

Whole-table summary: how many employees, and the total + average salary
Editable · runs in your browser

The five you'll use constantly: COUNT (how many), SUM (total), AVG (mean), MIN and MAX (smallest / largest).

GROUP BY — one number per category

Picture it first: GROUP BY department sorts the rows into coloured groups, then collapses each group into a single summary row. Two engineers become one Engineering row; two salespeople become one Sales row:

namedepartmentsalary
AdaEngineering62000
BoSales48000
CyEngineering71000
DiSales52000
EliMarketing45000
GROUP BY department
departmentheadcountavg_salary
Engineering266500
Sales250000
Marketing145000
Same-colour rows share a group; each group folds into one result row (COUNT and AVG computed per group).

Try it now: produce the grouped table the visual showed, the headcount per department (department and headcount).

Your turn · Exercise

Return the headcount per department (columns department and headcount).

The data

employees

5 rows returned

namedepartmentsalary
AdaEngineering62,000
BoSales48,000
CyEngineering71,000
DiSales52,000
EliMarketing45,000
Runs in your browser

GROUP BY department splits the table into groups and runs the aggregate once per group. The rule of thumb: every column in your SELECT is either inside an aggregate function or listed in GROUP BY. Run it:

One row per department: headcount and average salary
Editable · runs in your browser

Watch out

If you SELECT a column that's neither aggregated nor in GROUP BY, the query is ambiguous: "which name should I show for the Engineering group, there are two?" Most databases error; some pick one arbitrarily. The fix is always: aggregate it, or add it to GROUP BY.

Your turn

Group and aggregate. Return the total salary per department, highest total first.

Your turn · Exercise

Return the total salary per department (columns department and total_salary), highest total first.

The data

employees

5 rows returned

namedepartmentsalary
AdaEngineering62,000
BoSales48,000
CyEngineering71,000
DiSales52,000
EliMarketing45,000
Runs in your browser

COUNT(*) vs COUNT(column) vs COUNT(DISTINCT)

These three look alike and mean different things:

ExpressionCounts
COUNT(*)Every row, including ones with NULLs
COUNT(col)Only rows where col is NOT NULL
COUNT(DISTINCT col)Distinct non-NULL values of col

The gap between COUNT(*) and COUNT(col) is exactly your NULL count.

Run it and watch the three numbers differ on the same column:

bonus has 2 NULLs and a repeated value — see how each COUNT reacts
Editable · runs in your browser

From practice

SUM, AVG, MIN, MAX all silently skip NULLs too. That's usually what you want, but if NULL means "zero" in your data, the average will be higher than you expect because the NULL rows aren't counted in the denominator. Know your data.


Next: filtering on those aggregated numbers (which needs HAVING, not WHERE) and the order SQL actually runs your clauses in.