SQL GROUP BY and HAVING: The Complete Guide With Interview Examples

GROUP BY trips up more SQL candidates than almost any other concept. This guide explains exactly how aggregation works, why HAVING exists, common mistakes that cause wrong answers, and the patterns that keep appearing in data interviews.

Apr 15, 20264 min read

GROUP BY is one of those SQL clauses that looks simple on the surface — until you get a wrong answer in an interview and can not figure out why.

The reason is that GROUP BY has a few non-obvious rules that most tutorials skip. This guide covers the complete picture: how aggregation works, why HAVING exists, the mistakes that silently produce wrong results, and the patterns that keep showing up in real interviews.

1. What GROUP BY actually does

GROUP BY collapses multiple rows that share the same value(s) in the specified column(s) into a single row per group.

sql
SELECT
  country,
  COUNT(*) AS customer_count
FROM customers
GROUP BY country;

Every unique country value becomes one row. The COUNT(*) tells you how many customers are in each country.

The key rule:

Every column in SELECT must either be in GROUP BY or wrapped in an aggregate function.

This is the rule that trips most people up.

2. The SELECT column rule — explained clearly

sql
-- This WORKS: name is in GROUP BY
SELECT country, COUNT(*) AS total
FROM customers
GROUP BY country;

-- This FAILS: name is not in GROUP BY and not aggregated
SELECT country, name, COUNT(*) AS total
FROM customers
GROUP BY country;

The second query fails because after grouping by country, there are multiple name values per group. The database does not know which one to show.

The fix is either:

Add name to GROUP BY (if you want one row per country + name combo)
Remove name from SELECT
Wrap it: MAX(name) or MIN(name) (if you just need one value)

MySQL (in some configurations) allows selecting non-aggregated columns not in GROUP BY, but it picks values arbitrarily. This is dangerous. Write explicit, standard SQL.

3. Aggregation functions you need to know

Function	What it does
`COUNT(*)`	Counts all rows including NULLs
`COUNT(col)`	Counts non-NULL values in a column
`COUNT(DISTINCT col)`	Counts unique non-NULL values
`SUM(col)`	Sums non-NULL values
`AVG(col)`	Average of non-NULL values
`MIN(col)`	Smallest value
`MAX(col)`	Largest value

The COUNT(*) vs COUNT(col) distinction is a frequent interview question.

sql
SELECT
  COUNT(*) AS total_rows,
  COUNT(email) AS rows_with_email,
  COUNT(DISTINCT email) AS unique_emails
FROM customers;

If 200 customers have no email (NULL), COUNT(*) returns 1000 while COUNT(email) returns 800.

4. Grouping by multiple columns

You can group by more than one column:

sql
SELECT
  country,
  status,
  COUNT(*) AS customer_count
FROM customers
GROUP BY country, status;

This produces one row per unique (country, status) combination.

Interview question framing: "How many active vs inactive customers do we have per country?" → Group by (country, status).

5. WHERE vs HAVING — the most common confusion

WHERE filters rows before grouping. HAVING filters groups after aggregation.

sql
-- WHERE: filter individual rows before aggregation
SELECT country, COUNT(*) AS total
FROM customers
WHERE signup_date >= '2024-01-01'
GROUP BY country;

-- HAVING: filter the aggregated result
SELECT country, COUNT(*) AS total
FROM customers
GROUP BY country
HAVING COUNT(*) > 100;

The rule is simple:

Use WHERE to filter raw rows. Use HAVING to filter aggregated results.

You cannot use an aggregate function in a WHERE clause:

sql
-- This FAILS:
SELECT country, COUNT(*) AS total
FROM customers
WHERE COUNT(*) > 100
GROUP BY country;

-- This WORKS:
SELECT country, COUNT(*) AS total
FROM customers
GROUP BY country
HAVING COUNT(*) > 100;

6. Combining WHERE and HAVING

You can use both in the same query — they filter at different stages:

sql
SELECT
  country,
  COUNT(*) AS active_count
FROM customers
WHERE status = 'active'           -- filter rows first
GROUP BY country
HAVING COUNT(*) > 50               -- then filter groups
ORDER BY active_count DESC;

This gives you: countries that have more than 50 active customers, from the population of active customers only.

7. The query execution order

Understanding why HAVING exists requires knowing the order SQL is executed:

1. FROM + JOIN     — identify source rows
2. WHERE           — filter individual rows
3. GROUP BY        — collapse into groups
4. HAVING          — filter groups
5. SELECT          — compute output columns
6. ORDER BY        — sort
7. LIMIT           — limit rows returned

WHERE runs at step 2, before groups exist. HAVING runs at step 4, after groups are formed. That is why you cannot reference aggregates in WHERE.

Interview tip: Reciting this execution order — even partially — signals strong SQL fundamentals. Most candidates do not know it.

8. Grouping by expressions

You can group by a derived expression, not just a raw column:

sql
SELECT
  DATE_TRUNC('month', order_date) AS month,
  SUM(amount) AS monthly_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

In some databases (Postgres, BigQuery) you can reference the alias:

sql
SELECT
  DATE_TRUNC('month', order_date) AS month,
  SUM(amount) AS monthly_revenue
FROM orders
GROUP BY month
ORDER BY month;

This is a common pattern for time-series aggregation in analytics interviews.

9. Classic GROUP BY interview patterns

Pattern 1: "Find customers who placed more than 5 orders"

sql
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5;

Pattern 2: "For each product category, find the average order value"

sql
SELECT
  p.category,
  AVG(oi.unit_price * oi.quantity) AS avg_order_value
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
GROUP BY p.category
ORDER BY avg_order_value DESC;

Pattern 3: "Monthly revenue trend"

sql
SELECT
  DATE_TRUNC('month', order_date) AS month,
  COUNT(DISTINCT order_id) AS total_orders,
  SUM(amount) AS revenue
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

Pattern 4: "Find duplicate emails in the users table"

sql
SELECT email, COUNT(*) AS occurrences
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

This is one of the most common SQL cleaning / data quality questions in interviews.

10. Common mistakes in GROUP BY queries

Mistake 1: Selecting a column not in GROUP BY Already covered above — it either errors or returns arbitrary values.

Mistake 2: Using HAVING when WHERE is correct

sql
-- Inefficient: filters after aggregation
SELECT country, COUNT(*) AS total
FROM customers
GROUP BY country
HAVING country = 'IN';

-- Better: filters before aggregation, less work
SELECT country, COUNT(*) AS total
FROM customers
WHERE country = 'IN'
GROUP BY country;

If the filter is on a non-aggregated column, use WHERE — it is faster because it reduces rows before grouping.

Mistake 3: COUNT(*) when you need COUNT(DISTINCT)

sql
-- Counts order rows, not unique customers
SELECT COUNT(*) AS customers_who_ordered
FROM orders;

-- Correctly counts unique customers
SELECT COUNT(DISTINCT customer_id) AS customers_who_ordered
FROM orders;

11. Where to practice GROUP BY problems

The fastest way to get comfortable with GROUP BY and HAVING is to solve problems on realistic schemas where you have to think about:

Which columns belong in GROUP BY
Whether to filter with WHERE or HAVING
Which aggregate function to use

On TheQueryLab, you will find problems that test all of these patterns — including multi-table aggregations and time-series grouping that mirror what top companies actually ask.

Browse the SQL Interview Preparation Kit and filter by aggregation problems to build a solid foundation before your next interview.