SQL GROUP BY and HAVING: The Complete Guide With Interview Examples
GROUP BY trips up more SQL candidates than almost any other concept. This guide explains exactly how aggregation works, why HAVING exists, common mistakes that cause wrong answers, and the patterns that keep appearing in data interviews.
GROUP BY is one of those SQL clauses that looks simple on the surface — until you get a wrong answer in an interview and can not figure out why.
The reason is that GROUP BY has a few non-obvious rules that most tutorials skip. This guide covers the complete picture: how aggregation works, why HAVING exists, the mistakes that silently produce wrong results, and the patterns that keep showing up in real interviews.
1. What GROUP BY actually does
GROUP BY collapses multiple rows that share the same value(s) in the specified column(s) into a single row per group.
sqlSELECT country, COUNT(*) AS customer_count FROM customers GROUP BY country;
Every unique country value becomes one row. The COUNT(*) tells you how many customers are in each country.
The key rule:
Every column in
SELECTmust either be inGROUP BYor wrapped in an aggregate function.
This is the rule that trips most people up.
2. The SELECT column rule — explained clearly
sql-- This WORKS: name is in GROUP BY SELECT country, COUNT(*) AS total FROM customers GROUP BY country; -- This FAILS: name is not in GROUP BY and not aggregated SELECT country, name, COUNT(*) AS total FROM customers GROUP BY country;
The second query fails because after grouping by country, there are multiple name values per group. The database does not know which one to show.
The fix is either:
- Add
nametoGROUP BY(if you want one row per country + name combo) - Remove
namefromSELECT - Wrap it:
MAX(name)orMIN(name)(if you just need one value)
MySQL (in some configurations) allows selecting non-aggregated columns not in GROUP BY, but it picks values arbitrarily. This is dangerous. Write explicit, standard SQL.
3. Aggregation functions you need to know
| Function | What it does |
|---|---|
COUNT(*) | Counts all rows including NULLs |
COUNT(col) | Counts non-NULL values in a column |
COUNT(DISTINCT col) | Counts unique non-NULL values |
SUM(col) | Sums non-NULL values |
AVG(col) | Average of non-NULL values |
MIN(col) | Smallest value |
MAX(col) | Largest value |
The COUNT(*) vs COUNT(col) distinction is a frequent interview question.
sqlSELECT COUNT(*) AS total_rows, COUNT(email) AS rows_with_email, COUNT(DISTINCT email) AS unique_emails FROM customers;
If 200 customers have no email (NULL), COUNT(*) returns 1000 while COUNT(email) returns 800.
4. Grouping by multiple columns
You can group by more than one column:
sqlSELECT country, status, COUNT(*) AS customer_count FROM customers GROUP BY country, status;
This produces one row per unique (country, status) combination.
Interview question framing: "How many active vs inactive customers do we have per country?" → Group by
(country, status).
5. WHERE vs HAVING — the most common confusion
WHERE filters rows before grouping.
HAVING filters groups after aggregation.
sql-- WHERE: filter individual rows before aggregation SELECT country, COUNT(*) AS total FROM customers WHERE signup_date >= '2024-01-01' GROUP BY country; -- HAVING: filter the aggregated result SELECT country, COUNT(*) AS total FROM customers GROUP BY country HAVING COUNT(*) > 100;
The rule is simple:
Use
WHEREto filter raw rows. UseHAVINGto filter aggregated results.
You cannot use an aggregate function in a WHERE clause:
sql-- This FAILS: SELECT country, COUNT(*) AS total FROM customers WHERE COUNT(*) > 100 GROUP BY country; -- This WORKS: SELECT country, COUNT(*) AS total FROM customers GROUP BY country HAVING COUNT(*) > 100;
6. Combining WHERE and HAVING
You can use both in the same query — they filter at different stages:
sqlSELECT country, COUNT(*) AS active_count FROM customers WHERE status = 'active' -- filter rows first GROUP BY country HAVING COUNT(*) > 50 -- then filter groups ORDER BY active_count DESC;
This gives you: countries that have more than 50 active customers, from the population of active customers only.
7. The query execution order
Understanding why HAVING exists requires knowing the order SQL is executed:
1. FROM + JOIN — identify source rows
2. WHERE — filter individual rows
3. GROUP BY — collapse into groups
4. HAVING — filter groups
5. SELECT — compute output columns
6. ORDER BY — sort
7. LIMIT — limit rows returned
WHERE runs at step 2, before groups exist. HAVING runs at step 4, after groups are formed. That is why you cannot reference aggregates in WHERE.
Interview tip: Reciting this execution order — even partially — signals strong SQL fundamentals. Most candidates do not know it.
8. Grouping by expressions
You can group by a derived expression, not just a raw column:
sqlSELECT DATE_TRUNC('month', order_date) AS month, SUM(amount) AS monthly_revenue FROM orders GROUP BY DATE_TRUNC('month', order_date) ORDER BY month;
In some databases (Postgres, BigQuery) you can reference the alias:
sqlSELECT DATE_TRUNC('month', order_date) AS month, SUM(amount) AS monthly_revenue FROM orders GROUP BY month ORDER BY month;
This is a common pattern for time-series aggregation in analytics interviews.
9. Classic GROUP BY interview patterns
Pattern 1: "Find customers who placed more than 5 orders"
sqlSELECT customer_id, COUNT(*) AS order_count FROM orders GROUP BY customer_id HAVING COUNT(*) > 5;
Pattern 2: "For each product category, find the average order value"
sqlSELECT p.category, AVG(oi.unit_price * oi.quantity) AS avg_order_value FROM order_items oi JOIN products p ON oi.product_id = p.product_id GROUP BY p.category ORDER BY avg_order_value DESC;
Pattern 3: "Monthly revenue trend"
sqlSELECT DATE_TRUNC('month', order_date) AS month, COUNT(DISTINCT order_id) AS total_orders, SUM(amount) AS revenue FROM orders WHERE order_date >= '2024-01-01' GROUP BY DATE_TRUNC('month', order_date) ORDER BY month;
Pattern 4: "Find duplicate emails in the users table"
sqlSELECT email, COUNT(*) AS occurrences FROM users GROUP BY email HAVING COUNT(*) > 1;
This is one of the most common SQL cleaning / data quality questions in interviews.
10. Common mistakes in GROUP BY queries
Mistake 1: Selecting a column not in GROUP BY Already covered above — it either errors or returns arbitrary values.
Mistake 2: Using HAVING when WHERE is correct
sql-- Inefficient: filters after aggregation SELECT country, COUNT(*) AS total FROM customers GROUP BY country HAVING country = 'IN'; -- Better: filters before aggregation, less work SELECT country, COUNT(*) AS total FROM customers WHERE country = 'IN' GROUP BY country;
If the filter is on a non-aggregated column, use WHERE — it is faster because it reduces rows before grouping.
Mistake 3: COUNT(*) when you need COUNT(DISTINCT)
sql-- Counts order rows, not unique customers SELECT COUNT(*) AS customers_who_ordered FROM orders; -- Correctly counts unique customers SELECT COUNT(DISTINCT customer_id) AS customers_who_ordered FROM orders;
11. Where to practice GROUP BY problems
The fastest way to get comfortable with GROUP BY and HAVING is to solve problems on realistic schemas where you have to think about:
- Which columns belong in
GROUP BY - Whether to filter with
WHEREorHAVING - Which aggregate function to use
On TheQueryLab, you will find problems that test all of these patterns — including multi-table aggregations and time-series grouping that mirror what top companies actually ask.
Browse the SQL Interview Preparation Kit and filter by aggregation problems to build a solid foundation before your next interview.
