Subqueries & CTEs (WITH)
/tldr: Breaking complex queries into logical, sequential steps.
THE CORE IDEA: Querying a Query
When you need to use the result of one SELECT statement as input for another, you use a subquery or a CTE. This pattern is fundamental to solving multi-step analytical problems, such as "Find the employees whose salary is above the average salary of their department."
Step 1 (Inner Query) → Step 2 (Outer Query)
1. Subqueries (Nested Queries)
Subqueries are standard queries placed within parentheses and embedded in a larger query (the outer query).
Types of Subqueries
Scalar Subquery
Returns a single value (one row, one column). Can be used anywhere a single value is expected (SELECT, WHERE, HAVING).
Multi-Row Subquery
Returns one column but multiple rows. Used with operators like IN, ANY, or ALL in the WHERE clause.
Inline View (FROM Clause)
Returns multiple columns and rows. Used in the FROM clause, acting like a temporary table for the outer query.
Example (Scalar Subquery)
-- Find employees whose salary is above the company average
SELECT
employee_name,
salary
FROM
employees
WHERE
salary > (
-- Scalar Subquery calculates single value (avg_salary)
SELECT AVG(salary) FROM employees
);
2. Common Table Expressions (CTEs / WITH)
CTEs are named, temporary result sets defined within the execution scope of a single SELECT, INSERT, UPDATE, or DELETE statement. They are the modern, cleaner way to manage complex, multi-step queries.
Syntax and Benefits
CTEs improve readability because you define the logic steps *before* the main query, eliminating deep nesting.
-- Define CTE(s) first
WITH
DepartmentAverages AS (
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
)
-- Execute the main query using the CTE as if it were a table
SELECT
e.employee_name,
e.salary,
a.avg_salary
FROM
employees e
JOIN
DepartmentAverages a ON e.department = a.department
WHERE
e.salary > a.avg_salary;
Advanced Topics
Correlated Subqueries (Caution)
An inner query that depends on the outer query for its values. It runs once for **every single row** of the outer query. **Warning:** They are powerful but often much slower than CTEs or JOINs, especially on large datasets.
WHERE salary > (
SELECT AVG(salary)
FROM employees e2
-- Correlated to the outer query (e)
WHERE e2.department = e.department
)
Recursive CTEs
A specialized CTE that can reference itself. Used primarily for querying hierarchical data structures (e.g., organizational charts, bill of materials, network paths).
(Requires WITH RECURSIVE syntax and a termination condition to prevent infinite loops.)
3. Combining Results: UNION vs. UNION ALL
These set operators combine the result sets of two or more independent SELECT statements into a single result.
UNION (Distinct)
Combines results and automatically removes duplicate rows. This deduplication process adds overhead, making it slower.
UNION ALL (Raw)
Combines all results, including all duplicate rows. This is generally much faster, so use it unless you explicitly need deduplication.
SELECT first_name, last_name, 'Customer' AS source FROM customers
UNION ALL
SELECT first_name, last_name, 'Employee' AS source FROM employees
ORDER BY last_name;
Build complex logic step-by-step.