SQL TL;DR: Subqueries & CTEs (DQL)
Back to SQL TL;DR Hub

Subqueries & CTEs (WITH)

/tldr: Breaking complex queries into logical, sequential steps.

Complex Logic Advanced DQL Readability & Recursion

THE CORE IDEA: Querying a Query

When you need to use the result of one SELECT statement as input for another, you use a subquery or a CTE. This pattern is fundamental to solving multi-step analytical problems, such as "Find the employees whose salary is above the average salary of their department."

Step 1 (Inner Query) Step 2 (Outer Query)

1. Subqueries (Nested Queries)

Subqueries are standard queries placed within parentheses and embedded in a larger query (the outer query).

Types of Subqueries

  • Scalar Subquery

    Returns a single value (one row, one column). Can be used anywhere a single value is expected (SELECT, WHERE, HAVING).

  • Multi-Row Subquery

    Returns one column but multiple rows. Used with operators like IN, ANY, or ALL in the WHERE clause.

  • Inline View (FROM Clause)

    Returns multiple columns and rows. Used in the FROM clause, acting like a temporary table for the outer query.

  • Example (Scalar Subquery)

    -- Find employees whose salary is above the company average
    SELECT
        employee_name,
        salary
    FROM
        employees
    WHERE
        salary > (
            -- Scalar Subquery calculates single value (avg_salary)
            SELECT AVG(salary) FROM employees
        );
                        

    2. Common Table Expressions (CTEs / WITH)

    CTEs are named, temporary result sets defined within the execution scope of a single SELECT, INSERT, UPDATE, or DELETE statement. They are the modern, cleaner way to manage complex, multi-step queries.

    Syntax and Benefits

    CTEs improve readability because you define the logic steps *before* the main query, eliminating deep nesting.

    -- Define CTE(s) first
    WITH
        DepartmentAverages AS (
            SELECT department, AVG(salary) AS avg_salary
            FROM employees
            GROUP BY department
        )
    
    -- Execute the main query using the CTE as if it were a table
    SELECT
        e.employee_name,
        e.salary,
        a.avg_salary
    FROM
        employees e
    JOIN
        DepartmentAverages a ON e.department = a.department
    WHERE
        e.salary > a.avg_salary;
                    

    Advanced Topics

    Correlated Subqueries (Caution)

    An inner query that depends on the outer query for its values. It runs once for **every single row** of the outer query. **Warning:** They are powerful but often much slower than CTEs or JOINs, especially on large datasets.

    WHERE salary > (
        SELECT AVG(salary)
        FROM employees e2
        -- Correlated to the outer query (e)
        WHERE e2.department = e.department
    )
                            

    Recursive CTEs

    A specialized CTE that can reference itself. Used primarily for querying hierarchical data structures (e.g., organizational charts, bill of materials, network paths).

    (Requires WITH RECURSIVE syntax and a termination condition to prevent infinite loops.)

    3. Combining Results: UNION vs. UNION ALL

    These set operators combine the result sets of two or more independent SELECT statements into a single result.

    UNION (Distinct)

    Combines results and automatically removes duplicate rows. This deduplication process adds overhead, making it slower.

    UNION ALL (Raw)

    Combines all results, including all duplicate rows. This is generally much faster, so use it unless you explicitly need deduplication.

    Mandatory Rule: All queries being combined must have the same number of columns and the columns must have compatible data types, in the same order.
    SELECT first_name, last_name, 'Customer' AS source FROM customers
    UNION ALL
    SELECT first_name, last_name, 'Employee' AS source FROM employees
    ORDER BY last_name;
                

    Build complex logic step-by-step.

    When in doubt, use a CTE—your future self will thank you for the readability.