Window Functions
/tldr: Performing calculations on a set of related rows without collapsing them.
THE CORE IDEA: Aggregate without Grouping
Window functions allow you to run aggregate calculations (like SUM or AVG) or ranking operations on a related subset of rows—the "window"—but they do not collapse the result set like GROUP BY does. Every original row remains in the output, and the calculation result is added as a new column for each row.
GROUP BY: Many Rows → One Summary Row
WINDOW: Many Rows → Many Rows + Calculation
The Engine: The OVER Clause
Every window function must be followed by OVER (...), which defines the boundaries of the window the function will operate on.
FUNCTION(...) OVER (
[PARTITION BY column_list]
[ORDER BY column_list]
[ROWS/RANGE frame_definition]
)
PARTITION BY
Divides the data into non-overlapping groups (the 'windows'). The function restarts its calculation for each new partition. (Similar to `GROUP BY`).
ORDER BY
Defines the sequence within the current partition. Crucial for ranking, running totals, and LAG/LEAD functions.
ROWS/RANGE
Defines the "frame"—the exact set of rows within the partition that the function should act upon (e.g., the last 3 rows, or all rows up to the current one).
Type 1: Ranking Functions
Used to assign a rank or sequential number to rows within each partition.
-
ROW_NUMBER()
Assigns a unique, sequential integer to each row starting at 1. Doesn't account for ties.
-
RANK()
Assigns the same rank to ties, but skips the next rank number (e.g., 1, 2, 2, 4).
-
DENSE_RANK()
Assigns the same rank to ties, but does NOT skip the next rank number (e.g., 1, 2, 2, 3).
-- Goal: Find the highest paid employee in each department.
SELECT
employee_name,
salary,
department,
DENSE_RANK() OVER (
PARTITION BY department -- Group employees by department
ORDER BY salary DESC -- Rank them by salary, descending
) AS rank_in_dept
FROM
employees;
-- Filter the result: WHERE rank_in_dept = 1
Type 2: Analytic Functions (LAG/LEAD)
These help compare values across time or sequence, common in time-series analysis.
LAG(column, offset, default)
Retrieves the value of a column from a previous row in the partition (offset defaults to 1).
LEAD(column, offset, default)
Retrieves the value of a column from a subsequent row in the partition (offset defaults to 1).
-- Goal: Calculate the month-over-month sales difference
SELECT
sale_date,
revenue,
LAG(revenue, 1, 0) OVER (ORDER BY sale_date) AS previous_month_revenue,
revenue - LAG(revenue, 1, 0) OVER (ORDER BY sale_date) AS revenue_change
FROM
monthly_sales;
Type 3: Aggregate Functions with Windows
Running Totals (The Frame)
By adding the `ROWS` clause, you can define a *moving frame* for aggregates. The common use case is calculating a running total or a moving average.
SELECT
transaction_date,
amount,
-- Calculates the sum of all 'amount' rows up to the current row, ordered by date
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY transaction_date
ROWS UNBOUNDED PRECEDING -- Start from the first row in the partition
-- AND CURRENT ROW -- End at the current row (This is the default, often omitted)
) AS running_total
FROM
transactions;
Start slicing and comparing your data.