PySpark Window Functions TL;DR
Back to Apache Spark TL;DR Hub

Window Functions

/tldr: The most powerful weapon in PySpark SQL

Ranking Running Totals LAG/LEAD Frames

2025 LAW

90% of “complex” PySpark logic
should be solved with Window Functions

The One Syntax You Need

from pyspark.sql.window import Window
from pyspark.sql.functions import *

# 1. Define window
w = Window.partitionBy("user_id").orderBy("event_time")

# 2. Use it
df.withColumn("rank", rank().over(w))
  .withColumn("prev_value", lag("value", 1).over(w))
  .withColumn("running_total", sum("amount").over(w.rowsBetween(Window.unboundedPreceding, Window.currentRow)))
            

Ranking Functions — Know the Difference

row_number()

Unique 1,2,3... even with ties

row_number().over(w)

rank()

Gaps: 1,1,3

rank().over(w)

dense_rank()

No gaps: 1,1,2

dense_rank().over(w)

Top 10 Patterns You’ll Use Daily

1. Running Total

sum("amount").over(w.rowsBetween(Window.unboundedPreceding, Window.currentRow))

2. Moving Average (7-day)

avg("value").over(w.rowsBetween(-6, 0))

3. Sessionization (Gap Detection)

lag("event_time", 1).over(w)
# Then: when(timestamp_diff > 30min, new session)

4. Dedupe Keep Latest

row_number().over(Window.partitionBy("id").orderBy(desc("ts"))).alias("rn")
# Then filter rn == 1

5. Top N per Group

row_number().over(w.orderBy(desc("sales"))).alias("rank")
# filter rank <= 5

Window Frames — Never Guess Again

FrameCodeUse Case
All previous rowsrowsBetween(Window.unboundedPreceding, Window.currentRow)Running total
Last N rowsrowsBetween(-N+1, 0)Moving average
From start to endrowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)Group totals
Time-based (30 min)rangeBetween(-1800, 0)Event-time windows

Performance Tips (2025)

DO
  • Always orderBy() in window
  • Use rangeBetween() for time windows
  • Cache if reusing window
DON’T
  • Forget partitionBy() → global window
  • Use rowsBetween with no orderBy()
  • Apply window on unfiltered data

MASTERED.

You now wield the most powerful tool in PySpark.

Window Functions = Your superpower.

PySpark 3.5+ • Works everywhere • The only window functions guide you’ll ever need • 2025