Apache Spark TL;DR Cheatsheet
Learn Spark 10× faster with clean, zero-fluff content
Spark Fundamentals & Architecture
What is Spark?
Architecture • Cluster • Components • Executors • Jobs
Cluster Sizing
Memory tuning • Executors • Cores • Autoscaling
DataFrame vs Dataset
Catalyst • Tungsten • UDFs • Performance
Spark SQL + AQE
Plan optimizations • Adaptive execution
PySpark Essentials
PySpark Cheatsheet
Syntax reference • Example-driven
Joins in PySpark
Broadcast joins • Skew fixes
UDFs vs Pandas UDFs
Vectorized • Arrow • Performance wins
Window Functions
Ranking • Lead/lag • Moving averages
Performance & Optimization
Shuffle Deep Dive
Exchange • Shuffle partitions • Spill
Caching Strategy
Memory vs Disk • Cache pitfalls
MLlib Basics
Feature engineering • Pipelines • Evaluation
Streaming & Lakehouse
Structured Streaming
Watermarking • Event time
Delta Lake
ACID tables • Time travel • Z-order
Kafka + Spark
Exactly-once • Checkpointing