Cluster Sizing & Tuning
TL;DR: get a 2–5× cost reduction with 10 minutes of math
Executors • Cores • Memory • Autoscaling
The Golden Sizing Rule (2025)
5 cores + 20–30 GB RAM per executor
→ works 95% of the time
Why? With 128–256 MB HDFS blocks, ~5 concurrent tasks per executor is the sweet spot for read throughput, and the matching 20–30 GB heap stays small enough for healthy GC with no OOM
Executor Sizing Cheat Sheet
| Node Type | vCPU | RAM | Executors/Node | Cores/Executor | Memory/Executor |
|---|---|---|---|---|---|
| r6i.4xlarge / r7g.4xlarge | 16 | 128 GB | 3 | 5 | 38 GB |
| r6i.8xlarge | 32 | 256 GB | 6 | 5 | 40 GB |
| i4i.16xlarge (storage-optimized) | 64 | 512 GB | 12 | 5 | 40 GB |
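To see where those rows come from, here is a minimal Python sketch of the sizing math. The function name, the 1-core and 8 GB per-node OS reserve, and the 0.75 heap factor are illustrative assumptions, not Spark settings, so the results land within a couple of GB of the table rather than matching them exactly.

```python
# Illustrative sizing sketch -- not a Spark API. Assumes 1 core and ~8 GB
# per node reserved for the OS and YARN/K8s daemons.
def size_executors(node_vcpu, node_ram_gb, cores_per_executor=5):
    usable_cores = node_vcpu - 1                     # leave one core for OS/daemons
    executors = usable_cores // cores_per_executor   # 5-core executors (the golden rule)
    container_gb = (node_ram_gb - 8) // executors    # RAM slice per executor container
    heap_gb = int(container_gb * 0.75)               # heap = ~75% of the container (see below)
    return executors, container_gb, heap_gb

# r6i.4xlarge (16 vCPU, 128 GB) -> (3, 40, 30): 3 executors, ~40 GB container, ~30 GB heap
print(size_executors(16, 128))
# i4i.16xlarge (64 vCPU, 512 GB) -> (12, 42, 31)
print(size_executors(64, 512))
```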
Memory Breakdown per Executor
Execution Memory – shuffles, joins, aggregations (shares the unified region with storage; spills to disk when it runs out)
Storage Memory – caching/persist (shares the unified region with execution)
User Memory – UDFs and your own objects (the remainder, roughly 30–40% of the heap)
The unified region is spark.memory.fraction of the heap: 0.6 by default, ~0.7 if tuned up.
Total Heap = 75% of container RAM
--executor-memory = round_down(0.75 × container_RAM)
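A rough Python sketch of that split for a single executor container; the 0.75 heap factor comes from the rule above, 0.6 is Spark's default spark.memory.fraction, and the helper itself is hypothetical (it ignores the small reserved region Spark sets aside).

```python
# Rough per-executor memory split following the rule of thumb above.
def memory_breakdown(container_ram_gb, memory_fraction=0.6):
    heap = int(container_ram_gb * 0.75)      # --executor-memory
    overhead = container_ram_gb - heap       # spark.executor.memoryOverhead (off-heap, native)
    unified = heap * memory_fraction         # execution + storage share this region
    user = heap - unified                    # UDFs, your own data structures
    return {"heap_gb": heap, "overhead_gb": overhead,
            "unified_gb": round(unified, 1), "user_gb": round(user, 1)}

# 40 GB container -> 30 GB heap, 10 GB overhead, ~18 GB unified, ~12 GB user
print(memory_breakdown(40))
```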
Battle-Tested Configs (2024–2025)
Databricks / EMR (Recommended)
```
spark.executor.cores              5
spark.executor.memory             19g   # 19g heap + 4g overhead ≈ 23 GB per executor container
spark.executor.memoryOverhead     4g
spark.driver.cores                5
spark.driver.memory               10g
spark.dynamicAllocation.enabled   true
spark.shuffle.service.enabled     true
```
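If you prefer setting these from application code rather than cluster config, a minimal PySpark equivalent looks like the sketch below. Keep in mind that executor sizing must be in place before the application starts, so on Databricks/EMR this belongs in cluster or job configuration rather than in an already-running session.

```python
from pyspark.sql import SparkSession

# Same values as the config block above, applied programmatically.
# These take effect only when the session is created (e.g. via spark-submit),
# not when called on an existing session.
spark = (
    SparkSession.builder
    .appName("sized-job")
    .config("spark.executor.cores", "5")
    .config("spark.executor.memory", "19g")
    .config("spark.executor.memoryOverhead", "4g")
    .config("spark.driver.cores", "5")
    .config("spark.driver.memory", "10g")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")
    .getOrCreate()
)
```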
Kubernetes / Self-managed
```
--executor-cores 5 \
--executor-memory 30g \
--conf spark.executor.memoryOverhead=6g \
--conf spark.kubernetes.memoryOverheadFactor=0.2 \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.shuffle.service.enabled=true
```
Note: Kubernetes has no external shuffle service, so on K8s replace the last line with spark.dynamicAllocation.shuffleTracking.enabled=true (see below).
Autoscaling That Actually Works
Min executors: 2–5 (keep warm)
Max executors: data_size_in_TB × 15–25 (see the sketch after this list)
Scale-up threshold: 75% executor busy
Scale-down after 5–10 min idle
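A tiny, hypothetical helper that turns those rules of thumb into dynamic-allocation settings; the 15–25 executors-per-TB factor is the heuristic above, not anything Spark enforces, and only the spark.* keys are real configuration names.

```python
# Illustrative helper for picking dynamic-allocation bounds from input size.
def autoscaling_bounds(data_size_tb, executors_per_tb=20, min_executors=3):
    max_executors = max(min_executors, int(data_size_tb * executors_per_tb))
    return {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        "spark.dynamicAllocation.executorIdleTimeout": "600s",  # scale down after ~10 min idle
    }

# 4 TB of input -> min 3, max 80 executors
print(autoscaling_bounds(4))
```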
Enable shuffle tracking so dynamic allocation works without the external shuffle service:
```
spark.dynamicAllocation.shuffleTracking.enabled=true
```
Avoid These (Cost You Millions)
- 1 core per executor → insane GC pressure
- >8 cores per executor → HDFS throughput drops
- No memoryOverhead → containers OOM-killed by YARN/K8s
- Static clusters for bursty workloads
- Forgetting spark.shuffle.service.enabled=true (or shuffle tracking) with dynamic allocation
5 cores • 20–40 GB RAM • dynamic allocation + shuffle service = happy cluster