Apache Spark — Quiz Hub
Curated quizzes covering Spark Core, RDDs, DataFrames, Streaming, Architecture, ML & more. Pick a learning path and start testing your knowledge to master Spark.
Featured Beginner Path
A quick onboarding flow for new learners — start here to build your foundation.
Introduction to Apache Spark
History, ecosystem, core features, and advantages over Hadoop MapReduce
Spark Core & RDD Fundamentals
RDD basics, transformations, actions, partitions, persistence & resilience
DataFrames & Datasets
Schema, logical/physical plans, encoders, and the essentials of lazy evaluation
Spark SQL and Catalyst Optimizer
Intro to Spark SQL, DataFrame queries, and Catalyst optimizer fundamentals
Core & RDD
Deep dive into RDD internals, partitioning, lineage, and advanced transformations.
Spark RDD — Advanced Concepts
Custom partitioning, accumulators, broadcast variables, and failure recovery mechanisms
Spark Architecture
Execution internals: driver, executors, DAG, shuffle, and the Tungsten engine.
Spark Architecture Basics
Driver, Executors, Cluster Manager, Application, Job, Stage, Task & the DAG
Advanced Spark Architecture
Executor memory model, shuffle internals, Tungsten, and task serialization
Deployment & Cluster
Deployment modes, cluster managers, spark-submit, and best practices.
Spark Application & Deployment
spark-submit, client vs cluster mode, job stages, YARN/K8s/Standalone
Spark on Kubernetes
Native K8s mode, dynamic allocation, shuffle service, pod templates
DataFrames & Data APIs
Schema, encoders, DataFrame operations, and best practices for analytics.
DataFrames & Datasets
Schema, logical/physical plans, encoders, and how lazy evaluation affects performance
Advanced DataFrame Transformations
Window functions, complex aggregations, and UDF vs. built-in function performance
Spark SQL
SQL, Catalyst, query plans, predicate pushdown and join strategies.
Spark SQL & Catalyst Optimizer
SQL functions, logical/physical plans, projection, and predicate pushdown
SQL Query Tuning & Join Strategies
Broadcast joins, shuffle joins, partition pruning & Adaptive Query Execution (AQE)
Streaming
DStreams and Structured Streaming — real-time processing essentials.
Spark Streaming (DStreams)
Batch intervals, window operations, checkpointing, and receivers
Structured Streaming
Event-time, watermarks, triggers, stateful processing, and output modes
Graph Processing (GraphX)
GraphX fundamentals for graph algorithms and message passing.
GraphX Fundamentals
Pregel API, EdgeTriplet, aggregateMessages, and connected components
Machine Learning (MLlib)
ML pipelines, transformers, model selection, and scalable algorithms.
MLlib – ML Pipelines
Transformers, Estimators, Pipeline, CrossValidator, and Evaluators
Delta Lake & Lakehouse
ACID tables, MERGE, time travel, schema evolution and best practices.
Delta Lake & Lakehouse
ACID transactions, MERGE, time travel, schema evolution, and OPTIMIZE
Optimization & Tuning
Partitions, caching, AQE, memory tuning, skew handling and shuffle strategies.
Spark Configs & Performance
Partition tuning, broadcast, caching, Adaptive Query Execution (AQE), skew handling
Spark Shuffle & Tungsten Engine
Hash vs. sort shuffle, whole-stage code generation, and off-heap memory management