Conditional Logic & Dynamic Expressions
Implementing branching logic using when-otherwise, SQL CASE statements, and raw expressions.
1. When and Otherwise
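The when() function is the DataFrame API's counterpart to an "if" statement. You can chain further when() calls to simulate "else if," and finish with otherwise() for the "else" catch-all. Branches are evaluated in order, and each row takes the value of the first condition that is true.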
from pyspark.sql.functions import col, when, lit

# Categorizing customers based on spending
df_segmented = df.withColumn(
    "segment",
    when(col("total_spend") >= 1000, lit("Platinum"))
    .when(col("total_spend") >= 500, lit("Gold"))
    .otherwise(lit("Silver")),
)
2. SQL Style: CASE WHEN
If you are more comfortable with SQL syntax or are working within spark.sql(), you can use the standard CASE WHEN block.
# SQL approach
df.createOrReplaceTempView("sales")
result = spark.sql("""
    SELECT *,
           CASE
               WHEN region = 'US' THEN 'North America'
               WHEN region = 'UK' THEN 'Europe'
               ELSE 'Other'
           END AS geography
    FROM sales
""")
3. Power User: The expr() function
The expr() function parses a SQL expression string and returns a Column, so you can write SQL snippets directly inside DataFrame transformations. This is useful for complex logic that is cumbersome to write using the Column API.
from pyspark.sql.functions import expr

# Combining strings and logic in a single expression
df_dynamic = df.withColumn(
    "discount_info",
    expr("CASE WHEN price > 100 THEN concat('Save ', price * 0.1) ELSE 'No Discount' END"),
)
Interview Q&A
Q: What happens if you don't provide an otherwise()?
If none of the conditions in the when() chain is met and there is no otherwise(), Spark sets the new column to null for that row.
Q: Why use expr() instead of regular functions?
expr() accepts any Spark SQL expression, so complex logic can often be written more compactly than with chained Column calls, and it gives you access to SQL functions that might not be explicitly exposed in the pyspark.sql.functions module. It also makes porting logic from SQL-heavy environments much easier.
Q: Can you use conditional logic for filtering?
Yes. when() is for creating column values, but the same boolean Column expressions work inside filter() or where() (an alias of filter()) to subset rows based on dynamic conditions.