Unix Time & Timestamp Conversions
Bridging the gap between Unix epoch integers and human-readable temporal types.
1. From Unix Integer to Timestamp
Unix time represents the number of seconds since January 1, 1970 (UTC). In Spark, from_unixtime converts an integer to a string, while timestamp_seconds (or casting) creates a true Timestamp type.
from pyspark.sql.functions import from_unixtime, col
# Converting Unix seconds (1672531200) to a readable string format
df_readable = df.withColumn("event_time_str", from_unixtime(col("unix_ts"), "yyyy-MM-dd HH:mm:ss"))
# Direct conversion to Timestamp Type for arithmetic
df_typed = df.withColumn("event_timestamp", col("unix_ts").cast("timestamp"))
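As a sanity check outside Spark, the example value 1672531200 can be converted with Python's standard library. This is a minimal sketch, not Spark code: it assumes UTC throughout, and the variable names simply mirror the column names used above.

```python
from datetime import datetime, timezone

unix_ts = 1672531200  # the example value from the comment above

# A timezone-aware datetime, roughly analogous to a true timestamp type
event_timestamp = datetime.fromtimestamp(unix_ts, tz=timezone.utc)

# A formatted string, roughly analogous to from_unixtime with "yyyy-MM-dd HH:mm:ss"
event_time_str = event_timestamp.strftime("%Y-%m-%d %H:%M:%S")

print(event_time_str)  # 2023-01-01 00:00:00
```

If the Spark output for the same input differs from this string, the session time zone is the first thing to check.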
2. From Timestamp to Unix Integer
For calculations involving duration or for exporting data to systems that only support integers, unix_timestamp converts strings or timestamps back into the epoch count.
from pyspark.sql.functions import unix_timestamp, col
# Converting a string date to Unix seconds
df_unix = df.withColumn("seconds_since_epoch", unix_timestamp(col("date_str"), "MM/dd/yyyy"))
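The reverse direction can be cross-checked the same way in plain Python. This sketch assumes the string represents a UTC date and uses a hypothetical sample value; Spark itself interprets the string in the session time zone, so results can differ by the zone offset.

```python
from datetime import datetime, timezone

date_str = "01/01/2023"  # hypothetical sample value

# "%m/%d/%Y" is the strptime counterpart of the Spark pattern "MM/dd/yyyy"
parsed = datetime.strptime(date_str, "%m/%d/%Y").replace(tzinfo=timezone.utc)

# Whole seconds since the epoch, like the integer unix_timestamp returns
seconds_since_epoch = int(parsed.timestamp())

print(seconds_since_epoch)  # 1672531200
```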
3. Handling Milliseconds and Microseconds
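Modern systems often provide Unix time in milliseconds (currently 13 digits) or microseconds (currently 16 digits). Because casting a number to a timestamp treats it as seconds, divide by 1000 for milliseconds, or by 1,000,000 for microseconds, before converting. The fractional part of the result is kept as sub-second precision.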
# Handling 13-digit millisecond timestamps
df_ms = df.withColumn("ts", (col("unix_ms") / 1000).cast("timestamp"))
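To see what the division does to the sub-second part, here is a plain-Python sketch of the same arithmetic. The helper names from_unix_millis and from_unix_micros are hypothetical, chosen for illustration. The sketch uses integer arithmetic on a UTC epoch so that no precision is lost to floating point.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_unix_millis(ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def from_unix_micros(us: int) -> datetime:
    """Convert epoch microseconds to an aware UTC datetime."""
    return EPOCH + timedelta(microseconds=us)


print(from_unix_millis(1672531200123).isoformat())
# 2023-01-01T00:00:00.123000+00:00
```

A quick digit count is a useful guard before choosing the divisor: passing a millisecond value where seconds are expected yields a date tens of thousands of years in the future.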
Interview Q&A
Q: Why does unix_timestamp() return NULL for some of my rows?
This usually happens when the input string does not match the format string provided (e.g., providing "2023-01-01" but using format "MM/dd/yyyy"). It can also happen if the date is invalid, like "2023-02-30".
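The two failure modes above can be reproduced in plain Python. This is only an analogy: strptime raises an error where Spark returns NULL, so the hypothetical helper to_unix_or_none below converts that error into None. It also assumes UTC input.

```python
from datetime import datetime, timezone
from typing import Optional


def to_unix_or_none(value: str, fmt: str) -> Optional[int]:
    """Return epoch seconds, or None when the string cannot be parsed."""
    try:
        parsed = datetime.strptime(value, fmt)
    except ValueError:
        # Covers both a format mismatch and an impossible calendar date
        return None
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


print(to_unix_or_none("2023-01-01", "%m/%d/%Y"))  # None (format mismatch)
print(to_unix_or_none("2023-02-30", "%Y-%m-%d"))  # None (invalid date)
print(to_unix_or_none("01/01/2023", "%m/%d/%Y"))  # 1672531200
```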
Q: What is the impact of Timezones on Unix conversions?
Unix time is inherently UTC. However, from_unixtime uses the session's local timezone (spark.sql.session.timeZone) to determine the string output unless otherwise specified. Always verify your cluster's timezone setting.
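The effect is easy to demonstrate in plain Python: the same epoch value renders as different strings depending on the zone used for formatting. In this sketch a fixed UTC-5 offset stands in for a non-UTC session time zone. That is an assumption made for illustration; a real session setting would normally be a region name with daylight-saving rules.

```python
from datetime import datetime, timedelta, timezone

unix_ts = 1672531200
fmt = "%Y-%m-%d %H:%M:%S"

# Rendered in UTC
utc_str = datetime.fromtimestamp(unix_ts, tz=timezone.utc).strftime(fmt)

# Rendered at a fixed UTC-5 offset (illustrative stand-in for a session zone)
utc_minus_5 = timezone(timedelta(hours=-5))
offset_str = datetime.fromtimestamp(unix_ts, tz=utc_minus_5).strftime(fmt)

print(utc_str)     # 2023-01-01 00:00:00
print(offset_str)  # 2022-12-31 19:00:00
```

The underlying instant is identical in both cases; only the string changes, which here is enough to move the event into a different calendar year.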
Q: How do you get the current Unix time?
You can use unix_timestamp() with no arguments, or cast(current_timestamp() as long).
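For comparison, the driver-side equivalent in plain Python looks like this. It is a sketch for cross-checking values, not a replacement for the Spark functions, which are evaluated as part of the query.

```python
import time
from datetime import datetime, timezone

# Two equivalent ways to read the current epoch seconds
now_from_time = int(time.time())
now_from_datetime = int(datetime.now(timezone.utc).timestamp())

print(now_from_time, now_from_datetime)
```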