Unix Time & Timestamp Conversions | Spark Practical Scenarios


Bridging the gap between Unix epoch integers and human-readable temporal types.

Unix time counts the seconds elapsed since 00:00:00 UTC on January 1, 1970. In Spark, from_unixtime converts that integer into a formatted string. By contrast, timestamp_seconds (Spark 3.1+) or a cast to "timestamp" produces a true TimestampType value that supports date arithmetic.

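from pyspark.sql.functions import from_unixtime, timestamp_seconds, col

# Convert Unix seconds (e.g. 1672531200) to a readable string; the result is a StringType column
df_readable = df.withColumn("event_time_str", from_unixtime(col("unix_ts"), "yyyy-MM-dd HH:mm:ss"))

# Convert directly to TimestampType for date arithmetic
df_typed = df.withColumn("event_timestamp", col("unix_ts").cast("timestamp"))

# Equivalent, more explicit alternative (Spark 3.1+)
df_typed = df.withColumn("event_timestamp", timestamp_seconds(col("unix_ts")))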
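
A quick way to sanity-check these conversions is against a known value. The sketch below is a plain-Python cross-check (standard library only, not Spark). It assumes the Spark session time zone is UTC, because from_unixtime renders its string in the session time zone. The helper names epoch_to_string and epoch_to_datetime are hypothetical, and the strftime pattern "%Y-%m-%d %H:%M:%S" stands in for Spark's "yyyy-MM-dd HH:mm:ss".

```python
from datetime import datetime, timezone

def epoch_to_string(seconds: int, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Format epoch seconds as a UTC string (like from_unixtime with a UTC session time zone)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)

def epoch_to_datetime(seconds: int) -> datetime:
    """Return a time-zone-aware datetime, the rough analogue of a Spark Timestamp value."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

if __name__ == "__main__":
    print(epoch_to_string(1672531200))  # 2023-01-01 00:00:00
    start = epoch_to_datetime(1672531200)
    end = epoch_to_datetime(1672531290)
    # Typed values support arithmetic; formatted strings do not
    print((end - start).total_seconds())  # 90.0
```

The string form is only for display. Keep the typed value whenever you need to subtract, compare, or truncate times.
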

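To compute durations, or to export data to systems that only accept integers, use unix_timestamp. It converts a string or timestamp back into epoch seconds (a long). When given a string, it parses the value with the supplied pattern and interprets it in the session time zone.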

from pyspark.sql.functions import unix_timestamp, col

# Convert a date string such as "01/31/2023" to Unix seconds (midnight in the session time zone)
df_unix = df.withColumn("seconds_since_epoch", unix_timestamp(col("date_str"), "MM/dd/yyyy"))
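
unix_timestamp works in two steps. It parses the string as a local wall-clock time, then converts that time to epoch seconds using the session time zone. The plain-Python sketch below mirrors that logic under a few assumptions. The helper string_to_epoch is hypothetical. "%m/%d/%Y" stands in for Spark's "MM/dd/yyyy". The session zone is modeled as a fixed UTC offset, so no time zone database is needed.

```python
from datetime import datetime, timedelta, timezone

def string_to_epoch(date_str: str, fmt: str = "%m/%d/%Y",
                    tz: timezone = timezone.utc) -> int:
    """Parse date_str as wall-clock time in tz, then return whole epoch seconds."""
    local = datetime.strptime(date_str, fmt).replace(tzinfo=tz)
    return int(local.timestamp())

if __name__ == "__main__":
    print(string_to_epoch("01/01/2023"))  # 1672531200 (zone = UTC)
    ist = timezone(timedelta(hours=5, minutes=30))  # fixed +05:30 offset
    print(string_to_epoch("01/01/2023", tz=ist))  # 1672511400 (5.5 hours earlier)
```

The same string yields different epoch values under different zones. This is why the session time zone must match between the job that writes epoch values and the job that reads them back.
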

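Many modern systems emit Unix time in milliseconds. Because from_unixtime and the numeric-to-timestamp cast interpret their input as seconds, divide by 1000 before converting. Alternatively, call the timestamp_millis SQL function (Spark 3.1+), for example via expr("timestamp_millis(unix_ms)"), which accepts milliseconds directly.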

from pyspark.sql.functions import col

# Handle 13-digit millisecond timestamps: "/" yields fractional (double) seconds, which the cast keeps
df_ms = df.withColumn("ts", (col("unix_ms") / 1000).cast("timestamp"))
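
Dividing by 1000 produces a double, so the cast keeps the sub-second part, subject to floating-point rounding at the microsecond level. Integer division, by contrast, silently drops the milliseconds. The plain-Python sketch below illustrates both outcomes for an illustrative millisecond value. The helper names are hypothetical. The sketch builds the datetime from a timedelta to avoid float rounding; that is a choice of this sketch, not what Spark does internally.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def millis_to_datetime(unix_ms: int) -> datetime:
    """Exact conversion: add the millisecond offset to the epoch (no float rounding)."""
    return EPOCH + timedelta(milliseconds=unix_ms)

def millis_to_whole_seconds(unix_ms: int) -> int:
    """Integer division keeps whole seconds only; the millisecond part is discarded."""
    return unix_ms // 1000

if __name__ == "__main__":
    ms = 1672531200123  # 13 digits: milliseconds
    print(millis_to_datetime(ms).isoformat())  # 2023-01-01T00:00:00.123000+00:00
    print(millis_to_whole_seconds(ms))         # 1672531200
```
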
Q: Why does unix_timestamp() return NULL for some of my rows? This usually happens when the input string does not match the pattern you supplied (e.g., the value "2023-01-01" parsed with the pattern "MM/dd/yyyy"). It also happens when the date itself is invalid, such as "2023-02-30". With ANSI mode enabled (spark.sql.ansi.enabled=true), Spark raises an error for such rows instead of returning NULL.
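
To see which inputs fail, it can help to reproduce the parse outside Spark. The sketch below is a plain-Python analogue of Spark's default (non-ANSI) behavior: a mismatched or impossible date yields None instead of raising an error. The helper parse_or_none is hypothetical, and it assumes a UTC session time zone.

```python
from datetime import datetime, timezone
from typing import Optional

def parse_or_none(date_str: str, fmt: str = "%m/%d/%Y") -> Optional[int]:
    """Return epoch seconds (UTC), or None if the string doesn't match fmt or isn't a real date."""
    try:
        parsed = datetime.strptime(date_str, fmt).replace(tzinfo=timezone.utc)
        return int(parsed.timestamp())
    except ValueError:  # raised for pattern mismatches and out-of-range days alike
        return None

if __name__ == "__main__":
    for s in ["01/01/2023", "2023-01-01", "02/30/2023"]:
        print(s, parse_or_none(s))
    # 01/01/2023 1672531200
    # 2023-01-01 None   (wrong pattern)
    # 02/30/2023 None   (February has no 30th)
```
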
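Q: How do time zones affect Unix conversions? A Unix timestamp identifies an instant independent of any time zone. However, from_unixtime renders its string in the session time zone (spark.sql.session.timeZone, which defaults to the JVM's time zone), and unix_timestamp interprets zone-less strings in that same zone. from_unixtime has no time-zone argument, so set the session time zone explicitly (for example, to "UTC") if you need consistent output across clusters.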
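
The effect is easy to see by rendering the same instant in two zones. In Spark, the equivalent setting is spark.conf.set("spark.sql.session.timeZone", "UTC"). The plain-Python sketch below uses fixed UTC offsets as stand-ins for session time zones (an assumption: real zones such as America/Los_Angeles also apply daylight-saving rules). The helper render is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def render(seconds: int, tz: timezone) -> str:
    """Format epoch seconds as wall-clock time in tz (the role the session time zone plays)."""
    return datetime.fromtimestamp(seconds, tz=tz).strftime("%Y-%m-%d %H:%M:%S")

if __name__ == "__main__":
    utc = timezone.utc
    utc_minus_8 = timezone(timedelta(hours=-8))  # fixed offset, e.g. US Pacific standard time
    print(render(1672531200, utc))          # 2023-01-01 00:00:00
    print(render(1672531200, utc_minus_8))  # 2022-12-31 16:00:00
```

Both strings describe the same instant. Only the wall-clock label changes, which can even move a row into a different calendar day or year.
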
Q: How do you get the current Unix time? In PySpark, use unix_timestamp() with no arguments, or current_timestamp().cast("long"). In SQL, write CAST(current_timestamp() AS LONG). Both return whole seconds.
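
Casting a timestamp to long keeps only whole seconds. Also note that Spark evaluates the current time once per query, so every row sees the same value. The plain-Python sketch below shows the truncation side of this for comparison; the helper current_epoch_seconds is hypothetical.

```python
import time
from datetime import datetime, timezone

def current_epoch_seconds() -> int:
    """Whole seconds since the epoch; int() drops the fractional part, like casting to long."""
    return int(time.time())

if __name__ == "__main__":
    a = current_epoch_seconds()
    b = int(datetime.now(timezone.utc).timestamp())
    print(a, b)  # the two values agree to within a second
```
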