Code description
This code snippet shows how to add or subtract milliseconds (or microseconds) and seconds from a timestamp column in a Spark DataFrame.
It first creates a DataFrame in memory and then adds and subtracts milliseconds/seconds from the timestamp column ts
using Spark SQL interval expressions.
Output:
+---+--------------------------+--------------------------+--------------------------+--------------------------+
|id |ts |ts1 |ts2 |ts3 |
+---+--------------------------+--------------------------+--------------------------+--------------------------+
|1 |2022-09-01 12:05:37.227916|2022-09-01 12:05:37.226916|2022-09-01 12:05:37.228916|2022-09-01 12:05:38.227916|
|2 |2022-09-01 12:05:37.227916|2022-09-01 12:05:37.226916|2022-09-01 12:05:37.228916|2022-09-01 12:05:38.227916|
|3 |2022-09-01 12:05:37.227916|2022-09-01 12:05:37.226916|2022-09-01 12:05:37.228916|2022-09-01 12:05:38.227916|
|4 |2022-09-01 12:05:37.227916|2022-09-01 12:05:37.226916|2022-09-01 12:05:37.228916|2022-09-01 12:05:38.227916|
+---+--------------------------+--------------------------+--------------------------+--------------------------+
*Note - the code assumes a SparkSession object already exists in the variable spark.
Code snippet
from pyspark.sql.functions import lit, expr
import datetime

# Capture the current timestamp (Python datetime has microsecond precision)
now = datetime.datetime.now()

# Create a DataFrame with a single id column holding values 1 to 4
df = spark.range(1, 5)
df = df.withColumn('ts', lit(now))

# Subtract 1 millisecond using a fractional-second interval literal
df = df.withColumn('ts1', expr("ts - interval '0.001' seconds"))
# Add 1 millisecond
df = df.withColumn('ts2', expr("ts + interval '0.001' seconds"))
# Add 1 second
df = df.withColumn('ts3', expr("ts + interval '1' seconds"))
df.show(truncate=False)
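The interval arithmetic above behaves the same way as plain Python datetime.timedelta arithmetic, which can be useful for sanity-checking the expected values without a Spark cluster. A minimal sketch, using the fixed timestamp from the sample output above rather than datetime.now():

```python
import datetime

# Fixed timestamp matching the sample output table
ts = datetime.datetime(2022, 9, 1, 12, 5, 37, 227916)

# Subtract 1 millisecond (mirrors: ts - interval '0.001' seconds)
ts1 = ts - datetime.timedelta(milliseconds=1)
# Add 1 millisecond (mirrors: ts + interval '0.001' seconds)
ts2 = ts + datetime.timedelta(milliseconds=1)
# Add 1 second (mirrors: ts + interval '1' seconds)
ts3 = ts + datetime.timedelta(seconds=1)

print(ts1)  # 2022-09-01 12:05:37.226916
print(ts2)  # 2022-09-01 12:05:37.228916
print(ts3)  # 2022-09-01 12:05:38.227916
```

The printed values match the ts1, ts2, and ts3 columns in the output table.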