PySpark DataFrame - Add or Subtract Milliseconds from Timestamp Column


Kontext · 9/1/2022

Code description

This code snippet shows you how to add or subtract milliseconds (or microseconds) and seconds from a timestamp column in a Spark DataFrame.

It first creates a DataFrame in memory, then adds and subtracts milliseconds/seconds from the timestamp column `ts` using Spark SQL interval expressions.

Output:

    +---+--------------------------+--------------------------+--------------------------+--------------------------+
    |id |ts                        |ts1                       |ts2                       |ts3                       |
    +---+--------------------------+--------------------------+--------------------------+--------------------------+
    |1  |2022-09-01 12:05:37.227916|2022-09-01 12:05:37.226916|2022-09-01 12:05:37.228916|2022-09-01 12:05:38.227916|
    |2  |2022-09-01 12:05:37.227916|2022-09-01 12:05:37.226916|2022-09-01 12:05:37.228916|2022-09-01 12:05:38.227916|
    |3  |2022-09-01 12:05:37.227916|2022-09-01 12:05:37.226916|2022-09-01 12:05:37.228916|2022-09-01 12:05:38.227916|
    |4  |2022-09-01 12:05:37.227916|2022-09-01 12:05:37.226916|2022-09-01 12:05:37.228916|2022-09-01 12:05:38.227916|
    +---+--------------------------+--------------------------+--------------------------+--------------------------+  
    

*Note - the code assumes a SparkSession object already exists under the variable name `spark`.

Code snippet

    from pyspark.sql.functions import lit, expr
    import datetime
    
    now = datetime.datetime.now()
    # Create a DataFrame with a single column `id` holding values 1..4
    df = spark.range(1, 5)
    # Add a constant timestamp column
    df = df.withColumn('ts', lit(now))
    # Subtract and add 1 millisecond (0.001 seconds), then add 1 second
    df = df.withColumn('ts1', expr("ts - interval '0.001' seconds"))
    df = df.withColumn('ts2', expr("ts + interval '0.001' seconds"))
    df = df.withColumn('ts3', expr("ts + interval '1' seconds"))
    df.show(truncate=False)
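For reference, the same millisecond arithmetic can be sanity-checked in plain Python with `datetime.timedelta`, independent of Spark. This is a standalone sketch using the timestamp from the output above, not part of the original snippet:

```python
import datetime

# The timestamp shown in the sample output above
now = datetime.datetime(2022, 9, 1, 12, 5, 37, 227916)

ts1 = now - datetime.timedelta(milliseconds=1)  # subtract 1 ms
ts2 = now + datetime.timedelta(milliseconds=1)  # add 1 ms
ts3 = now + datetime.timedelta(seconds=1)       # add 1 second

print(ts1)  # 2022-09-01 12:05:37.226916
print(ts2)  # 2022-09-01 12:05:37.228916
print(ts3)  # 2022-09-01 12:05:38.227916
```

This mirrors what the Spark interval expressions compute row by row, which is handy for verifying expected values before running a cluster job.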