Use when() and otherwise() with PySpark DataFrame


Code description

In Spark SQL, the CASE WHEN clause evaluates a list of conditions and returns one of multiple possible results for each row. The same logic can be implemented directly with the `pyspark.sql.functions.when` and `pyspark.sql.Column.otherwise` functions. If `otherwise` is not used together with `when`, None is returned for unmatched conditions. The sketches after the code snippet illustrate both points.

Output:

    +---+------+
    | id|id_new|
    +---+------+
    |  1|     1|
    |  2|   200|
    |  3|  3000|
    |  4|   400|
    |  5|     5|
    |  6|   600|
    |  7|     7|
    |  8|   800|
    |  9|  9000|
    +---+------+  
    

Code snippet

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import when
    
    appName = "PySpark when and otherwise Example"
    master = "local"
    
    # Create Spark session
    spark = SparkSession.builder \
        .appName(appName) \
        .master(master) \
        .getOrCreate()
    
    spark.sparkContext.setLogLevel("WARN")
    
    df = spark.range(1, 10)
    # Multiply even ids by 100, remaining multiples of 3 by 1000,
    # and keep all other ids unchanged via otherwise()
    df = df.withColumn('id_new',
                       when(df.id % 2 == 0, df.id * 100)
                       .when(df.id % 3 == 0, df.id * 1000)
                       .otherwise(df.id))
    df.show()
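
As noted in the description, if `otherwise` is omitted, rows that match no `when` condition come back as None (null). A minimal sketch of that behavior, assuming the same `spark` session created above:

    # Sketch: when() without otherwise() leaves unmatched rows null.
    # Assumes the `spark` session from the snippet above.
    from pyspark.sql.functions import when

    df2 = spark.range(1, 10)
    df2 = df2.withColumn('id_new', when(df2.id % 2 == 0, df2.id * 100))
    df2.show()
    # Odd ids (1, 3, 5, 7, 9) appear as null in id_new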
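
The description also notes that this mirrors the Spark SQL CASE WHEN clause. For comparison, a sketch of the same logic written as a SQL expression; the temporary view name `ids` is an illustrative choice, not part of the original snippet, and the `spark` session is again assumed from above:

    # Sketch: equivalent CASE WHEN in Spark SQL.
    # The view name `ids` is arbitrary (not from the original snippet).
    spark.range(1, 10).createOrReplaceTempView("ids")
    spark.sql("""
        SELECT id,
               CASE WHEN id % 2 = 0 THEN id * 100
                    WHEN id % 3 = 0 THEN id * 1000
                    ELSE id
               END AS id_new
        FROM ids
    """).show()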
