PySpark DataFrame Fill Null Values with fillna or na.fill Functions

Kontext, 8/18/2022

Code description

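In PySpark, DataFrame.fillna, DataFrame.na.fill and DataFrameNaFunctions.fill are aliases of one another. Use them to replace null values with a constant. The fill only applies to columns whose data type matches the value: for example, fillna(0) replaces nulls in numeric columns with 0 and leaves string and boolean columns untouched. Passing a dictionary sets a different replacement value per column.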

Output of the three show() calls in the code snippet:

    +--------------+-------+--------+
    |       str_col|int_col|bool_col|
    +--------------+-------+--------+
    |Hello Kontext!|    100|    true|
    |Hello Context!|      0|    null|
    +--------------+-------+--------+
    
    +--------------+-------+--------+
    |       str_col|int_col|bool_col|
    +--------------+-------+--------+
    |Hello Kontext!|    100|    true|
    |Hello Context!|   null|   false|
    +--------------+-------+--------+
    
    +--------------+-------+--------+
    |       str_col|int_col|bool_col|
    +--------------+-------+--------+
    |Hello Kontext!|    100|    true|
    |Hello Context!|      0|   false|
    +--------------+-------+--------+  
    

Code snippet

    from pyspark.sql import SparkSession
    
    app_name = "PySpark fillna"
    master = "local"
    
    spark = (
        SparkSession.builder
        .appName(app_name)
        .master(master)
        .getOrCreate()
    )
    
    spark.sparkContext.setLogLevel("WARN")
    
    # Create a DataFrame
    df = spark.createDataFrame(
        [['Hello Kontext!', 100, True], ['Hello Context!', None, None]],
        ['str_col', 'int_col', 'bool_col'])
    
    # Fill nulls in numeric columns only; bool_col stays null
    df.fillna(0).show()
    
    # Fill nulls in boolean columns only; int_col stays null
    df.fillna(False).show()
    
    # Fill both columns at once with a per-column dictionary
    df.fillna({'int_col': 0, 'bool_col': False}).show()
    