Concatenate Columns in Spark DataFrame


Kontext | 8/19/2022

Code description

This code snippet provides one example of concatenating columns using a separator in a Spark DataFrame. Function `concat_ws` is used directly. For the Spark SQL version, refer to [Spark SQL - Concatenate w/o Separator (concat_ws and concat)](https://kontext.tech/article/1079/spark-sql-concatenate-withwithout-separator).

Syntax of concat_ws

    pyspark.sql.functions.concat_ws(sep: str, *cols: ColumnOrName)

Output:

    +-----+--------+--------------+
    | col1|    col2|     col1_col2|
    +-----+--------+--------------+
    |Hello| Kontext| Hello,Kontext|
    |Hello|Big Data|Hello,Big Data|
    +-----+--------+--------------+  
    

Code snippet

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import concat_ws
    
    app_name = "PySpark concat_ws Example"
    master = "local"
    
    spark = SparkSession.builder \
        .appName(app_name) \
        .master(master) \
        .getOrCreate()
    
    spark.sparkContext.setLogLevel("WARN")
    
    # Create a DataFrame
    df = spark.createDataFrame(
        [['Hello', 'Kontext'], ['Hello', 'Big Data']], ['col1', 'col2'])
    
    # Concatenate these two columns using separator ','
    df = df.withColumn('col1_col2', concat_ws(',', df.col1, df.col2))
    
    df.show()
    