This article shows how to 'delete' a column from a Spark DataFrame using Python (PySpark).
Construct a dataframe
Follow the article Convert Python Dictionary List to PySpark DataFrame to construct a DataFrame. The sample DataFrame used in this article looks like this:
+----------+---+------+
|  Category| ID| Value|
+----------+---+------+
|Category A|  1| 12.40|
|Category B|  2| 30.10|
|Category C|  3|100.01|
+----------+---+------+
'Delete' or 'Remove' one column
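The word 'delete' or 'remove' can be misleading: Spark DataFrames are immutable and lazily evaluated, so nothing is changed in place.
We can use the drop function, which returns a new DataFrame without the specified columns and leaves the original DataFrame untouched.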
df1 = df.drop('Category')
df1.show()
Output:
+---+------+
| ID| Value|
+---+------+
|  1| 12.40|
|  2| 30.10|
|  3|100.01|
+---+------+
Drop multiple columns
Multiple columns can be dropped at the same time, either by passing the names as separate arguments or by unpacking a list of names with *:
df2 = df.drop('Category', 'ID')
df2.show()
columns_to_drop = ['Category', 'ID']
df3 = df.drop(*columns_to_drop)
df3.show()
Output (both show() calls print the same table):
+------+
| Value|
+------+
| 12.40|
| 30.10|
|100.01|
+------+
+------+
| Value|
+------+
| 12.40|
| 30.10|
|100.01|
+------+
Run Spark code
You can easily run Spark code on your Windows or UNIX-like (Linux, macOS) systems. Follow these articles to set up your Spark environment if you don't have one yet: