PySpark: some questions for a beginner

luc londea · 11/30/2022

Hello, I'm trying to understand how Spark works and I'm learning PySpark.

I already know Python and the Pandas library.

I understand that if I want to read a big CSV file into a Pandas DataFrame, it may not work (or it will take a long time to read).

As such, PySpark is an alternative.

I read some articles and I understood that the first thing to do is to create a SparkContext.

I understand the SparkContext will manage the cluster, which will read the CSV file and transform the data.

So I had this code in a Jupyter notebook:

```python
# Import SparkContext from the pyspark module
from pyspark import SparkContext
sc = SparkContext('local')
sc
```

If I execute this code twice, the second time I will get an error because I can't have two Spark contexts. Why can't I have two Spark contexts?

I wanted to try this:

```python
# Import SparkContext from the pyspark module
from pyspark import SparkContext
sc1 = SparkContext('local')
sc2 = SparkContext('local')
```

I have two different names: sc1 and sc2. Even if I execute it only one time, I get an error. Why can't I have two Spark contexts sc1 and sc2? Thank you.

