When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set

Raymond Tang · 3/8/2021

Context

When submitting a Spark application to run in a Hadoop YARN cluster, it may fail with the following error:

Exception in thread "main" org.apache.spark.SparkException: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
        at org.apache.spark.deploy.SparkSubmitArguments.error(SparkSubmitArguments.scala:630)
        at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:270)
        at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:233)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:119)
        at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$3.<init>(SparkSubmit.scala:990)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:990)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:85)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

This usually occurs when Spark or Hadoop is not configured properly.
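A quick way to confirm this is the cause is to print the two variables that spark-submit checks. This is a minimal diagnostic sketch; if both come back empty, the submission will fail with the error above:

```shell
# Print the two variables spark-submit looks for.
# Either one being set is enough; both empty triggers the error above.
echo "HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-<not set>}"
echo "YARN_CONF_DIR=${YARN_CONF_DIR:-<not set>}"
```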

Fix the issue

The error message itself points out how to fix the issue: we just need to set one of those two environment variables. The following sections show two common approaches.

System level environment variables

For a Windows environment, add a machine- or user-level environment variable; for a Linux environment, add an export line to .bashrc.

  • Variable name: HADOOP_CONF_DIR
  • Variable value: %HADOOP_HOME%\etc\hadoop (Windows) or $HADOOP_HOME/etc/hadoop (Linux).
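On Linux, the bullet points above translate into the following lines in ~/.bashrc (assuming HADOOP_HOME already points at your Hadoop installation):

```shell
# Append to ~/.bashrc; assumes HADOOP_HOME is already set to your Hadoop install
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

# Reload the file so the current shell session picks the variable up
source ~/.bashrc
```

New shell sessions will pick the variable up automatically; `source` is only needed for the session you are already in.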

Spark environment script file

Alternatively, we can add the variable to Spark's environment setup script file.

For a Windows environment, open the file load-spark-env.cmd in the Spark bin folder and add the following line:

set HADOOP_CONF_DIR=%HADOOP_HOME%\etc\hadoop

For a Linux environment, open the file load-spark-env.sh in the Spark bin folder and add the following line:

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
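With the variable in place, a YARN submission should no longer fail the validation check. The command below is an illustrative sketch using Spark's bundled SparkPi example; the jar path and Scala/Spark version in its name vary by installation, so adjust them for yours:

```shell
# Resubmit to YARN now that HADOOP_CONF_DIR is set.
# Jar path and version suffix are illustrative; adjust for your install.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.0.jar
```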
