When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set

Raymond Tang · 3/8/2021

Context

When submitting a Spark application to run in a Hadoop YARN cluster, it may fail with the following error:

Exception in thread "main" org.apache.spark.SparkException: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
        at org.apache.spark.deploy.SparkSubmitArguments.error(SparkSubmitArguments.scala:630)
        at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:270)
        at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:233)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:119)
        at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$3.<init>(SparkSubmit.scala:990)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:990)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:85)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

This usually occurs when Spark or Hadoop is not configured properly.
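A quick way to confirm this is the cause is to print the two variables that spark-submit checks. This is a minimal diagnostic sketch; if both come back empty, the submission will fail with the error above:

```shell
# Print the two variables spark-submit looks for.
# Either one being set is enough; both empty triggers the error above.
echo "HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-<not set>}"
echo "YARN_CONF_DIR=${YARN_CONF_DIR:-<not set>}"
```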

Fix the issue

The error message itself points out how to fix the issue: we just need to set one of those two environment variables. The following sections show two common approaches.

System level environment variables

For a Windows environment, add a machine- or user-level environment variable; for a Linux environment, add an export line to .bashrc.

  • Variable name: HADOOP_CONF_DIR
  • Variable value: %HADOOP_HOME%\etc\hadoop (Windows) or $HADOOP_HOME/etc/hadoop (Linux).
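On Linux, the bullet points above translate into the following lines in ~/.bashrc (assuming HADOOP_HOME already points at your Hadoop installation):

```shell
# Append to ~/.bashrc; assumes HADOOP_HOME is already set to your Hadoop install
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

# Reload the file so the current shell session picks the variable up
source ~/.bashrc
```

New shell sessions will pick the variable up automatically; `source` is only needed for the session you are already in.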

Spark environment script file

Alternatively, we can add the variable to Spark's environment setup script file.

For a Windows environment, open the file load-spark-env.cmd in the Spark bin folder and add the following line:

set HADOOP_CONF_DIR=%HADOOP_HOME%\etc\hadoop

For a Linux environment, open the file load-spark-env.sh in the Spark bin folder and add the following line:

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
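With the variable in place, a YARN submission should no longer fail the validation check. The command below is an illustrative sketch using Spark's bundled SparkPi example; the jar path and Scala/Spark version in its name vary by installation, so adjust them for yours:

```shell
# Resubmit to YARN now that HADOOP_CONF_DIR is set.
# Jar path and version suffix are illustrative; adjust for your install.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.0.jar
```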
