Install Hadoop 3.0.0 on Windows (Single Node)

Raymond Tang | 2/25/2018

This page summarizes the steps to install Hadoop 3.0.0 on your Windows environment. Reference page:

https://wiki.apache.org/hadoop/Hadoop2OnWindows

https://hadoop.apache.org/docs/r1.2.1/cluster_setup.html

Info: A newer installation guide for the latest Hadoop 3.2.1 is available. I recommend using that one instead, as it covers a number of new features. Refer to the following article for more details.

Install Latest Hadoop 3.2.1 on Windows 10 Step by Step Guide

Tools and Environment

  • GIT Bash
  • Command Prompt
  • Windows 10

Download Binary Package

Download the latest binary from the following site:

http://hadoop.apache.org/releases.html

In my case, I am saving the file to the folder F:\DataAnalytics.

Unzip binary package

Open Git Bash, change directory (cd) to the folder where you saved the binary package, and then unzip it:

$ cd F:\DataAnalytics

fahao@Raymond-Alienware MINGW64 /f/DataAnalytics
$ tar -xvzf hadoop-3.0.0.tar.gz

In my case, the Hadoop binary is extracted to: F:\DataAnalytics\hadoop-3.0.0

Setup environment variables

Make sure the following environment variables are set correctly:

  • JAVA_HOME: pointing to your Java JDK installation folder.
  • HADOOP_HOME: pointing to your Hadoop folder in the previous step.

https://api.kontext.tech/resource/71744d7a-08f8-51fb-b857-148f3da2d6f6

Then add %JAVA_HOME%\bin and %HADOOP_HOME%\bin to the Path environment variable, as in the following screenshot:

https://api.kontext.tech/resource/9fa8ca7e-fe95-53db-a690-e6fdbafc60c0
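If you prefer setting these from the command line, the following is a minimal sketch (the JDK path is an assumption; substitute your own installation folder):

setx JAVA_HOME "C:\Java\jdk1.8.0_161"
setx HADOOP_HOME "F:\DataAnalytics\hadoop-3.0.0"

:: setx only affects new windows; for the current session, set the variables directly
set JAVA_HOME=C:\Java\jdk1.8.0_161
set HADOOP_HOME=F:\DataAnalytics\hadoop-3.0.0
set PATH=%PATH%;%JAVA_HOME%\bin;%HADOOP_HOME%\bin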

Verify your setup

You should be able to verify your settings via the following command:

F:\DataAnalytics\hadoop-3.0.0>hadoop -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
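Note that hadoop -version forwards -version to the JVM, which is why the output above shows the Java version. To print Hadoop's own version instead, drop the dash:

F:\DataAnalytics\hadoop-3.0.0>hadoop version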

HDFS configurations

Edit file hadoop-env.cmd

Edit this file in the %HADOOP_HOME%\etc\hadoop directory and add the following lines at the end of the file:

set HADOOP_PREFIX=%HADOOP_HOME%
set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
set YARN_CONF_DIR=%HADOOP_CONF_DIR%
set PATH=%PATH%;%HADOOP_PREFIX%\bin
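One common pitfall: if JAVA_HOME points to a path containing spaces (for example under C:\Program Files), the .cmd scripts may fail to parse it. A frequently used workaround, shown here with a hypothetical JDK path, is to use the 8.3 short name in hadoop-env.cmd:

rem Only needed if your JDK path contains spaces; PROGRA~1 is the 8.3 short name for "Program Files"
set JAVA_HOME=C:\PROGRA~1\Java\jdk1.8.0_161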

Edit file core-site.xml

Make sure the following configuration exists:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://0.0.0.0:19000</value>
  </property>
</configuration>

By default, the above property configuration doesn’t exist.
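After saving the file, a quick sanity check is to read the value back through the standard HDFS CLI:

%HADOOP_HOME%\bin\hdfs getconf -confKey fs.defaultFS

This should print hdfs://0.0.0.0:19000.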

Edit file hdfs-site.xml

Make sure the following configurations exist (you can change the file paths to your own):

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///F:/DataAnalytics/dfs/namespace_logs</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///F:/DataAnalytics/dfs/data</value>
  </property>
</configuration>

The above configurations set up the HDFS locations for storing the namespace (metadata and edit logs) and data files.
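As a side note, dfs.name.dir and dfs.data.dir are the legacy property names; Hadoop 3 still honours them but logs deprecation warnings. An equivalent sketch using the current names:

<!-- Same directories, using the non-deprecated property names -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///F:/DataAnalytics/dfs/namespace_logs</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///F:/DataAnalytics/dfs/data</value>
</property>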

Edit file workers

Ensure the file contains the following content:

localhost

YARN configurations

Edit file mapred-site.xml

Edit mapred-site.xml under %HADOOP_HOME%\etc\hadoop and add the following configuration, replacing %USERNAME% with your Windows user name.

<configuration>
  <property>
    <name>mapreduce.job.user.name</name>
    <value>%USERNAME%</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.apps.stagingDir</name>
    <value>/user/%USERNAME%/staging</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>local</value>
  </property>
</configuration>

Edit file yarn-site.xml

Make sure the following entries exist:

<configuration>
  <property>
    <name>yarn.server.resourcemanager.address</name>
    <value>0.0.0.0:8020</value>
  </property>
  <property>
    <name>yarn.server.resourcemanager.application.expiry.interval</name>
    <value>60000</value>
  </property>
  <property>
    <name>yarn.server.nodemanager.address</name>
    <value>0.0.0.0:45454</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.server.nodemanager.remote-app-log-dir</name>
    <value>/app-logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/dep/logs/userlogs</value>
  </property>
  <property>
    <name>yarn.server.mapreduce-appmanager.attempt-listener.bindAddress</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.server.mapreduce-appmanager.client-service.bindAddress</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>%HADOOP_CONF_DIR%,%HADOOP_COMMON_HOME%/share/hadoop/common/*,%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*</value>
  </property>
</configuration>

Initialize environment variables

Run hadoop-env.cmd to set up the environment variables. In my case, the file path is:

%HADOOP_HOME%\etc\hadoop\hadoop-env.cmd

Format file system

Run the following command to format the file system:

hadoop namenode -format

The command should print out some logs like the following (the storage directory path may vary based on your HDFS configuration):

2018-02-18 21:29:41,501 INFO namenode.FSImage: Allocated new BlockPoolId: BP-353327356-172.24.144.1-1518949781495
2018-02-18 21:29:41,817 INFO common.Storage: Storage directory F:\DataAnalytics\dfs\namespace_logs has been successfully formatted.
2018-02-18 21:29:41,826 INFO namenode.FSImageFormatProtobuf: Saving image file F:\DataAnalytics\dfs\namespace_logs\current\fsimage.ckpt_0000000000000000000 using no compression
2018-02-18 21:29:41,934 INFO namenode.FSImageFormatProtobuf: Image file F:\DataAnalytics\dfs\namespace_logs\current\fsimage.ckpt_0000000000000000000 of size 390 bytes saved in 0 seconds.
2018-02-18 21:29:41,969 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
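As an aside, the hadoop entry point for HDFS subcommands is deprecated in recent releases; the equivalent preferred form is:

hdfs namenode -format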

Start HDFS daemons

Run the following command to start the NameNode and DataNode on localhost.

%HADOOP_HOME%\sbin\start-dfs.cmd

The above command line will open two Command Prompt windows: one for the NameNode and another for the DataNode.

https://api.kontext.tech/resource/87d4fbc3-bc17-533d-9eef-ba77964af8d9

To verify, let’s copy a file to HDFS:

%HADOOP_HOME%\bin\hdfs dfs -put file:///F:/DataAnalytics/test.txt /

And then list the files in HDFS:

%HADOOP_HOME%\bin\hdfs dfs -ls /

You should get a result similar to the following screenshot:

https://api.kontext.tech/resource/92122cf0-3ec9-5cef-a9ab-12c0521a1941
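Another quick health check is to ask the NameNode for a cluster report; with this single-node setup it should list exactly one live DataNode:

%HADOOP_HOME%\bin\hdfs dfsadmin -report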

Start YARN daemons

Start YARN through the following command:

%HADOOP_HOME%\sbin\start-yarn.cmd

Similar to HDFS, two windows will open: one for the ResourceManager and another for the NodeManager.

https://api.kontext.tech/resource/28dfa8c5-dff4-52ad-90df-e5112b2ade90
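To confirm the NodeManager has registered with the ResourceManager, you can also list the active nodes from the command line:

%HADOOP_HOME%\bin\yarn node -list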

To verify, we can run the following sample word count job:

%HADOOP_HOME%\bin\yarn jar %HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-3.0.0.jar wordcount /test.txt /out

https://api.kontext.tech/resource/a44b2d44-c107-51d8-a597-c1c89b6297fd
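Once the job completes, you can inspect the result directly. With the default single reducer the output lands in a part-r-00000 file (that file name is the usual MapReduce default, not something configured in this guide):

%HADOOP_HOME%\bin\hdfs dfs -cat /out/part-r-00000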

Web UIs

Resource manager

You can also view your job status through the YARN web UI. The default URL is http://localhost:8088.

https://api.kontext.tech/resource/816293ca-7964-5a23-add1-82bbbc4817e6

https://api.kontext.tech/resource/01ecc7f3-1982-5609-9fc5-284a6335ed51

NameNode UI

Default URL: http://localhost:9870

https://api.kontext.tech/resource/16d56521-4125-5f04-a79d-3deab7356bb1

https://api.kontext.tech/resource/f9ca8ded-25e1-517b-af9a-d1a64fa9bb03

DataNode UI

Through the NameNode UI, you can find all the DataNodes. In my case, I only have a single DataNode, whose UI URL is http://localhost:9864

https://api.kontext.tech/resource/3348cea1-55c4-5163-9239-372debc83cc6

Errors and fixes

java.io.FileNotFoundException: Could not locate Hadoop executable: … \hadoop-3.0.0\bin\winutils.exe

Refer to the following page to fix the problem:

https://wiki.apache.org/hadoop/WindowsProblems

java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)

This error has the same root cause as the previous one: the native Windows binaries are missing from the bin folder.

Refer to 'Windows binaries for Hadoop versions (built from the git commit ID used for the ASF release)':

https://github.com/steveloughran/winutils

For this example, I am using Hadoop 3.0.0.

https://github.com/steveloughran/winutils/tree/master/hadoop-3.0.0/bin

To fix it, copy the files from the above directory into %HADOOP_HOME%\bin.
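A minimal sketch of that fix, assuming Git is available on your machine (the repository is the one linked above):

git clone https://github.com/steveloughran/winutils
copy winutils\hadoop-3.0.0\bin\* %HADOOP_HOME%\bin\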

