This post summarizes the steps to install Zeppelin 0.7.3 in a Windows environment.
Tools and Environment
- GIT Bash
- Command Prompt
- Windows 10
Download Binary Package
Download the latest binary package from the following website:
http://zeppelin.apache.org/download.html
In my case, I am saving the file to the folder F:\DataAnalytics.
Unzip Binary Package
Open Git Bash, change directory (cd) to the folder where you saved the binary package, and then extract it:
$ cd /f/DataAnalytics
fahao@Raymond-Alienware MINGW64 /f/DataAnalytics $ tar -xvzf zeppelin-0.7.3-bin-all.gz
After running the above commands, the package is extracted to the folder F:\DataAnalytics\zeppelin-0.7.3-bin-all.
Run Zeppelin
Before starting Zeppelin, make sure JAVA_HOME environment variable is set.
JAVA_HOME environment variable
The JAVA_HOME environment variable should point to your Java JRE installation path.
https://api.kontext.tech/resource/e055281c-82b0-5006-a25c-60318c78535f
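For example, you can set it in Command Prompt for the current session; the JRE path below is only an assumption — replace it with your own installation path:

```shell
:: Set JAVA_HOME for the current Command Prompt session only
:: (the path below is an example; point it at your own JRE folder)
set JAVA_HOME=C:\Program Files\Java\jre1.8.0_161

:: Verify the value
echo %JAVA_HOME%
```

To make the setting survive new sessions, add it instead through System Properties > Environment Variables, or use setx.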
Start Zeppelin
Run the following commands in Command Prompt (remember to change the path to your own Zeppelin folder):
cd /D F:\DataAnalytics\zeppelin-0.7.3-bin-all\bin
F:\DataAnalytics\zeppelin-0.7.3-bin-all\bin>zeppelin.cmd
Wait until the Zeppelin server has started:
https://api.kontext.tech/resource/8e7eccb8-89b5-5f02-a2c4-5385ed76dfb9
Verify
In any browser, navigate to http://localhost:8080/
The UI should look like the following screenshot:
https://api.kontext.tech/resource/028d8beb-3314-5be3-9d9e-66d877fd289e
Create Notebook
Create a simple note using markdown and then run it:
https://api.kontext.tech/resource/1f8c5f31-edca-5199-af30-dbfb73aac5a5
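As a sketch, a minimal markdown paragraph in a Zeppelin note starts with the %md interpreter directive (the heading and text below are just sample content):

```
%md
## Hello Zeppelin
This paragraph is rendered by the *markdown* interpreter.
```

Click the run button (or press Shift+Enter) on the paragraph to render it.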
java.lang.NullPointerException
If you get this error when using the Spark interpreter, please refer to the following pages for details:
https://issues.apache.org/jira/browse/ZEPPELIN-2438
https://issues.apache.org/jira/browse/ZEPPELIN-2475
Basically, even if you configure the Spark interpreter not to use Hive, Zeppelin still tries to locate winutils.exe through the HADOOP_HOME environment variable.
Thus, to resolve the problem, you need to install Hadoop on your local system and then add one environment variable:
https://api.kontext.tech/resource/4381dde5-722a-5ca4-b602-c922e5b8aa75
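For example, you can add the variable from Command Prompt; the Hadoop path below is only an assumption — point it at your own Hadoop folder, which must contain bin\winutils.exe:

```shell
:: Persist HADOOP_HOME for the current user
:: (the path below is an example; use your own Hadoop installation folder,
::  which must contain bin\winutils.exe)
setx HADOOP_HOME "F:\DataAnalytics\hadoop"
```

Note that setx only affects new Command Prompt sessions, not the one it runs in.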
After the environment variable is added, restart the whole Zeppelin server; you should then be able to run Spark successfully.
https://api.kontext.tech/resource/7447b6dc-21a3-58d4-af73-515ac2b59162
You should also be able to run the tutorials provided as part of the installation:
https://api.kontext.tech/resource/089153fb-2644-553e-85ba-0ad3e9d3ae28
org.apache.zeppelin.interpreter.InterpreterException:
If you encounter the following error:
org.apache.zeppelin.interpreter.InterpreterException: The filename, directory name, or volume label syntax is incorrect.
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterManagedProcess.start(RemoteInterpreterManagedProcess.java:143)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.reference(RemoteInterpreterProcess.java:73)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:265)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:430)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:111)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:387)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
If you have installed Spark locally, it is probably caused by the same issue described in this JIRA task:
https://issues.apache.org/jira/browse/ZEPPELIN-2677
To fix it, you can remove the SPARK_HOME environment variable; Spark will still run correctly as long as you launch it using the full path of spark-shell.cmd.
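The workaround can be sketched as follows; the Spark installation path below is only an assumption — substitute your own folder. If SPARK_HOME was set permanently (via setx or System Properties), remove it there as well:

```shell
:: Clear SPARK_HOME for the current Command Prompt session
set SPARK_HOME=

:: Launch the Spark shell by its full path instead
:: (the path below is an example; use your own Spark installation folder)
F:\DataAnalytics\spark-2.2.0-bin-hadoop2.7\bin\spark-shell.cmd
```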