Setup Jupyter Notebook on Hortonworks Data Platform (HDP)
Jupyter Notebook is a web application that allows creating and sharing documents that contain live code, equations, visualizations and explanatory text.
A notebook is interactive, so you can execute code directly from a web browser. Jupyter supports multiple kernels for different programming languages.
My Setup
HDP 3.1.1
Python v2.7.x
Apache Spark 2.3.0
CentOS v7.7
Install EPEL
# wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm;
# rpm -ivh epel-release-latest-7.noarch.rpm
Once EPEL is enabled, prepare Python for the next step.
# yum upgrade python-setuptools
Install pip
Install the Python package management system so that you can install extra Python libraries.
# wget https://bootstrap.pypa.io/ez_setup.py -O - | python
# yum install python-pip python-wheel python-devel gcc
Install a few basic data science Python libraries:
# pip install --upgrade pip wheel pandas numpy scipy scikit-learn matplotlib virtualenv
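After the install finishes, it can be worth confirming that each library actually imports. The sketch below is one way to do that; the function name missing_python_modules is hypothetical, and note that scikit-learn is imported under the module name sklearn.

```shell
# Sketch: report which Python modules fail to import.
# First argument: the interpreter to test (e.g. python).
# Remaining arguments: module names to try.
missing_python_modules() {
    local python_bin="$1"
    shift
    local module
    local missing=""
    for module in "$@"; do
        # A failing "import X" means the module is not usable.
        if ! "$python_bin" -c "import $module" >/dev/null 2>&1; then
            missing="$missing $module"
        fi
    done
    # Print the missing modules, space separated (empty if all import).
    echo "${missing# }"
}

# Usage (module names match the pip packages installed above):
#   missing_python_modules python pandas numpy scipy sklearn matplotlib
```

An empty result means every listed module imported cleanly.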
Install Jupyter Notebook:
# pip install jupyter
Set up the Jupyter Notebook configuration file:
# jupyter notebook --generate-config
# mkdir -p /data/conf
# chown -R spark:hadoop /data
# cp ~/.jupyter/jupyter_notebook_config.py /data/conf/
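The generated jupyter_notebook_config.py is a Python file in which every option is commented out, so settings can simply be appended to the copy under /data/conf. The following is a minimal sketch, assuming the classic NotebookApp options apply to the installed notebook version; the function name and the example values (0.0.0.0, 8888) are illustrative, not part of the original setup.

```shell
# Sketch: append a few common settings to a Jupyter config file.
# Arguments: config file path, listen address, port.
append_notebook_settings() {
    local conf_file="$1"
    local ip="$2"
    local port="$3"
    cat >> "$conf_file" <<EOF
# Added by append_notebook_settings (hypothetical helper)
c.NotebookApp.ip = '$ip'
c.NotebookApp.port = $port
c.NotebookApp.open_browser = False
EOF
}

# Usage (example values, adjust for your host):
#   append_notebook_settings /data/conf/jupyter_notebook_config.py 0.0.0.0 8888
```

Disabling open_browser is usually what you want on a headless cluster node, since there is no local browser to launch.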
Set the following paths:
export SPARK_HOME="/usr/hdp/current/spark2-client"
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
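The py4j version in the path depends on the Spark build, so a hard-coded file name can break after an upgrade. As a rough sketch, the entries can be derived from SPARK_HOME instead. This assumes the stock Spark layout, where the pyspark package sits under $SPARK_HOME/python and the py4j source zip under $SPARK_HOME/python/lib; the function name pyspark_pythonpath is hypothetical.

```shell
# Sketch: build the PySpark part of PYTHONPATH from a Spark home directory.
# Prints "<spark_home>/python:<path to py4j zip>" or fails if no zip is found.
pyspark_pythonpath() {
    local spark_home="$1"
    local candidate
    local py4j_zip=""
    # Take the first py4j-*-src.zip shipped with this Spark build.
    for candidate in "$spark_home"/python/lib/py4j-*-src.zip; do
        if [ -f "$candidate" ]; then
            py4j_zip="$candidate"
            break
        fi
    done
    if [ -z "$py4j_zip" ]; then
        echo "no py4j zip found under $spark_home/python/lib" >&2
        return 1
    fi
    printf '%s:%s\n' "$spark_home/python" "$py4j_zip"
}

# Usage:
#   export SPARK_HOME="/usr/hdp/current/spark2-client"
#   export PYTHONPATH="$(pyspark_pythonpath "$SPARK_HOME"):$PYTHONPATH"
```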
Start Jupyter Notebook, replacing JUPYTER_HOST and JUPYTER_PORT with the host name and port to listen on:
jupyter notebook --config=/data/conf/jupyter_notebook_config.py --ip=JUPYTER_HOST --port=JUPYTER_PORT
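JUPYTER_HOST and JUPYTER_PORT in the command above are placeholders. One possible way to fill them in is to read them from environment variables and print the resulting command for review before running it. This is only a sketch: the function name and the fallback values (the local host name and port 8888) are assumptions.

```shell
# Sketch: print the start command with host and port filled in.
# Reads JUPYTER_HOST and JUPYTER_PORT from the environment, with fallbacks.
jupyter_start_command() {
    local host="${JUPYTER_HOST:-$(hostname)}"
    local port="${JUPYTER_PORT:-8888}"
    # Reject empty or non-numeric ports before building the command.
    case "$port" in
        ''|*[!0-9]*)
            echo "JUPYTER_PORT must be a number: $port" >&2
            return 1
            ;;
    esac
    printf 'jupyter notebook --config=%s --ip=%s --port=%s\n' \
        "/data/conf/jupyter_notebook_config.py" "$host" "$port"
}

# Usage (example values):
#   JUPYTER_HOST=node1.example.com JUPYTER_PORT=9999 jupyter_start_command
```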