[AWS-Hadoop|Hive|Spark] Installing Jupyter and integrating it with PySpark
Contents
1. Installing Jupyter Notebook
2. Setting the Jupyter password
3. Configuring Jupyter
4. Integrating PySpark with Jupyter
5. Stopping the Jupyter server
1. Installing Jupyter Notebook
[hadoop@client ~]$ pip install jupyter
[hadoop@client ~]$ pip install py4j
2. Setting the Jupyter password
Start ipython to generate the hashed password value that Jupyter will use.
[hadoop@client ~]$ ipython
Python 3.9.14 (main, Nov 7 2022, 00:00:00)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.7.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from notebook.auth import passwd
In [2]: passwd()
Enter password: (type the password to hash; this example uses jupyter)
Verify password: jupyter
'argon2:$argon2id$v=19$m=10240,t=10,p=8$2HtQUOQ1wUXL+RKvgdvEbw$JbJR+IgDCjyjHzHWJfMhdZxfKVXu4ZP4ba6LLQRzf3k' -- copy this value; it goes into the Jupyter config file.
Type exit() to leave IPython.
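The hash string is long and easy to truncate when copying. As a rough sanity check, the sketch below splits a string of the form printed by passwd() into its parts, assuming the usual Argon2 encoded layout (scheme prefix, variant, version, cost parameters, salt, digest). The helper name parse_jupyter_hash is hypothetical and is not part of Jupyter.

```python
def parse_jupyter_hash(value):
    """Split a Jupyter 'argon2:...' password hash into labelled parts."""
    scheme, _, encoded = value.partition(":")
    if scheme != "argon2":
        raise ValueError(f"unexpected scheme: {scheme!r}")
    # The encoded part looks like: $variant$v=N$m=..,t=..,p=..$salt$digest
    fields = encoded.split("$")
    if len(fields) != 6 or fields[0] != "":
        raise ValueError("not a complete Argon2 encoded hash")
    _, variant, version, params, salt, digest = fields
    costs = dict(item.split("=") for item in params.split(","))
    return {
        "variant": variant,
        "version": int(version.split("=")[1]),
        "memory_kib": int(costs["m"]),
        "iterations": int(costs["t"]),
        "parallelism": int(costs["p"]),
        "salt": salt,
        "digest": digest,
    }


# The hash value shown in the session above.
example = ("argon2:$argon2id$v=19$m=10240,t=10,p=8"
           "$2HtQUOQ1wUXL+RKvgdvEbw$JbJR+IgDCjyjHzHWJfMhdZxfKVXu4ZP4ba6LLQRzf3k")
info = parse_jupyter_hash(example)
print(info["variant"], info["memory_kib"], info["iterations"], info["parallelism"])
```

A string that was cut off during copying has fewer than six `$`-separated fields and raises ValueError instead of returning a result.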
3. Configuring Jupyter
[hadoop@client ~]$ jupyter notebook --generate-config
Writing default config to: /home/hadoop/.jupyter/jupyter_notebook_config.py
[hadoop@client ~]$ vim /home/hadoop/.jupyter/jupyter_notebook_config.py
Find the following four settings, uncomment them, and set their values:
c.NotebookApp.allow_origin = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.password = 'argon2:$argon2id$v=19$m=10240,t=10,p=8$2HtQUOQ1wUXL+RKvgdvEbw$JbJR+IgDCjyjHzHWJfMhdZxfKVXu4ZP4ba6LLQRzf3k'
c.NotebookApp.notebook_dir = '/home/hadoop/workspace'
[hadoop@client ~]$ mkdir workspace
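The generated jupyter_notebook_config.py contains every option as a commented-out line, so it is easy to edit a value and forget to remove the leading `#`. The sketch below checks which of the four settings are active in a config text. The function names and the sample text are made up for illustration; only the four option names come from the steps above.

```python
import re

# Matches an uncommented line such as: c.NotebookApp.open_browser = False
SETTING = re.compile(r"^\s*c\.NotebookApp\.(\w+)\s*=\s*(.+?)\s*$")

REQUIRED = ("allow_origin", "open_browser", "password", "notebook_dir")


def active_settings(config_text):
    """Return {name: raw_value} for every uncommented c.NotebookApp line."""
    found = {}
    for line in config_text.splitlines():
        match = SETTING.match(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


def missing_settings(config_text, required=REQUIRED):
    """List the required settings that are absent or still commented out."""
    active = active_settings(config_text)
    return [name for name in required if name not in active]


# Illustrative sample: the password line is still commented out.
sample = """\
# c.NotebookApp.allow_origin = ''
c.NotebookApp.allow_origin = '*'
c.NotebookApp.open_browser = False
# c.NotebookApp.password = ''
c.NotebookApp.notebook_dir = '/home/hadoop/workspace'
"""
print(missing_settings(sample))
```

To check the real file, read /home/hadoop/.jupyter/jupyter_notebook_config.py and pass its contents to missing_settings; an empty list means all four settings are active.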
4. Integrating PySpark with Jupyter
[hadoop@client ~]$ vi ~/.bashrc
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook --ip=0.0.0.0'
[hadoop@client ~]$ source ~/.bashrc
[hadoop@client ~]$ pyspark
[I 07:59:33.016 NotebookApp] Writing notebook server cookie secret to /home/hadoop/.local/share/jupyter/runtime/notebook_cookie_secret
[I 07:59:33.204 NotebookApp] Serving notebooks from local directory: /home/hadoop/workspace
[I 07:59:33.204 NotebookApp] Jupyter Notebook 6.5.2 is running at:
[I 07:59:33.204 NotebookApp] http://client:8888/ --> connect to this address.
[I 07:59:33.205 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
^C[I 07:59:45.018 NotebookApp] interrupted
Serving notebooks from local directory: /home/hadoop/workspace
0 active kernels
Jupyter Notebook 6.5.2 is running at:
http://client:8888/
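To connect from a browser, replace the hostname client with the instance's public DNS name or IP address:
http://client_publicDNSorIP:8888/
http://ec2-13-209-76-244.ap-northeast-2.compute.amazonaws.com:8888/
At the login page, enter the plain-text password whose hash was set as c.NotebookApp.password in jupyter_notebook_config.py. In this example it is jupyter.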
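If the login page does not load, it can help to first check whether port 8888 is reachable from your own machine at all. The sketch below is a minimal TCP check using only the standard library. The function name can_connect is hypothetical, and the host is a placeholder to replace with the instance's public DNS name or IP. It is written on the assumption that a blocked port (for example, an inbound rule that does not allow 8888) is the cause, which may not hold in every setup.

```python
import socket


def can_connect(host, port=8888, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder host: replace with the instance's public DNS name or IP.
    host = "client_publicDNSorIP"
    print(host, "reachable" if can_connect(host) else "not reachable")
```

A result of "not reachable" while the server log shows Jupyter running suggests the problem is on the network path rather than in the Jupyter configuration.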
5. Stopping the Jupyter server
Press Ctrl + C in the terminal running the server. When asked whether to shut down the notebook server, enter y.
Shutdown this notebook server (y/[n])? y
5.1 Stopping Hadoop (HDFS), the ResourceManager (YARN), and the JobHistory server
[hadoop@Namenode ~]$ stop-dfs.sh
[hadoop@Namenode ~]$ mapred --daemon stop historyserver
[hadoop@secondnode ~]$ stop-yarn.sh