PySpark Environment Setup with Anaconda3 4.4.0
1. Installing Anaconda3
1.1 Download URL: https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/
1.2 Change to the directory containing the downloaded installer and run it:
$ sh ./Anaconda3-4.4.0-Linux-x86_64.sh
1.2.1 Press Enter to continue:
Please, press ENTER to continue
>>>
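1.2.2 Press the space bar to page through the license until the prompt "Please answer 'yes' or 'no':" appears, then type yes.
1.2.3 Enter the install directory. Here Anaconda3 is installed in the home directory of the current user (hadoop), so press Enter to accept the default shown below: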
Anaconda3 will now be installed into this location:
/home/hadoop/anaconda3
- Press ENTER to confirm the location
- Press CTRL-C to abort the installation
- Or specify a different location below
[/home/hadoop/anaconda3] >>>
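When the installation finishes, the installer asks whether to prepend Anaconda3 to PATH. Here the default (no) is accepted and PATH is set manually in step 1.2.4: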
Do you wish the installer to prepend the Anaconda3 install location
to PATH in your /home/hadoop/.bashrc ? [yes|no]
[no] >>>
You may wish to edit your .bashrc or prepend the Anaconda3 install location:
$ export PATH=/home/hadoop/anaconda3/bin:$PATH
Thank you for installing Anaconda3!
Share your notebooks and packages on Anaconda Cloud!
Sign up for free: https://anaconda.org
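1.2.4 Configure the environment variables: add the two export lines below to ~/.bash_profile, then reload the file with source: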
$ vi ~/.bash_profile
export PATH=/home/hadoop/anaconda3/bin:$PATH
export PYSPARK_PYTHON=/home/hadoop/anaconda3/bin/python
$ source ~/.bash_profile
Type "con" and press Tab twice. If "conda" appears among the completions, the configuration succeeded:
$ con
conda conda-server consoletype continue
conda-env config_data container-executor convertquota
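The same check can also be scripted instead of relying on Tab completion. The following is a minimal sketch, not part of Anaconda itself: the helper name check_anaconda_env is hypothetical, and it only assumes the PATH and PYSPARK_PYTHON variables set in step 1.2.4.

```python
import os
import shutil


def check_anaconda_env(env=None):
    """Report whether conda is on PATH and where PYSPARK_PYTHON points.

    `env` is a mapping of environment variables; it defaults to os.environ.
    (Hypothetical helper, written for illustration only.)
    """
    env = os.environ if env is None else env
    # shutil.which searches the given PATH string for an executable named "conda".
    conda_path = shutil.which("conda", path=env.get("PATH", ""))
    pyspark_python = env.get("PYSPARK_PYTHON")
    return {
        "conda": conda_path,
        "pyspark_python": pyspark_python,
        "pyspark_python_exists": bool(pyspark_python)
        and os.path.isfile(pyspark_python),
    }


if __name__ == "__main__":
    for key, value in check_anaconda_env().items():
        print(f"{key}: {value}")
```

If "conda" is reported as None, the export lines were probably not loaded into the current shell; running source ~/.bash_profile again is the first thing to try.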
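3. Configuring PySpark
3.1 Start Anaconda3 (Jupyter Notebook)
$ ~/anaconda3/bin/jupyter notebook --NotebookApp.ip='0.0.0.0' & # the trailing & runs it in the background
3.2 Create a Python file with the following content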
import os
import sys
os.environ["PYSPARK_PYTHON"]="/home/hadoop/anaconda3/bin/python" # path to Anaconda3's python on your own Linux system
os.environ["JAVA_HOME"]="/usr/jvm/jdk1.8" # your own JAVA_HOME
os.environ["SPARK_HOME"]="/home/hadoop/hdfs/spark" # your own SPARK_HOME
os.environ["PYLIB"]=os.environ["SPARK_HOME"] + "/python/lib"
# the py4j file name must match the archive actually present in $SPARK_HOME/python/lib
sys.path.insert(0,os.environ["PYLIB"] + "/py4j-0.10.7-src.zip")
sys.path.insert(0,os.environ["PYLIB"] + "/pyspark.zip" )
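The py4j file name is hard-coded in the snippet above and differs between Spark releases. As one possible alternative, the sketch below finds the archive with glob instead. The helper add_pyspark_to_path is hypothetical, and it assumes only the python/lib layout under SPARK_HOME that the snippet above already relies on.

```python
import glob
import os
import sys


def add_pyspark_to_path(spark_home):
    """Prepend pyspark.zip and the bundled py4j-*-src.zip to sys.path.

    Returns the list of archive paths that should now be importable.
    (Hypothetical helper, written for illustration only.)
    """
    lib_dir = os.path.join(spark_home, "python", "lib")
    # Match whatever py4j version ships with this Spark installation.
    py4j_zips = sorted(glob.glob(os.path.join(lib_dir, "py4j-*-src.zip")))
    if not py4j_zips:
        raise FileNotFoundError(f"no py4j-*-src.zip found in {lib_dir}")
    archives = [os.path.join(lib_dir, "pyspark.zip"), py4j_zips[-1]]
    for archive in archives:
        if archive not in sys.path:
            sys.path.insert(0, archive)
    return archives
```

It would be called as add_pyspark_to_path(os.environ["SPARK_HOME"]) after the environment variables are set and before import pyspark.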
4. Installing Python Libraries Offline
4.1 Windows
Download URL: https://www.lfd.uci.edu/~gohlke/pythonlibs/
Download the .whl file.
Run pip install <file path>, for example:
pip install D:\pycodes\scipy-0.17.1-cp35-cp35m-win32.whl
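4.2 Linux
The steps are the same as on Windows.
Download URL: http://mirrors.aliyun.com/pypi/simple/
Download the .whl file.
Run pip install <file path>, for example:
# change to the directory containing the file first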
pip install scikit_learn-0.22-cp36-cp36m-manylinux1_x86_64.whl
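When downloading wheels by hand, the file has to match the target interpreter and platform, otherwise pip rejects it as unsupported. The wheel file name encodes this information as name-version-pythontag-abitag-platformtag. The sketch below splits a file name into those fields so the right file can be picked before copying it to the offline machine. The function parse_wheel_filename is a hypothetical helper for illustration and is not part of pip.

```python
def parse_wheel_filename(path):
    """Split a wheel file name into its name, version, and tag fields.

    Accepts a bare file name or a path using / or \\ separators.
    (Hypothetical helper, written for illustration only.)
    """
    filename = path.replace("\\", "/").rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # 5 fields normally; 6 when an optional build tag follows the version.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename}")
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


if __name__ == "__main__":
    for wheel in (
        r"D:\pycodes\scipy-0.17.1-cp35-cp35m-win32.whl",
        "scikit_learn-0.22-cp36-cp36m-manylinux1_x86_64.whl",
    ):
        print(parse_wheel_filename(wheel))
```

For the two examples in this section, the Python tags are cp35 and cp36, so they target CPython 3.5 on 32-bit Windows and CPython 3.6 on 64-bit Linux respectively.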
