7.Python Spark安装
7.1Scla安装
下载:wget https://scala-lang.org/files/archive/scala-2.13.0.tgz
解压:tar xvf scala-2.13.0.tgz
移动到/usr/local 目录 :sudo mv scala-2.13.0 /usr/local/scala
设置Scala环境变量:
sudo gedit ~/.bashrc

source ~/.bashrc
启动scala

:q退出
7.2安装Spark
wget https://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-3.4.2/spark-3.4.2-bin-hadoop3.tgz
解压:tar zxf spark-3.4.2-bin-hadoop3.tgz
移动到/usr/local下:sudo mv spark-3.4.2-bin-hadoop3 /usr/local/spark/
编辑~/.bashrc:sudo gedit ~/.bashrc

source ~/.bashrc
7.3启动pyspark交互式界面
pyspark

8.5设置pyspark显示信息
cd /usr/local/spark/conf
cp log4j2.properties.template log4j2.properties
sudo gedit log4j2.properties,把橙色的地方改成WARN

7.5 创建测试文件

启动Hadoop Multi-Node Cluster,依次执行:
cp /usr/local/hadoop/LICENSE.txt ~/wordcount/input
ll ~/wordcount/input

hadoop fs -mkdir -p /user/hduser/wordcount/input
cd ~/wordcount/input
hadoop fs -copyFromLocal LICENSE.txt /user/hduser/wordcount/input
hadoop fs -ls /user/hduser/wordcount/input

7.6本地运行pyspark程序
pyspark --master local[4]

读取本地文件

读取HDFS文件

7.7在Hadoop YARN 运行pyspark
关闭hadoop安全模式

HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop pyspark --master yarn --deploy-mode client

读取HDFS文件:textFile=sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/LICENSE.txt")

可以在Hadoop Web界面查看PySparkShell App

7.8构建Spark Standalone Cluster运行环境
复制模板文件:
cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
编辑spark-env.sh:sudo gedit /usr/local/spark/conf/spark-env.sh

连接data1:ssh data1
创建spark目录:sudo mkdir /usr/local/spark
更改所有者为hduser:sudo chown hduser:hduser /usr/local/spark
使用scp把master的spark程序复制到data1:sudo scp -r /usr/local/spark hduser@data1:/usr/local

data2同样操作一遍。
编辑spark/conf/workers文件
cp /usr/local/spark/conf/workers.template /usr/local/spark/conf/workers
sudo gedit /usr/local/spark/conf/workers

7.9在Spark Standalone 运行 pyspark
/usr/local/spark/sbin/start-all.sh

z在Spark Standalone 运行 pyspark
pyspark --master spark://master:7077 --num-executors 1 --total-executor-cores 3 --executor-memory 512m

读取文件

7.9Spark Web UI 界面
http://master:8080/
可以查看worker和job详细信息



浙公网安备 33010602011771号