2. Installing Spark and Python Exercises

I. Installing Spark

1. Check the base environment (Hadoop and the JDK)

java -version       # check the JDK
hadoop version      # check Hadoop (the subcommand is "version", without a dash)
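The same check can also be scripted. This is a minimal Python sketch, not part of the original steps: it looks up `java` and `hadoop` on `PATH` with `shutil.which` and prints the first line each tool reports for its version. `find_tools`, `report` and `VERSION_ARGS` are hypothetical names; note that Java takes `-version` while Hadoop's subcommand is `version`.

```python
import shutil
import subprocess

# Hypothetical table: the arguments that make each tool print its version
VERSION_ARGS = {"java": ["-version"], "hadoop": ["version"]}

def find_tools(names):
    """Map each command name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}

def report(tools):
    """Print the first line of the version output of every tool that was found."""
    for name, path in tools.items():
        if path is None:
            print(f"{name}: NOT FOUND on PATH")
            continue
        result = subprocess.run([path, *VERSION_ARGS.get(name, ["--version"])],
                                capture_output=True, text=True)
        # java writes its version banner to stderr, hadoop writes to stdout
        output = (result.stdout or result.stderr).strip()
        first_line = output.splitlines()[0] if output else "(no output)"
        print(f"{name}: {path} -> {first_line}")

if __name__ == "__main__":
    report(find_tools(["java", "hadoop"]))
```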

2. Download Spark

https://archive.apache.org/dist/spark/spark-3.2.0/spark-3.2.0-bin-without-hadoop.tgz

3. Extract the archive, rename the folder, and set ownership

sudo tar -zxvf ~/VMOS_share_DockerOS/spark-3.2.0-bin-without-hadoop.tgz -C /usr/local/    # extract
sudo mv /usr/local/spark-3.2.0-bin-without-hadoop /usr/local/spark    # rename
sudo chown -R hadoop /usr/local/spark    # give the hadoop user ownership

4. Configuration file

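Configure Spark's classpath. The downloaded package is the "without-hadoop" build, so Spark has to be told where Hadoop's jars are through SPARK_DIST_CLASSPATH.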

cd /usr/local/spark
cp ./conf/spark-env.sh.template ./conf/spark-env.sh   # copy the template configuration file

Add the following line to ./conf/spark-env.sh:

export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
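In that line, `$(...)` is shell command substitution: the output of `hadoop classpath`, a colon-separated list of Hadoop configuration directories and jar paths (often containing `*` wildcards), becomes the value of `SPARK_DIST_CLASSPATH`. To see exactly what Spark will receive, a small Python sketch can run the same command and list one entry per line. It assumes Hadoop is installed at `/usr/local/hadoop` as above; `split_classpath` is a hypothetical helper.

```python
import os
import subprocess

# Same Hadoop path as used in spark-env.sh
HADOOP_BIN = "/usr/local/hadoop/bin/hadoop"

def split_classpath(classpath):
    """Split a Unix (':'-separated) Java classpath into its non-empty entries."""
    return [entry for entry in classpath.strip().split(":") if entry]

if __name__ == "__main__":
    if os.path.exists(HADOOP_BIN):
        # Equivalent of the $(...) substitution in spark-env.sh
        output = subprocess.run([HADOOP_BIN, "classpath"],
                                capture_output=True, text=True).stdout
        for entry in split_classpath(output):
            print(entry)
    else:
        print(f"{HADOOP_BIN} not found; adjust the path to your Hadoop installation")
```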

5. Environment variables

Open ~/.bashrc (gedit ~/.bashrc) and append the following lines:

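export SPARK_HOME=/usr/local/spark
# the py4j zip name must match the file actually present in $SPARK_HOME/python/lib (check with: ls $SPARK_HOME/python/lib)
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH
export PYSPARK_PYTHON=python3
export PATH=$PATH:$SPARK_HOME/bin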

Then reload the file so the variables take effect in the current terminal:

source ~/.bashrc
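To confirm that these variables took effect, they can be inspected from Python. This is a sketch under stated assumptions, not a Spark tool: `check_spark_env` is a hypothetical helper, and it assumes every `PYTHONPATH` entry should exist and that the py4j zip shipped in `$SPARK_HOME/python/lib` must be one of them. The py4j version in that file name differs between Spark releases, so a stale name is a common reason `import pyspark` fails.

```python
import glob
import os

def check_spark_env(env):
    """Return a list of problems in the Spark-related variables of env (empty if none)."""
    spark_home = env.get("SPARK_HOME")
    if not spark_home or not os.path.isdir(spark_home):
        return [f"SPARK_HOME is unset or not a directory: {spark_home!r}"]
    problems = []
    entries = [os.path.normpath(p) for p in env.get("PYTHONPATH", "").split(":") if p]
    # Every entry should exist; a literal "PYTHONPATH" entry left by a missing $ normally will not
    for entry in entries:
        if not os.path.exists(entry):
            problems.append(f"PYTHONPATH entry does not exist: {entry}")
    # The py4j zip that ships with this Spark build (its version varies by release)
    shipped = [os.path.normpath(p) for p in
               glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))]
    if shipped and not any(p in entries for p in shipped):
        problems.append(f"PYTHONPATH does not include the shipped py4j zip: {shipped[0]}")
    return problems

if __name__ == "__main__":
    issues = check_spark_env(dict(os.environ))
    print("\n".join(issues) if issues else "SPARK_HOME and PYTHONPATH look consistent")
```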

6. Test-run Python code
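A minimal first test, assuming the configuration above: the sketch below starts a local `SparkSession`, sums 1..100 on Spark, and reports the problem instead of crashing if PySpark cannot be imported or the JVM fails to start. The function name `spark_smoke_test` and the app name are hypothetical; save it under any file name and run it with python3.

```python
import sys

def spark_smoke_test():
    """Sum 1..100 with Spark; return the result, or None if Spark cannot start."""
    try:
        from pyspark.sql import SparkSession
    except ImportError as exc:
        # Usually means SPARK_HOME / PYTHONPATH from step 5 are not in effect
        print(f"cannot import pyspark: {exc}", file=sys.stderr)
        return None
    try:
        spark = (SparkSession.builder
                 .master("local[*]")      # run locally, using all cores
                 .appName("smoke-test")
                 .getOrCreate())
    except Exception as exc:  # e.g. Java missing or SPARK_DIST_CLASSPATH wrong
        print(f"cannot start Spark: {exc}", file=sys.stderr)
        return None
    try:
        return spark.sparkContext.parallelize(range(1, 101)).sum()
    finally:
        spark.stop()  # always release the local Spark context

if __name__ == "__main__":
    total = spark_smoke_test()
    if total is not None:
        print("sum(1..100) computed by Spark =", total)
```

If everything is configured, the script prints 5050; otherwise the message on stderr points at the step to revisit.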

II. Python programming exercise: word frequency count of an English text

mkdir wc       # create a working folder
cd wc
gedit 1.txt

Enter the following English text in 1.txt:
In the house where I grew up, it was our custom to leave the front door on the latch at night. I don't know if that was a local term or if it is universal; "on the latch" meant the door was closed but not locked. None of us carried keys; the last one in for the evening would close up, and that was it.

Write the code in wc.py:

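gedit wc.py

# Count how often each whitespace-separated word occurs in 1.txt
path = '/home/hadoop/wc/1.txt'
with open(path, encoding='utf-8') as f:   # read the whole file as one string
    text = f.read()
words = text.split()                      # split on any whitespace
wc = {}
for word in words:
    wc[word] = wc.get(word, 0) + 1        # increment this word's count
wclist = list(wc.items())                 # [(word, count), ...]
wclist.sort(key=lambda x: x[1], reverse=True)   # most frequent first
print(wclist)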

Run it:

python3 wc.py
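Because `text.split()` only splits on whitespace, punctuation stays attached to words (`latch.` and `latch` are counted separately) and case is preserved (`The` and `the` differ). A sketch of a more careful count uses a regular expression and `collections.Counter`; the tokenization rule here (lower-case everything, keep letters plus one inner apostrophe so `don't` stays whole) is an assumption about what should count as a word, and `word_counts` is a hypothetical name.

```python
import re
from collections import Counter

# Assumed rule: a word is a run of letters, optionally with one inner apostrophe (don't)
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")

def word_counts(text):
    """Count words case-insensitively, ignoring punctuation."""
    return Counter(WORD_RE.findall(text.lower()))

if __name__ == "__main__":
    path = "/home/hadoop/wc/1.txt"  # same file as in wc.py
    try:
        with open(path, encoding="utf-8") as f:
            counts = word_counts(f.read())
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
    else:
        # most_common() lists the highest counts first, like wclist.sort(..., reverse=True)
        print(counts.most_common())
```

On the text of 2.txt this counts `miracle`, `pray`, `powers` and `tasks` twice each, which the `split()` version misses because of the attached punctuation and capital letters.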

Next, create 2.txt with the following English text:

gedit 2.txt
Do not pray for tasks equal to your powers. Pray for powers equal to your tasks. Then the doing of work shall be no miracle, but you shall be the miracle.

Then edit wc.py so that path points at 2.txt (the rest of the script is unchanged) and run python3 wc.py again:

path='/home/hadoop/wc/2.txt'
with open(path) as f:
    text=f.read()
words = text.split()
wc={}
for word in words:
    wc[word]=wc.get(word,0)+1
wclist=list(wc.items())
wclist.sort(key=lambda x:x[1],reverse=True)
print(wclist)
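Since the two exercises differ only in `path`, the script could instead take file names from the command line rather than being edited each time. A sketch under that assumption: `count_files` is a hypothetical helper, and it keeps the original whitespace `split()` so its counts match wc.py.

```python
import sys
from collections import Counter

def count_files(paths):
    """Return {path: Counter of whitespace-separated words} for each readable file."""
    results = {}
    for path in paths:
        try:
            with open(path, encoding="utf-8") as f:
                results[path] = Counter(f.read().split())
        except OSError as exc:
            print(f"skipping {path}: {exc}", file=sys.stderr)
    return results

if __name__ == "__main__":
    for path, counts in count_files(sys.argv[1:]).items():
        print(path)
        # highest count first, same ordering as wclist.sort(..., reverse=True)
        print(counts.most_common())
```

Usage, with a hypothetical file name wc_files.py: python3 wc_files.py /home/hadoop/wc/1.txt /home/hadoop/wc/2.txt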

posted @ 2022-03-04 16:16  噗噗柚子√