Installing Spark and Python Practice
I. Installing Spark
1. Check the base environment: Hadoop and the JDK
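
As a rough sketch of this check, the Python snippet below looks for the `java` and `hadoop` commands on the PATH. The helper name `find_tools` is hypothetical, and the snippet assumes both tools are exposed under those command names; it only confirms that the commands can be found, not that their versions suit Spark.

```python
import shutil


def find_tools(names=("java", "hadoop")):
    """Map each command name to its resolved path on PATH, or None if missing.

    The default command names are an assumption about how the JDK and Hadoop
    are installed on this machine.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, location in find_tools().items():
        status = location if location else "not found on PATH"
        print(f"{name}: {status}")
```

If either command is reported as missing, the installation or the PATH setting probably needs attention before continuing.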

2. Configuration files

3. Environment variables
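
One way to confirm the environment variables took effect is a small Python check such as the sketch below. The variable names `JAVA_HOME`, `HADOOP_HOME`, and `SPARK_HOME` are the conventional ones and are an assumption here, since the post does not list which variables it sets. The helpers `check_env` and `spark_bin_on_path` are hypothetical names.

```python
import os


def check_env(names=("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME"), environ=None):
    """Return {name: value or None} for each variable name.

    The default names are conventional, not taken from the original post.
    """
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in names}


def spark_bin_on_path(environ=None):
    """Return True if $SPARK_HOME/bin appears as an entry in PATH."""
    env = os.environ if environ is None else environ
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return False
    spark_bin = os.path.normpath(os.path.join(spark_home, "bin"))
    entries = [os.path.normpath(p) for p in env.get("PATH", "").split(os.pathsep) if p]
    return spark_bin in entries


if __name__ == "__main__":
    for name, value in check_env().items():
        print(f"{name} = {value if value else '(not set)'}")
    print("Spark bin directory on PATH:", spark_bin_on_path())
```

Run it in a new shell after editing the profile, because variables set in a profile file are only picked up by shells started afterwards.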

4. Test-run Python code in Spark


II. Python programming exercise: word frequency count for English text
1. Prepare the text file

2. Source code
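import string

# Open in text mode: reading bytes and wrapping them in str() would produce the
# literal "b'...'" representation. UTF-8 is an assumption; adjust to the file's encoding.
with open(r'F:\大学\大三下\大数据\test.txt', 'r', encoding='utf-8') as input_text:
    # Split on whitespace, strip surrounding punctuation, and lowercase each word.
    words = [word.strip(string.punctuation).lower() for word in input_text.read().split()]

# Count occurrences of each distinct word.
words_index = set(words)
count_dict = {index: words.count(index) for index in words_index}

# Print words from most to least frequent.
for word in sorted(count_dict, key=lambda x: count_dict[x], reverse=True):
    print('{}--{} times'.format(word, count_dict[word]) + '\n')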
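
For larger files, calling `words.count()` once per distinct word rescans the whole list each time. A possible alternative, sketched here with `collections.Counter` from the standard library, counts everything in a single pass. The function name `count_words` and the sample sentence are made up for illustration and are not part of the original exercise.

```python
import string
from collections import Counter


def count_words(text):
    """Count lowercase words in text, ignoring surrounding punctuation."""
    words = (token.strip(string.punctuation).lower() for token in text.split())
    # Skip tokens that were pure punctuation and became empty after stripping.
    return Counter(word for word in words if word)


if __name__ == "__main__":
    sample = "The cat saw the dog. The dog ran!"  # illustrative input only
    # most_common() yields (word, count) pairs from most to least frequent.
    for word, count in count_words(sample).most_common():
        print('{}--{} times'.format(word, count))
```

To use it on the exercise file, pass the string returned by `input_text.read()` to `count_words`.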
3. Output
