Installing Spark and Python Practice

I. Installing Spark

1. Check the base environment: Hadoop and the JDK
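One quick way to confirm that the prerequisites are reachable is to look for their launchers on PATH. The sketch below is only an illustration: the helper name `find_tools` is hypothetical, and it assumes the JDK and Hadoop expose `java` and `hadoop` commands, which is the usual case but depends on how they were installed.

```python
import shutil


def find_tools(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # 'java' and 'hadoop' are assumed launcher names, not taken from the post
    for name, location in find_tools(["java", "hadoop"]).items():
        print(name, "->", location or "not found on PATH")
```

A `None` entry only means the command is not on PATH for the current shell; the tool may still be installed somewhere else.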

2. Configuration files

3. Environment variables
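After editing the environment variables, it can help to verify from Python that they are actually visible to new processes. This is a minimal sketch under stated assumptions: the post does not list the variables it sets, so `JAVA_HOME`, `HADOOP_HOME`, and `SPARK_HOME` are conventional names chosen for illustration, and `missing_variables` is a hypothetical helper.

```python
import os


def missing_variables(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    # Assumed variable names; adjust to whatever your setup actually exports
    expected = ["JAVA_HOME", "HADOOP_HOME", "SPARK_HOME"]
    missing = missing_variables(expected)
    print("missing:", ", ".join(missing) if missing else "none")
```

If a variable shows up as missing right after editing a profile file, opening a new terminal or re-sourcing the file is usually needed before it takes effect.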

4. Test-run Python code in Spark
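The post does not show which code was used for the test run. As a hedged example of the kind of snippet that exercises a working installation, the function below takes the `SparkContext` that the `pyspark` shell predefines as `sc` and runs a small map/reduce job. The name `sum_of_squares` is made up for this sketch.

```python
def sum_of_squares(sc, numbers):
    """Distribute numbers as an RDD, square each element, and add them up."""
    rdd = sc.parallelize(list(numbers))
    return rdd.map(lambda x: x * x).reduce(lambda a, b: a + b)


# In the pyspark shell, where `sc` already exists:
#   sum_of_squares(sc, range(1, 6))
```

For the input 1 through 5 the result should be 55 (1 + 4 + 9 + 16 + 25). Getting that number back indicates that the shell can schedule and run a job.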

 

 

 

II. Python programming exercise: word-frequency count for English text

1. Prepare the text file

 

2. Source code

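import string

# Open in text mode: with 'rb', str() on the bytes yields "b'...'" with escaped newlines.
# The UTF-8 encoding is an assumption about how test.txt was saved.
with open(r'F:\大学\大三下\大数据\test.txt', encoding='utf-8') as input_text:
    # Split on whitespace, strip surrounding punctuation, and lowercase each word
    words = [word.strip(string.punctuation).lower() for word in input_text.read().split()]
# Drop tokens that consisted only of punctuation
words = [word for word in words if word]
words_index = set(words)
count_dict = {index: words.count(index) for index in words_index}
# Print the words from most to least frequent
for word in sorted(count_dict, key=lambda x: count_dict[x], reverse=True):
    print('{}--{} times'.format(word, count_dict[word]) + '\n')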
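The listing calls `words.count` once per distinct word, so it rescans the whole list each time. For larger files, a single pass with `collections.Counter` from the standard library is a common alternative. The sketch below is not the author's code: `count_words` is a hypothetical helper, and it works on an in-memory string so it can be tried without the file.

```python
import string
from collections import Counter


def count_words(text):
    """Count lowercase words in text, ignoring surrounding punctuation."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    # Skip tokens that were nothing but punctuation
    return Counter(w for w in words if w)


if __name__ == "__main__":
    sample = "The cat saw the dog. The dog ran!"
    # most_common() yields (word, count) pairs, highest count first
    for word, n in count_words(sample).most_common():
        print('{}--{} times'.format(word, n))
```

To apply it to the exercise file, pass the result of `input_text.read()` to `count_words` in place of the sample string.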

3. Output

 

posted @ 2022-03-02 21:46  xhm11111