Installing Spark and a Python Exercise

I. Installing Spark

1. Check the base environment (Hadoop, JDK)
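A quick way to check the base environment is to confirm that the `java` and `hadoop` commands can be found on PATH. This is a minimal sketch that assumes both tools are installed as command-line programs with those names. It only looks them up with `shutil.which` and does not check their versions.

```python
import shutil


def check_tools(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Assumption: the JDK and Hadoop are exposed as the `java` and `hadoop` commands
    for name, path in check_tools(["java", "hadoop"]).items():
        print(f"{name:<10}{path if path else 'NOT FOUND'}")
```

Running `java -version` and `hadoop version` by hand gives the same information, plus the version numbers.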

2. Download Spark

Spark had already been downloaded last semester, so there are no screenshots of the download process here.

3. Edit the configuration files
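The screenshots for this step are not preserved. For a Spark package built without Hadoop, the Spark documentation describes adding `export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)` to `conf/spark-env.sh`. The sketch below shows that edit under these assumptions: `spark-env.sh` is created from Spark's `conf/spark-env.sh.template`, and the install paths you pass in are your own. The function name `write_spark_env` is hypothetical.

```python
import shutil
from pathlib import Path


def write_spark_env(spark_home, hadoop_home):
    """Create conf/spark-env.sh from its template (if needed) and append
    the SPARK_DIST_CLASSPATH line once. The conf directory must already exist."""
    conf = Path(spark_home) / "conf"
    env_file = conf / "spark-env.sh"
    template = conf / "spark-env.sh.template"
    if not env_file.exists() and template.exists():
        shutil.copy(template, env_file)
    line = f"export SPARK_DIST_CLASSPATH=$({hadoop_home}/bin/hadoop classpath)\n"
    existing = env_file.read_text() if env_file.exists() else ""
    if line.strip() not in existing:
        # Start on a fresh line if the file does not end with a newline
        prefix = "" if existing == "" or existing.endswith("\n") else "\n"
        with open(env_file, "a", encoding="utf-8") as f:
            f.write(prefix + line)
    return env_file
```

For example, `write_spark_env("/usr/local/spark", "/usr/local/hadoop")` would edit `/usr/local/spark/conf/spark-env.sh`. Both paths are hypothetical. Calling it twice does not add the line twice.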

4. Configure the environment variables
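A common setup defines `SPARK_HOME`, adds `$SPARK_HOME/bin` to `PATH`, and adds `$SPARK_HOME/python` to `PYTHONPATH` so that PySpark can be imported from a regular Python interpreter. The sketch below checks those three settings. The exact set of variables is an assumption about a typical setup, not taken from the original. It also does not check the py4j archive that PySpark additionally needs on `PYTHONPATH`.

```python
import os


def _entries(value):
    """Split a PATH-style string into normalised directory entries."""
    return {os.path.normpath(p) for p in value.split(os.pathsep) if p}


def check_spark_env(env=None):
    """Return a list of problems with the Spark-related variables (empty list means OK)."""
    env = os.environ if env is None else env
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return ["SPARK_HOME is not set"]
    problems = []
    bin_dir = os.path.normpath(os.path.join(spark_home, "bin"))
    if bin_dir not in _entries(env.get("PATH", "")):
        problems.append(f"{bin_dir} is not on PATH")
    # Assumption: only needed when importing pyspark from a plain Python interpreter
    python_dir = os.path.normpath(os.path.join(spark_home, "python"))
    if python_dir not in _entries(env.get("PYTHONPATH", "")):
        problems.append(f"{python_dir} is not on PYTHONPATH")
    return problems


if __name__ == "__main__":
    issues = check_spark_env()
    print("Environment looks OK" if not issues else "\n".join(issues))
```

Passing a plain dict instead of `os.environ` makes the check easy to try without changing the real environment.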

5. Run Spark
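One simple way to confirm that Spark runs is to ask `spark-submit` for its version. Starting the interactive `pyspark` shell works as well. This sketch assumes `spark-submit` is reachable through PATH. It captures both output streams because the version banner may be printed to stderr rather than stdout.

```python
import shutil
import subprocess


def spark_version_banner(cmd="spark-submit"):
    """Run `<cmd> --version` and return its combined output, or None if cmd is not on PATH."""
    if shutil.which(cmd) is None:
        return None
    # The banner may go to stderr, so collect both streams
    result = subprocess.run([cmd, "--version"], capture_output=True, text=True)
    return result.stdout + result.stderr


if __name__ == "__main__":
    banner = spark_version_banner()
    print(banner if banner is not None else "spark-submit not found on PATH")
```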

II. Python Programming Exercise: Word Frequency Count for an English Text

# CalHamletV1.py
def getText():
    # Read the whole file; "with" closes it automatically
    with open("hamlet.TXT", "r", encoding="UTF-8") as f:
        txt = f.read()
    txt = txt.lower()  # convert everything to lower case
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        txt = txt.replace(ch, " ")  # replace special characters in the text with spaces
    return txt
hamletTxt = getText()
words = hamletTxt.split()
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1  # count occurrences of each word
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # sort by count, highest first
for word, count in items:
    print("{0:<20}{1}".format(word, count))
# Write the sorted (word, count) list once, after the loop, and close the file
with open("f01.txt", "w", encoding="UTF-8") as out:
    out.write(str(items))
print("File written successfully")

Run result
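The same word count can be written more compactly with the standard library's `collections.Counter`. Its `most_common()` method replaces the manual sort. This sketch works on a string rather than reading `hamlet.TXT`, so it can be tried without the file. The name `word_counts` is hypothetical. It also uses `str.translate` in place of the `replace` loop.

```python
from collections import Counter

# Punctuation to replace with spaces (mirrors the list used in getText())
PUNCTUATION = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~'


def word_counts(text):
    """Lower-case the text, turn punctuation into spaces, and count the words."""
    table = str.maketrans(PUNCTUATION, " " * len(PUNCTUATION))
    return Counter(text.lower().translate(table).split())


if __name__ == "__main__":
    counts = word_counts("To be, or not to be: that is the question.")
    for word, count in counts.most_common(3):
        print(f"{word:<20}{count}")
```

To process the play itself, pass it the file contents, for example `word_counts(open("hamlet.TXT", encoding="UTF-8").read())`.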

posted @ 2022-03-04 16:21 郑在亮