Lesson 3: Word Frequency Counting

Basic information

Student ID: 2017*****1024

Name: 王劲松

Gitee repository: https://gitee.com/Danieljs/word_frequency

The program, step by step

① Open the file and read it into a buffer

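def process_file(dst):
    # Open the file and read it into a buffer; return None on failure.
    try:
        f = open(dst)
    except IOError as s:
        print(s)
        return None
    try:
        bvffer = f.read()
    except Exception:
        print("Read File Error!")
        return None
    finally:
        # Close the file whether or not the read succeeded.
        f.close()
    return bvffer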
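The same step can also be written with a context manager, which closes the file even when reading fails. This is only a sketch, not the version in the repository: the helper name read_text and the encoding parameter with its utf-8 default are assumptions introduced for illustration.

```python
def read_text(path, encoding="utf-8"):
    """Return the file's contents as a string, or None if it cannot be read.

    Hypothetical helper: the name and the utf-8 default are assumptions.
    """
    try:
        # The with-block closes the file even if read() raises.
        with open(path, encoding=encoding) as f:
            return f.read()
    except (OSError, UnicodeDecodeError) as err:
        print(err)
        return None
```

Passing the encoding explicitly avoids depending on the platform default, which differs between systems.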

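② Add the code that processes the buffer bvffer: split the text into words, strip punctuation and other special symbols from each word, and store each word's count in the dictionary word_freq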

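from string import punctuation

def process_buffer(bvffer):
    # Count how often each word occurs; an empty or missing buffer gives {}.
    word_freq = {}
    if bvffer:
        for item in bvffer.strip().split():
            # Strip leading and trailing punctuation and spaces from the token.
            word = item.strip(punctuation + ' ')
            if word in word_freq:
                word_freq[word] += 1
            else:
                word_freq[word] = 1
    return word_freq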
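For comparison, the counting step can be sketched with collections.Counter from the standard library, which removes the explicit if/else. This is not the code from the post: the helper name count_words is hypothetical, and unlike the original it skips tokens that consist of punctuation only.

```python
from collections import Counter
from string import punctuation


def count_words(text):
    """Count words after stripping surrounding punctuation (hypothetical helper)."""
    counts = Counter()
    for token in (text or "").split():
        word = token.strip(punctuation)
        if word:  # skip tokens that were punctuation only, e.g. "--"
            counts[word] += 1
    return counts


print(count_words("the cat, the hat -- the end."))
```

A Counter is a dict subclass, so the result can be passed to code that expects a plain dictionary of counts.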

③ Define the output function: sort the words by frequency and print the top 10 together with their counts

def output_result(word_freq):
    if word_freq:
        # Sort (word, count) pairs by count, highest first.
        sorted_word_freq = sorted(word_freq.items(), key=lambda v: v[1], reverse=True)
        for item in sorted_word_freq[:10]:
            print(item)
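When only the top few entries are needed, sorting the whole dictionary is not strictly necessary. The following sketch uses heapq.nlargest from the standard library instead; the helper name top_words and its parameter n are assumptions for illustration, not part of the original program.

```python
import heapq


def top_words(word_freq, n=10):
    """Return the n (word, count) pairs with the highest counts (hypothetical helper)."""
    # nlargest keeps only n candidates at a time instead of sorting every item.
    return heapq.nlargest(n, word_freq.items(), key=lambda item: item[1])


for word, count in top_words({"the": 3, "cat": 1, "hat": 2}, n=2):
    print(word, count)
```

For a novel-sized vocabulary the difference is small, so the full sort in output_result remains a reasonable choice.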

④ Call the functions from the main block and print the result to the console

if __name__ == "__main__":
    import argparse

    # Take the path of the text file as the only command-line argument.
    parser = argparse.ArgumentParser()
    parser.add_argument('dst')
    args = parser.parse_args()
    dst = args.dst
    bvffer = process_file(dst)
    word_freq = process_buffer(bvffer)
    output_result(word_freq)
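The argument handling can be tried without a real command line by passing the argument list directly to parse_args. The sketch below wraps this in a hypothetical helper named parse_args; the description and help strings are also assumptions, not text from the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None means use sys.argv (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Print the 10 most frequent words in a text file."
    )
    parser.add_argument("dst", help="path to the text file to analyse")
    return parser.parse_args(argv)


args = parse_args(["Gone_with_the_wind.txt"])
print(args.dst)
```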

Run command and screenshots of the results

Run the program with the command python -m cProfile word_freq.py Gone_with_the_wind.txt:

Most frequently executed code: 349 calls

Longest-running code: 0.001 s
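Figures like the two above are read from the cProfile report. As a rough sketch of how such a report can be produced and sorted from inside Python, the example below uses the standard cProfile and pstats modules. The function count_words and the synthetic text are stand-ins chosen for illustration; the post itself profiles word_freq.py on the novel.

```python
import cProfile
import io
import pstats


def count_words(text):
    # Stand-in workload (hypothetical); the post profiles word_freq.py instead.
    freq = {}
    for word in text.split():
        freq[word] = freq.get(word, 0) + 1
    return freq


profiler = cProfile.Profile()
profiler.enable()
result = count_words("to be or not to be " * 1000)
profiler.disable()

report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
# "calls" sorts by call count; "tottime" would sort by time spent inside each function.
stats.sort_stats("calls").print_stats(5)
print(report.getvalue())
```

On the command line, the same ordering is available through the -s option, for example python -m cProfile -s calls word_freq.py Gone_with_the_wind.txt.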

Optimization approach and improved code

The code with the longest execution time is the part to optimize. The function process_buffer contains this line:

if word in word_freq.keys():

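This line sits inside the for loop, so it runs once for every word in the text, and each check calls the dictionary's keys method, which adds up. In Python 2, keys() builds a new list on every call and the membership test then scans that list; in Python 3, keys() returns a view, so the saving is mainly the extra method call. Removing keys turns the line into: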

if word in word_freq:
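The effect of this change can be measured directly with the standard timeit module. The sketch below is an illustration under assumptions, not a measurement from the post: the dictionary is synthetic, and the absolute timings depend on the machine and the Python version.

```python
import timeit

# Synthetic dictionary standing in for word_freq (assumption for illustration).
word_freq = {f"word{i}": i for i in range(10000)}

with_keys = timeit.timeit(lambda: "word9999" in word_freq.keys(), number=100000)
without_keys = timeit.timeit(lambda: "word9999" in word_freq, number=100000)

print(f"in word_freq.keys(): {with_keys:.4f} s")
print(f"in word_freq:        {without_keys:.4f} s")
```

Both expressions give the same answer, so the rewrite changes only the cost of the check, not the word counts.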

After the improvement

posted @ 2019-04-03 14:15 Danielss
