A complete English word-frequency counting example using files

You can download a long English novel and analyze its word frequencies.

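1. Read in the string to be analyzed

2. Split the text and extract the words

3. Build the word-count dictionary

4. Exclude grammatical words

5. Sort

6. Output TOP(20)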

# Write the sample essay to book.txt so that the later steps can read it from a file
s=open('book.txt','w')
s.write('''New year is the great moment for people, and many families choose to
go to the cinema and enjoy the hour. But recently, the news reported an unhappy
incident that a woman was talking loudly while watching movie and an audience
beat her for anger. The public criticized the woman’s impolite behavior,
though the audience was rude.
The impolite behavior in the cinema happens all the time. When watching the
movie, I really hate people talk, or the kids share opinions with adults.
They are disturbing the audience. Some people don’t talk, but they play smart
phone, showing a light in the dark, it is very uncomfortable. Everybody goes
to the movie to take relax, the one who doesn’t control their behavior will
disturb others.
It is everybody’s duty to self-behave. Parents need to educate their children,
or set the good example to them. Foreigners always complain about the rude
behavior on Chinese people. We have to admit our rude act, only in this way
can we get improved.
''')
s.close() 
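
The explicit open/close pair above works, but the file stays open if an error occurs before `close()` is reached. As a minimal sketch (not part of the original exercise), the same write-then-read round trip can be done with `with` blocks and an explicit encoding. The file name `sample.txt` and the temporary directory are assumptions made only for this demonstration.

```python
import os
import tempfile

text = "It is everybody’s duty to self-behave."

# Hypothetical file location in a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')

# The with-block closes the file even if write() raises an exception.
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Reading with the same encoding keeps the curly apostrophe intact.
with open(path, 'r', encoding='utf-8') as f:
    round_trip = f.read()

print(round_trip == text)
```

Using the same explicit encoding for writing and reading avoids depending on the platform default, which differs between systems.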


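print('Read book.txt and split its text into a list of words:')
b=open('book.txt','r')
read=b.read()
b.close()
read=read.lower()
for i in ',.!?:':
    read=read.replace(i,' ')  # replace punctuation with spaces
words=read.split()  # split on any whitespace, so words across line breaks are separated too
print(words)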
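
Instead of replacing punctuation character by character, a regular expression can extract the words directly. The sketch below is not from the original post. The helper name `tokenize` and the pattern are assumptions chosen for illustration, treating a word as a run of letters that may contain a straight or curly apostrophe.

```python
import re

def tokenize(text):
    """Return the lowercase words in text (hypothetical helper, not from the original post)."""
    # A word is a run of letters a-z, optionally joined by ' or ’ (for example "don’t").
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())

sample = "Some people don’t talk, but they play smart\nphone."
print(tokenize(sample))
```

With this pattern a hyphenated word such as "self-behave" is split into two words, which may or may not be what the analysis wants.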


print('Build the set of keys, excluding grammatical words:')
exp={'','and','the','to'}
keys=set(words)-exp  # set of keys with the grammatical words removed
print(keys)


print('Count and sort:')
dic={}
for w in keys:
    dic[w]=words.count(w)  # word-count dictionary
wc=list(dic.items())  # list of (word, count) tuples
wc.sort(key=lambda x:x[1],reverse=True)  # sort by count, descending
print(wc)

print('Output TOP(20):')
for item in wc[:20]:  # slicing avoids an IndexError when there are fewer than 20 words
    print(item)
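
The counting, filtering, and sorting steps can also be combined with `collections.Counter` from the standard library, which counts every word in one pass instead of calling `words.count` once per key. This is a sketch of an alternative, not the method used in the post. The function name `top_words`, its parameters, and the demo sentence are assumptions made for illustration.

```python
from collections import Counter

def top_words(text, stop_words, n=20):
    """Hypothetical helper: return the n most frequent words not in stop_words."""
    for ch in ',.!?:':
        text = text.replace(ch, ' ')  # same punctuation handling as the code above
    words = text.lower().split()
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)  # list of (word, count), highest count first

demo = "The cinema and the audience. The audience was rude, the cinema was dark."
print(top_words(demo, {'and', 'the', 'to'}, n=3))
```

`most_common(n)` returns at most `n` pairs, so a text with fewer than 20 distinct words needs no special handling.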

 

7. A brief explanation of the output.

 This English essay is about behaving politely when watching a movie in the cinema.

posted on 2017-09-27 20:02 by 104鲍珊珊
