A complete English word-frequency count, reading the text from a file
1. Read in the text to analyze
# The sample text, split into adjacent string literals joined by single spaces.
lyrics = (
    "Waking up I see that everything is okay "
    "The first time in my life and now it's so great "
    "Slowing down I look around and I am so amazed "
    "I think about the little things that make life great "
    "I wouldn't change a thing about it "
    "This is the best feeling "
    "This innocence is brilliant, I hope that it will stay "
    "This moment is perfect, please don't go away "
    "I need you now "
    "And I'll hold on to it, don't you let it pass you by "
    "I found a place so safe, not a single tear "
    "The first time in my life and now it's so clear "
    "Feel calm, I belong, I'm so happy here "
    "It's so strong and now I let myself be sincere "
    "I wouldn't change a thing about it "
    "This is the best feeling "
    "This innocence is brilliant, I hope that it will stay "
    "This moment is perfect, please don't go away "
    "I need you now"
)

# Write the text to a file; leaving the with block closes and flushes it.
with open('text.txt','w') as fo:
    fo.write(lyrics)

# Reopen the file and read the whole text back in.
with open('text.txt','r') as fo:
    day=fo.read()

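2. Split the text into words
day=day.lower()   # lowercase so 'The' and 'the' count as one word and match the exclusion set in step 4
for i in ',.\"?':
    day=day.replace(i,' ')   # replace punctuation with spaces
words=day.split()   # split on any run of whitespace, so no empty strings are produced
print(words)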
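
The replace-and-split approach only handles the punctuation marks that are listed by hand. As a rough alternative, the sketch below pulls the words out with a regular expression instead. The helper name tokenize is made up for this example, and the pattern rests on an assumption about what counts as a word: a run of lowercase letters and apostrophes, so "don't" stays in one piece.

```python
import re

def tokenize(text):
    """Lowercase the text and return every run of letters/apostrophes as a word."""
    return re.findall(r"[a-z']+", text.lower())

sample = "This moment is perfect, please don't go away"
print(tokenize(sample))
```

Anything not matched by the pattern (commas, periods, digits, extra spaces) is simply skipped, so no empty strings end up in the list.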

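3. Count with a dictionary
counts={}   # named counts so the built-in dict is not shadowed
keys=set(words)   # the unique words
print(keys)
for i in keys:
    counts[i]=words.count(i)   # occurrences of each unique word
print(counts)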
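
Calling words.count(i) for every unique word rescans the whole list each time. That is fine for a short lyric, but for longer texts a single pass is usually enough. A minimal sketch, with count_words as a made-up helper name:

```python
def count_words(words):
    """Count occurrences in one pass over the list."""
    counts = {}
    for w in words:
        # get() returns 0 the first time a word is seen
        counts[w] = counts.get(w, 0) + 1
    return counts

print(count_words(["life", "is", "great", "life", "so", "great", "life"]))
```

The result is the same word-to-count mapping as in step 3; only the way it is built differs.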

4. Exclude function words
exc={'i','sincere','to','brilliant','the','innocence','of','so','and','were','','on','really'}
counts={}
keys=set(words)
keys=keys-exc   # set difference drops the excluded words
print(keys)
for i in keys:
    counts[i]=words.count(i)
print(counts)

5. Sort
wc=list(counts.items())   # list of (word, count) pairs
wc.sort(key=lambda x:x[1],reverse=True)   # highest count first
print(wc)
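
One thing to be aware of: the sort key only looks at the count, so words with equal counts stay in whatever order the set iteration gave them, and that order can differ from run to run. If a repeatable ranking matters, one option is to break ties alphabetically. A small sketch on made-up sample counts:

```python
# Hypothetical counts, just to illustrate the tie-breaking sort key.
counts = {"now": 3, "life": 3, "great": 2, "calm": 1}

# Sort by count descending, then by word ascending for equal counts.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)
```

Negating the count lets a single ascending sort handle both criteria.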

6. Print the top 20
for item in wc[:20]:   # slicing avoids an IndexError when there are fewer than 20 words
    print(item)
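
For comparison, the counting, sorting and top-N steps can also be handed to collections.Counter from the standard library. This is only a sketch of an alternative, not the approach used in the steps above. The function name top_words and its parameters are invented for the example.

```python
from collections import Counter

def top_words(words, n=20, stop_words=frozenset()):
    """Return up to n (word, count) pairs, most frequent first."""
    counts = Counter(w for w in words if w not in stop_words)
    # most_common(n) returns fewer than n pairs if there are fewer distinct words
    return counts.most_common(n)

sample = "i hope that it will stay i hope it will".split()
print(top_words(sample, n=3, stop_words={"i"}))
```

Because most_common(n) never returns more pairs than exist, asking for 20 on a short text is safe.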

