A complete English word-frequency counting example using a file

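  1. Read in the string to be analyzed
  2. Split it and extract the words
  3. Count the words in a dictionary
  4. Exclude function (grammatical) words
  5. Sort
  6. Output the top 20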

 

>>> fo=open('test.txt','w')
>>> fo.write('''Twinkle Twinkle Little Star
  (Declan's Prayer) - Declan Galbraith

  Twinkle twinkle little star,
  How I wonder what you are,
  Up above the world so high,
  Like a diamond in the sky,
  Star light,
  Star bright,
  The first star I see tonight,
  I wish I may, I wish I might,
  Have the wish I wish tonight,

  Twinkle twinkle little star,
  How I wonder what you are,
  I have so many wishes to make,
  But most of all is what I state,
  So just wonder,
  That I've been dreaming of,
  I wish that I can have owe her enough,
  I wish I may, I wish I might,
  Have the dream I dream tonight,

  Ooo baby

  Twinkle twinkle little star,
  How I wonder what you are,
  I want a girl who'll be all mine,
  And wants to say that I'm her guy,
  Someone's sweet that's for sure,
  I want to be the one shes looking for,
  I wish I may, I wish I might,
  Have the girl I wish tonight,

  Ooo baby

  Twinkle twinkle little star,
  How I wonder what you are,
  Up above the world so high,
  Like a diamond in the sky,
  Star light,
  Star bright,
  The first star I see tonight,
  I wish I may, I wish I might,
  Have the wish I wish tonight.''')
1138
>>> fo.close()
>>> fr=open('test.txt','r')
>>> fr.read()
>>> fr.close()
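
The write-then-read round trip above can also be sketched with `with` blocks, which close the file automatically. This is only an illustration: the sample text, the temporary path, and the explicit `encoding="utf-8"` are assumptions and are not part of the original session.

```python
import os
import tempfile

# Hypothetical sample text and path, used only for this illustration.
sample = "Twinkle twinkle little star,\nHow I wonder what you are,\n"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# The file is closed automatically when the with-block exits,
# even if an exception is raised inside it.
with open(path, "w", encoding="utf-8") as f:
    written = f.write(sample)   # write() returns the number of characters written

with open(path, "r", encoding="utf-8") as f:
    text = f.read()

print(written == len(sample), text == sample)
```

The number returned by `write()` is the same kind of value as the `1138` shown in the session above: the count of characters written.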
with open('test.txt','r') as fo:    # the file is closed when the block ends
    song=fo.read()
exc={'the','in','to','a','of','and','on','what','that'}
song=song.lower()
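for i in '''.,-\n\t\u3000()"''':
    song=song.replace(i,' ')    # punctuation and line breaks become word separators
song=song.replace("'",'')       # drop apostrophes, so "i'm" becomes "im"
words=song.split()              # split on any run of whitespace, no empty strings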
dic={}
keys=set(words)                 # the distinct words
keys=keys-exc                   # exclude the function words
for w in keys:
    dic[w]=words.count(w)       # number of occurrences of each remaining word

wc = list(dic.items())
wc.sort(key=lambda x:x[1],reverse=True)
print(wc)
for item in wc[:20]:            # top 20; slicing is safe with fewer than 20 words
    print(item)
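
The script above calls `words.count(w)` once per distinct word, so the word list is scanned many times. As an alternative sketch (not the method used in this post), the standard library's `collections.Counter` does the counting in a single pass, and `re.findall` can do the tokenizing. The function name `top_words`, the regular expression, and the sample sentence are assumptions made for illustration. Unlike the script above, this version keeps apostrophes inside words, so "I'm" is counted as "i'm" rather than "im".

```python
import re
from collections import Counter

# Stop-word set copied from the script above.
STOP_WORDS = {'the', 'in', 'to', 'a', 'of', 'and', 'on', 'what', 'that'}

def top_words(text, n=20, stop_words=STOP_WORDS):
    """Return the n most frequent words in text, ignoring stop words."""
    # Lowercase, then take runs of letters (optionally joined by apostrophes)
    # as words; everything else acts as a separator.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)

sample = "Star light, star bright, the first star I see tonight"
for word, count in top_words(sample, 3):
    print(word, count)
```

To apply it to the file from this post, the text read from `test.txt` could be passed in place of `sample`. `most_common(n)` simply returns fewer entries when there are fewer than `n` distinct words.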

 

 

posted on 2017-09-26 09:50  025林婷婷