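English Word Frequency Count
- Preprocessing for the word frequency count
- Download the lyrics of an English song, or an English article
- Replace all separators such as , . ? ! ’ : with spaces
- Convert all uppercase letters to lowercase
- Build the word list
- Build the word frequency counts
- Sort
- Exclude grammatical words: pronouns, articles, conjunctions
- Output the top 10 words by frequency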
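# Read the text; the with block closes the file automatically.
with open('whr.txt', 'r') as f:
    music = f.read()

# Convert all uppercase letters to lowercase.
music = music.lower()
print('Text after lowercasing: ' + music + '\n')

# Replace every separator (, . ? ! and so on) with a space.
symbol = list(''',.?!’:"“”-%$''')
for p in symbol:
    music = music.replace(p, ' ')
print('Text after replacing separators with spaces: ' + music + '\n')

# Build the word list, then count each word.
# Count in the list, not in the string: music.count('a') would also
# match the 'a' inside 'and', which inflates the counts of short words.
split = music.split()
word = {}
for i in split:
    word[i] = split.count(i)

# Exclude grammatical words (pronouns, articles, conjunctions, ...).
words = ''' a an the in on to at and of is was are were i he she you your they us
their our it or for be too do no that s so as but it's don't '''
prep = words.split()
for i in prep:
    if i in word:
        del word[i]

# Sort by count in descending order and print the top 10.
# Slicing avoids an IndexError when fewer than 10 distinct words remain.
word = sorted(word.items(), key=lambda item: item[1], reverse=True)
for pair in word[:10]:
    print(pair)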
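
The script above depends on the file whr.txt. To try the same steps without a file, they can be wrapped in a function, roughly as sketched below. The names `top_words`, `STOP_WORDS`, and `sample` are hypothetical and chosen only for this illustration. The sketch also swaps in `collections.Counter` and `re.sub` for the manual loops, which is a substitution and not the approach used in the script.

```python
import re
from collections import Counter

# Hypothetical stop-word set for this sketch (same words as the script's list).
STOP_WORDS = set(
    "a an the in on to at and of is was are were i he she you your they us "
    "their our it or for be too do no that s so as but it's don't".split()
)


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return up to n (word, count) pairs, most frequent first."""
    text = text.lower()
    # Replace the same separator characters as the script with spaces.
    text = re.sub(r'[,.?!’:"“”\-%$]', ' ', text)
    tokens = text.split()
    # Count whole words only, skipping the stop words.
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)


# Made-up sample text, used only to exercise the function.
sample = "Let it go, let it go! Hold it back no more. Let it go."
print(top_words(sample, n=3))
```

Because `most_common(n)` returns at most `n` pairs, the function also works on short texts that contain fewer than `n` distinct words.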