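Comprehensive Exercise: Word Frequency Statistics

1. English word frequency statistics

Download the lyrics of an English song, or an English article.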

news='''
    See I never thought that I could walk through fire I never thought that I could take the burn I never had the strength to take it higher Until I reached the point of no return And there's just no turning back When your hearts under attack Gonna give everything I have It's my destiny I will never say never! (I will fight) I will fight till forever! (make it right) Whenever you knock me down I will not stay on the ground Pick it up Pick it up Pick it up Pick it up up up And never say never I never thought I could feel this power I never thought that I could feel this free I'm strong enough to climb the highest tower And I'm fast enough to run across the sea And there's just no turning back When your hearts under attack Gonna give everything I have Cause this is my destiny I will never say never!(I will fight) I will fight till forever!(make it right) Whenever you knock me down I will not stay on the ground Pick it up Pick it up Pick it up Pick it up, up, up And never say never Here we go! Guess who? JSmith and Jb! I gotcha lil bro I can handle him Hold up, aight? I can handle him Now he's bigger than me Taller than me And he's older than me And stronger than me And his arms a little bit longer than me But he ain't on a JB song with me! 
I be trying a chill They be trying to side with the thrill No pun intended, was raised by the power of Will Like Luke with the force, when push comes to shove Like Cobe with the 4th, ice water with blood I gotta be the best, and yes We're the flyest Like David and Goliath I conquered the giant So now I got the world in my hand I was born from two stars So the moon's where I land I will never say never!(I will fight) I will fight till forever!(make it right) Whenever you knock me down I will not stay on the ground Pick it up Pick it up Pick it up Pick it up, up, up And never say never I will never say never!(I will fight) I will fight till forever!(make it right) Whenever you knock me down I will not stay on the ground Pick it up Pick it up Pick it up Pick it up, up, up And never say never Never say never Never say never Never say never Never say never Never say never Never say never Never say never And never say never 
'''

  

Replace all separators such as , . ? ! ' : with spaces

replace_syntax = '''!.?'";:,'''

# Replace each separator character with a space
for character in replace_syntax:
    news = news.replace(character, " ")
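As an alternative to calling replace once per character, the same substitution can be done in a single pass with a translation table. This is a minimal sketch, not the method used in this post: the names `separators`, `table`, and `sample` are illustrative, and the parentheses are added on the assumption that tokens such as "(I" should also be cleaned.

```python
# Sketch: map every separator character to a space in one pass.
# `separators` extends the post's list with "(" and ")" (an assumption).
separators = '''!.?'";:,()'''
table = str.maketrans(separators, " " * len(separators))

# A short illustrative sample instead of the full lyrics
sample = "Never say never! (I will fight) It's my destiny."
cleaned = sample.translate(table)

# Lowercase and split on whitespace, as in the steps that follow
words = cleaned.lower().split()
print(words)
```

Note that replacing the apostrophe splits contractions, so "It's" becomes the two tokens "it" and "s".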

 

Convert all uppercase letters to lowercase and split the text into words

news = news.lower().split()

 

Generate the word list

wordList = list(news)

 

Generate the word frequency counts

wordSet = set(wordList)
wordDict = {}  # maps each word to its number of occurrences

for word in wordSet:
    wordDict[word] = wordList.count(word)
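Calling count once per distinct word rescans the whole list each time. For longer texts, `collections.Counter` from the standard library builds the same word-to-count mapping in one pass. The sketch below uses a short made-up word list (`words`) in place of the `wordList` built above.

```python
from collections import Counter

# Illustrative input; in this exercise it would be the wordList built earlier
words = "never say never i will never say never".split()

# Counter tallies every word in a single pass over the list
counts = Counter(words)

print(counts["never"])
print(counts.most_common(2))  # the two most frequent (word, count) pairs
```

A Counter behaves like a dict, so the sorting and filtering steps below work on it unchanged.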

 

Sort

dictList = list(wordDict.items())
# Sort by count, highest first
dictList.sort(key=lambda x: x[1], reverse=True)

 

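Exclude grammatical words: pronouns, articles, conjunctions

exclude = {'the','of','a','in','to','his','is','be','that','t'}
for delet_word in exclude:
    wordDict.pop(delet_word, None)  # pop with a default avoids KeyError for absent words
# Rebuild the sorted list so that the excluded words no longer appear in it
dictList = sorted(wordDict.items(), key=lambda x: x[1], reverse=True)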

 

Print the top 20 most frequent words

# Slicing avoids an IndexError when there are fewer than 20 distinct words
for item in dictList[:20]:
    print(item)
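The order of the last three steps matters: if the list is sorted before the grammatical words are removed from the dictionary, the sorted list still contains them. One way to make the order explicit is to filter first and sort afterwards. The following is a sketch under assumed data: `word_counts` and `stop_words` are hypothetical stand-ins for `wordDict` and `exclude`.

```python
# Hypothetical counts standing in for wordDict
word_counts = {"the": 5, "never": 9, "say": 7, "of": 2, "fight": 4}

# "zebra" does not occur in the counts; filtering with `not in` tolerates that,
# whereas `del word_counts["zebra"]` would raise KeyError
stop_words = {"the", "of", "a", "zebra"}

# Step 1: drop the excluded words
filtered = {w: c for w, c in word_counts.items() if w not in stop_words}

# Step 2: sort by count (descending) and keep at most 20 entries
top = sorted(filtered.items(), key=lambda item: item[1], reverse=True)[:20]

for word, count in top:
    print(word, count)
```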

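Save the text to be analyzed as a UTF-8 encoded file, and obtain the content for the word frequency analysis by reading that file.

# Pass the encoding explicitly; the platform default is not always UTF-8
with open('news.txt', 'r', encoding='utf-8') as file:
    news = file.read()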
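To illustrate the full round trip, the sketch below first writes a short text to a UTF-8 file and then reads it back. The temporary directory and the `text` value are assumptions made so that the example runs anywhere; in the exercise the file would be the saved lyrics.

```python
import os
import tempfile

text = "I will never say never"  # illustrative content

# Write to a temporary location so the example leaves the working directory alone
path = os.path.join(tempfile.mkdtemp(), "news.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read it back with the same encoding; `with` closes the file automatically
with open(path, "r", encoding="utf-8") as f:
    news = f.read()

print(news == text)
```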

2. Chinese word frequency statistics

Download a long Chinese article.

Read the text to be analyzed from a file.

# -*- coding: UTF-8 -*-
with open('因为寂寞,我选择了回归.txt', 'r', encoding='UTF-8') as file:
    novel = file.read()

print(novel)

news = open('gzccnews.txt','r',encoding = 'utf-8').read()

Install jieba and use it for Chinese word segmentation.

pip install jieba

import jieba

list(jieba.cut(news))

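Generate the word frequency counts

import jieba

# jieba.cut returns a generator, so printing it directly shows only the generator object
print(jieba.cut(novel))

# Convert the generator to a list to get the segmented words
novel_list = list(jieba.cut(novel))
print(novel_list)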

Sort

Exclude grammatical words: pronouns, articles, conjunctions

Print the top 20 most frequent words (or save the results to a file)

novel_dict = {}
novel_set = set(novel_list)
for word in novel_set:
    novel_dict[word] = novel_list.count(word)

# Remove the space token; pop with a default avoids KeyError when it is absent
exclude = {' '}
for delet_word in exclude:
    novel_dict.pop(delet_word, None)

word_list = list(novel_dict.items())
word_list.sort(key=lambda x: x[1], reverse=True)

# Slicing avoids an IndexError when there are fewer than 20 distinct words
for item in word_list[:20]:
    print(item)
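The step above also mentions saving the results to a file. A minimal sketch of that variant follows. The `tokens` list is hand-written to stand in for the output of `list(jieba.cut(novel))`, the output file name `word_freq.txt` is hypothetical, and the rule of keeping only tokens with two or more characters is an assumption used here to drop punctuation, whitespace, and most single-character function words.

```python
import os
import tempfile
from collections import Counter

# Hand-written stand-in for list(jieba.cut(novel)); not real jieba output
tokens = ["我", "选择", "了", "回归", ",", "因为", "寂寞", "。", "\n",
          "我", "选择", "回归"]

# Assumption: tokens shorter than two characters are treated as noise
meaningful = [t for t in tokens if len(t.strip()) >= 2]
counts = Counter(meaningful)

# Write "word<TAB>count" lines for at most the 20 most frequent words
out_path = os.path.join(tempfile.mkdtemp(), "word_freq.txt")
with open(out_path, "w", encoding="utf-8") as f:
    for word, n in counts.most_common(20):
        f.write(f"{word}\t{n}\n")

# Read the file back to check what was saved
with open(out_path, encoding="utf-8") as f:
    saved = f.read()
print(saved)
```

The length filter is crude: it also discards meaningful single-character words, so an explicit stop-word set may be the better choice for a real article.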

 

posted on 2018-03-28 20:46  何晓锋