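A Complete Course Project

1. Pick a topic that interests you.

2. Scrape related data from the web.

3. Run a text analysis and generate a word cloud.

4. Explain and interpret the results of the text analysis.

5. Write a complete blog post that includes the source code, the scraped data, and the analysis results, so that the work can be presented as a finished piece.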

The site I chose is "http://www.4399.com/flash/".

Open the page source and find the CSS classes and fields needed for scraping.
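
When the page source is long, it can help to tally which class names appear before deciding what to select. The following is a minimal sketch using only the standard library. The helper names (ClassCounter, count_classes) and the sample markup are made up for illustration, and the real 4399 page will look different.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Tally every CSS class name that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                self.counts.update(value.split())


def count_classes(html):
    """Return a Counter mapping class name -> number of occurrences."""
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Made-up markup for illustration only; not the actual 4399 page source.
sample_html = """
<ul class="n-game cf">
  <li><a href="/flash/1.htm">Game A</a><em>puzzle</em></li>
  <li><a href="/flash/2.htm">Game B</a><em>action</em></li>
</ul>
<div class="footer">...</div>
"""

class_counts = count_classes(sample_html)
print(class_counts.most_common())
```

A class that wraps the list of games is usually a good candidate for the selector passed to soup.select().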

Scraping the data

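import requests
from bs4 import BeautifulSoup

def get(url):
    """Print the title and type of each game in the first .n-game block."""
    res = requests.get(url)
    res.encoding = 'gb2312'  # decode the response as GB2312
    soup = BeautifulSoup(res.text, 'html.parser')

    zx = soup.select('.n-game')[0]
    # Visit only direct child tags, skipping the text nodes between them.
    for game in zx.find_all(True, recursive=False):
        try:
            link = game.select('a')[0]
            title = link.text
            href = link['href']  # link to the game page (not printed here)
            game_type = game.select('em')[0].text
            print(title, game_type)
        except (IndexError, KeyError):
            # Skip entries that lack a link or a type label.
            continue

gameurl = 'http://www.4399.com/flash/'
get(gameurl)  # get() prints its results and returns None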
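
The scraper above only prints its results, while the word cloud step reads them from a text file. One way to bridge the two is sketched below. The function name save_games, the sample rows, and the temporary output path are all assumptions made for illustration; they are not part of the original code.

```python
import os
import tempfile


def save_games(games, file_path):
    """Write one 'title type' line per game, encoded as UTF-8.

    games is a list of (title, game_type) tuples.
    Returns the number of lines written.
    """
    with open(file_path, "w", encoding="utf-8") as f:
        for title, game_type in games:
            f.write(f"{title} {game_type}\n")
    return len(games)


# Hypothetical rows standing in for what the scraper would collect.
sample_games = [("Game A", "puzzle"), ("Game B", "action")]

# Written to the system temp directory here; swap in the real target path.
out_path = os.path.join(tempfile.gettempdir(), "zx_sample.txt")
written = save_games(sample_games, out_path)
print(written, "lines written to", out_path)
```

To use it with the scraper, get() would need to collect (title, game_type) pairs in a list and return them instead of printing. The file should then be read back with the same encoding it was written in.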

  

Analyzing the text and generating the word cloud

from os import path
import jieba
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

# Read the scraped text and segment it into words with jieba.
# Assumes the file is UTF-8; use 'gbk' if it was saved in that encoding.
with open('D:\\zx.txt', encoding='utf-8') as f:
    text = f.read()
wl_space_split = " ".join(jieba.cut(text))

d = path.dirname(path.abspath(__file__))
# scipy.misc.imread was removed in SciPy 1.2, so load the mask with Pillow.
nana_coloring = np.array(Image.open("D:\\04.jpg"))

my_wordcloud = WordCloud(
    # Assumed font path; any font with Chinese glyphs will do.
    font_path='C:\\Windows\\Fonts\\simhei.ttf',
    background_color='white',
    mask=nana_coloring,        # the image defines the shape of the cloud
    max_words=4000,
    stopwords=STOPWORDS,
    max_font_size=90,
    random_state=20,
).generate(wl_space_split)     # build the cloud from the segmented text

# Recolor the words using the colors of the mask image.
image_colors = ImageColorGenerator(nana_coloring)
my_wordcloud.recolor(color_func=image_colors)
plt.imshow(my_wordcloud)
plt.axis("off")
plt.show()
my_wordcloud.to_file(path.join(d, "cloudimg.png"))

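The word cloud shows that Halloween-themed games are being promoted heavily at the moment, and that among game types, puzzle games are the most popular.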
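
A reading like this is easier to back up with actual counts than with font sizes alone. Below is a small sketch that ranks tokens by frequency. It assumes the text has already been segmented (for example with jieba.cut); the function name top_words and the sample tokens are invented for illustration and are not the scraped data.

```python
from collections import Counter


def top_words(tokens, n=10, min_length=2):
    """Return the n most frequent tokens as (token, count) pairs.

    Tokens shorter than min_length after stripping whitespace are dropped,
    which removes blanks and most single-character function words.
    """
    cleaned = (t.strip() for t in tokens)
    counts = Counter(t for t in cleaned if len(t) >= min_length)
    return counts.most_common(n)


# Invented tokens for illustration; real input would come from the scraped file.
sample_tokens = ["万圣节", "益智", "的", "万圣节", "动作", "益智", "万圣节", " "]

top = top_words(sample_tokens, n=2)
print(top)  # [('万圣节', 3), ('益智', 2)]
```

The min_length filter is a rough stand-in for a proper Chinese stop word list, which would give cleaner results on real data.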

 

posted @ 2017-11-02 16:29  Fatmanwu