A Complete Final Project



1. Pick a topic that interests you. I chose Sohu News.


 

 

Website: http://news.sohu.com/

 

 

2. Scrape the relevant data from the web and print the results.

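import requests
from bs4 import BeautifulSoup

url = 'http://news.sohu.com/'
res = requests.get(url, timeout=10)  # avoid hanging forever on a slow connection
res.raise_for_status()               # stop early on an HTTP error
res.encoding = 'UTF-8'

soup = BeautifulSoup(res.text, 'html.parser')

# Each '.list16' block is a news list; print the first headline of each block.
for news in soup.select('.list16'):
    li = news.select('li')
    if len(li) > 0:
        link = li[0].select_one('a')
        if link is not None and link.has_attr('href'):  # skip items with no link
            title = li[0].text.strip()
            print(title, link['href'])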
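The text analysis step reads its input from a local text file, so the scraped headlines have to be saved somewhere first. The following is a minimal sketch of one way to do that, assuming the (title, href) pairs are collected in a list instead of being printed. The helper names `absolute_url` and `save_titles` are hypothetical and not part of the original assignment.

```python
from urllib.parse import urljoin

BASE_URL = 'http://news.sohu.com/'


def absolute_url(href, base=BASE_URL):
    """Resolve a relative or protocol-relative href against the page URL."""
    return urljoin(base, href.strip())


def save_titles(items, path):
    """Write one title per line to a UTF-8 text file and return the count.

    items is an iterable of (title, href) pairs, for example collected
    in the scraping loop instead of being printed.
    """
    count = 0
    with open(path, 'w', encoding='utf-8') as f:
        for title, _href in items:
            title = title.strip()
            if title:  # skip entries with an empty title
                f.write(title + '\n')
                count += 1
    return count
```

In the scraping loop, each pair could be appended to a list as `(title, absolute_url(href))`, and the list passed to `save_titles` together with the path of the text file used for the word cloud.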

 

 

 

3. Analyze the text and generate a word cloud.

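import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Read the text to analyze (the path is specific to the author's machine).
with open("D:\\cc.txt", 'r', encoding='utf-8') as f:
    text = f.read()
print(text)

# Segment the Chinese text; WordCloud splits on whitespace, so join with spaces.
wordlist = jieba.cut(text, cut_all=True)
wl_split = " ".join(wordlist)

# Build the cloud from the segmented text, not from the raw text.
# font_path is an assumption: any font with Chinese glyphs works; without one
# the characters are drawn as empty boxes.
mywc = WordCloud(font_path="C:\\Windows\\Fonts\\simhei.ttf").generate(wl_split)
plt.imshow(mywc)
plt.axis("off")
plt.show()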
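A word cloud is a picture of word frequencies, so it can help to look at the counts directly as well. The sketch below counts tokens with the standard library and drops single-character tokens, which in segmented Chinese text are mostly particles and punctuation. The function name `top_words` and its parameters are hypothetical, and the `min_len=2` filter is an assumption rather than something the original code does.

```python
from collections import Counter


def top_words(tokens, n=10, min_len=2, stopwords=()):
    """Return the n most frequent tokens as (word, count) pairs.

    Tokens shorter than min_len characters and tokens listed in
    stopwords are ignored.
    """
    stop = set(stopwords)
    cleaned = (t.strip() for t in tokens)
    counts = Counter(
        t for t in cleaned
        if len(t) >= min_len and t not in stop
    )
    return counts.most_common(n)
```

With jieba installed, the tokens could come from `jieba.cut(text)`, and `dict(top_words(tokens, n=100))` could then be passed to `WordCloud.generate_from_frequencies` instead of calling `generate` on the joined string.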

4. Results

 

Posted on 2017-11-02 16:32 by 27--何卓霖