Web scraping: crawling the titles of the Weibo hot search list

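Install the required third-party libraries (requests and beautifulsoup4). Then open the Weibo hot search list in a browser and press F12 to open the developer tools. Each title sits inside a td cell with the class td-02.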

[Screenshot: final scraping result]

Source code:

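import requests
from bs4 import BeautifulSoup

url = "https://s.weibo.com/top/summary?cate=realtimehot"

def getHTMLText(url):
    """Download the page and return its text, or "" if the request fails."""
    try:
        kv = {"User-Agent": "Mozilla/5.0"}  # identify as a browser
        r = requests.get(url, headers=kv, timeout=30)
        r.raise_for_status()  # raise an exception on 4xx/5xx status codes
        r.encoding = r.apparent_encoding  # guess the encoding from the content
        return r.text
    except requests.RequestException:
        return ""

html = getHTMLText(url)
soup = BeautifulSoup(html, "html.parser")
# Each hot search title is the link inside a <td class="td-02"> cell
for td in soup.find_all("td", class_="td-02"):
    if td.a is not None:  # skip cells that contain no link
        print(td.a.string)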
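One detail worth knowing: a tag's .string is None whenever the tag has more than one child, for example a title link that also contains an icon image. The sketch below runs offline against a hypothetical HTML fragment (SAMPLE_HTML, made up to imitate the td-02 cells; the real Weibo markup may differ). It uses a CSS selector and get_text(strip=True) as a more tolerant alternative; extract_titles is an illustrative helper, not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the td-02 cells seen with F12;
# the real Weibo page may be structured differently.
SAMPLE_HTML = """
<table>
  <tr><td class="td-01">1</td><td class="td-02"><a href="#">标题一 <img src="hot.png"></a></td></tr>
  <tr><td class="td-01">2</td><td class="td-02"><a href="#">标题二</a><span>123456</span></td></tr>
  <tr><td class="td-01"></td><td class="td-02"></td></tr>
</table>
"""

def extract_titles(html):
    """Return the stripped text of every link directly inside a td-02 cell."""
    soup = BeautifulSoup(html, "html.parser")
    # "td.td-02 > a" selects <a> elements that are direct children of td-02 cells;
    # empty cells without a link are skipped automatically
    return [a.get_text(strip=True) for a in soup.select("td.td-02 > a")]

if __name__ == "__main__":
    first_link = BeautifulSoup(SAMPLE_HTML, "html.parser").find("td", class_="td-02").a
    print(first_link.string)            # None: the <a> holds both text and an <img>
    print(extract_titles(SAMPLE_HTML))  # ['标题一', '标题二']
```

With the live page, the same helper would be called as extract_titles(getHTMLText(url)).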

Finally, here is the second assignment I missed earlier. Code:

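import jieba
import wordcloud
import imageio.v2 as imageio  # needs imageio >= 2.16; plain imageio.imread is deprecated

# Shape mask: words are drawn only where the image is not pure white
m = imageio.imread('王冠.jfif')
w = wordcloud.WordCloud(width=800,
                        height=600,  # ignored when a mask is given; the mask's size is used
                        background_color='white',
                        font_path='simsun.ttc',  # a Chinese font, so characters are not drawn as boxes
                        mask=m)
with open('一念永恒.txt', encoding='utf-8') as f:
    txt = f.read()
# jieba splits the Chinese text into words; WordCloud expects space-separated words
txt1 = jieba.lcut(txt)
s = " ".join(txt1)
w.generate(s)
w.to_file('白.jpg')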
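Common filler words can crowd out the interesting ones in a word cloud. A minimal sketch, assuming you want to drop punctuation, single characters and a few stop words before drawing: count the segmented tokens yourself and pass the counts to WordCloud.generate_from_frequencies instead of w.generate(s). The STOPWORDS set, the word_frequencies helper and the sample tokens are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical stop-word list; extend it to suit your own text.
STOPWORDS = {"自己", "他们", "一个", "没有"}

def word_frequencies(tokens, stopwords=STOPWORDS, min_len=2):
    """Count tokens, skipping whitespace, tokens shorter than min_len and stop words."""
    counts = Counter()
    for tok in tokens:
        tok = tok.strip()
        if len(tok) >= min_len and tok not in stopwords:
            counts[tok] += 1
    return counts

if __name__ == "__main__":
    # Sample tokens standing in for the output of jieba.lcut(txt)
    tokens = ["宗门", "的", "修炼", "，", "宗门", "自己", "修炼", "宗门", " "]
    print(word_frequencies(tokens).most_common(2))  # [('宗门', 3), ('修炼', 2)]
```

In the script above this would replace the last three steps with w.generate_from_frequencies(word_frequencies(jieba.lcut(txt))) followed by w.to_file('白.jpg'); a Counter works here because it is a dict of word to count.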

posted @ 2020-05-03 18:42  impotent