Python: BeautifulSoup

Introduction

    BeautifulSoup is a module for parsing HTML documents.

Import

    import bs4

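Methods

 bs4.BeautifulSoup(page_content, "html.parser")

    Parameters:

        page_content: the page content (the HTML source as a string)

        "html.parser": the parser to use; this one is the HTML parser from Python's standard library

    Return value: html (a BeautifulSoup object)

    Use of the return value: searching for HTML tags

    Purpose: parse the page and obtain the root of the HTML document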
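
As a minimal sketch of the call described above, the snippet below parses a small HTML string instead of a downloaded page. The sample markup and the variable names `root_name` and `title_text` are assumptions made for illustration only.

```python
import bs4

# Hypothetical sample page, used only for illustration
page_content = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

# Parse the string with the standard-library HTML parser
html = bs4.BeautifulSoup(page_content, "html.parser")

root_name = html.name          # the root object reports the name "[document]"
title_text = html.title.text   # text inside the <title> tag
print(type(html).__name__, root_name, title_text)
```

The returned object represents the whole document, so every later search (find, find_all) starts from it.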

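# Five features of bs

# 1. html.find(markName, attrs=attrs)
#    Parameters: tag name, dictionary of attributes
#    Return value: mark (None if nothing matches)
#    Use of the return value: work with the mark tag
#    Purpose: get the first mark tag that matches the attributes in attrs

# 2. html.find_all(markName, attrs=attrs)
#    Parameters: tag name, dictionary of attributes
#    Return value: list (a ResultSet)
#    Use of the return value: iterate over all matching mark tags
#    Purpose: get all matching mark tags

# 3. mark[attribute name]   get the value of an attribute of the mark tag
# 4. mark.text              get the text content of the mark tag
# 5. mark.contents          get the nested content (list of direct children)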
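
The operations listed above can be tried on a small inline document. This is only a sketch: the markup in `sample` and all variable names are hypothetical and not taken from any real site.

```python
import bs4

# Hypothetical markup, used only for illustration
sample = """
<ul id="books">
  <li class="item"><a href="/a">Alpha <b>one</b></a></li>
  <li class="item"><a href="/b">Beta</a></li>
</ul>
"""
html = bs4.BeautifulSoup(sample, "html.parser")

first_item = html.find("li", attrs={"class": "item"})     # 1. first match
all_items = html.find_all("li", attrs={"class": "item"})  # 2. every match
missing = html.find("table")                              # find returns None when nothing matches

link = first_item.find("a")
href = link["href"]       # 3. attribute value
text = link.text          # 4. text, including text of nested tags
children = link.contents  # 5. direct children: a string and the <b> tag

print(href, text, len(all_items), children)
```

Because find returns None when nothing matches, it is usually worth checking the result before indexing into it or reading its text.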

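Approach

The bs approach: first get the parent node that contains all of the list data. Call find_all() on that parent node to get the list of child nodes. Then read each child node's text or [attribute].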
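
The approach can be sketched on inline markup as follows. The HTML in `sample`, the class names, and the helper `extract_rows` are all hypothetical. The sketch also uses a slight variation: it calls find_all once to get each row and then searches inside that row, rather than building several parallel lists.

```python
import bs4

# Hypothetical markup standing in for a real listing page
sample = """
<div class="list">
  <div class="row"><h2 class="title"> First </h2><a class="link" href="/1">read</a></div>
  <div class="row"><h2 class="title"> Second </h2><a class="link" href="/2">read</a></div>
</div>
<div class="footer"><h2 class="title">Not a list item</h2></div>
"""

def extract_rows(page_content):
    """Return one dict per row found under the list container."""
    html = bs4.BeautifulSoup(page_content, "html.parser")
    # Step 1: the parent node that holds all of the list data
    parent = html.find("div", attrs={"class": "list"})
    if parent is None:
        return []
    rows = []
    # Step 2: the child nodes, searched only inside the parent
    for child in parent.find_all("div", attrs={"class": "row"}):
        title = child.find("h2", attrs={"class": "title"})
        link = child.find("a", attrs={"class": "link"})
        # Step 3: read text or an attribute from each child
        rows.append({"title": title.text.strip(), "href": link["href"]})
    return rows

rows = extract_rows(sample)
print(rows)
```

Searching from the parent rather than from the whole document keeps unrelated tags, such as the title in the footer here, out of the results.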

Usage

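import bs4
import requests

# Site to scrape: https://zhwsxx.com
url = 'https://zhwsxx.com/'
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"
}
params = {}  # no query parameters are needed for this page

response = requests.get(url=url, headers=headers, params=params)
page_content = response.text
html = bs4.BeautifulSoup(page_content, 'html.parser')

# Parent node that holds all of the list data
div = html.find('div', attrs={'class': 'box-8n3-pane js_box_pane active'})
if div is None:
    raise SystemExit('container div not found; the page layout may have changed')

# Child node lists: covers, titles, and chapter details
imgUrl_list = div.find_all('p', attrs={'class': 'mh-cover'})
cName_list = div.find_all('h2', attrs={'class': 'title'})
cDetail_list = div.find_all('p', attrs={'class': 'chapter'})

# The three lists are assumed to line up one-to-one
for img, name, detail in zip(imgUrl_list, cName_list, cDetail_list):
    print(img['style'])          # inline style of the cover element
    print(name.text.strip())     # title text
    print(detail.text.strip())   # chapter text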

posted @ 2021-10-11 19:06 remix_alone Views (234) Comments (0)
