Python 3 Web Crawler Development in Practice 2: Notes, Part 2 (App Crawling)

1. NoxPlayer Android emulator (夜神模拟器): https://www.yeshen.com/

2. Download Fiddler: https://www.telerik.com/download/fiddler

For the related settings, see: https://www.cnblogs.com/wuxuanlin/p/16070738.html

2. Use requests to fetch the page's HTML

import requests

urlYs = "https://www.xxx.com/"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.5845.97 Safari/537.36 Core/1.116.623.400 QQBrowser/20.1.7293.400"
}
# Fetch the page's HTML from the URL
HTML_request = requests.get(url=urlYs, headers=headers)
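The bare requests.get call above waits indefinitely if the server stalls and will happily return an error page. A minimal sketch of a more defensive fetch follows, assuming a 10-second timeout is acceptable. The function name fetch_html and its session parameter are hypothetical helpers for illustration, not part of the original notes.

```python
import requests


def fetch_html(url, headers=None, timeout=10, session=None):
    """Return the decoded HTML of `url`, raising on HTTP errors."""
    # `session` is a hypothetical hook: pass a requests.Session (or a stub in tests);
    # when omitted, the module-level requests.get is used.
    client = session if session is not None else requests
    response = client.get(url, headers=headers, timeout=timeout)
    try:
        # Turn 4xx/5xx responses into requests.HTTPError instead of parsing an error page
        response.raise_for_status()
        return response.text
    finally:
        # Release the connection whether or not the status check passed
        response.close()
```

With this helper, `fetch_html(urlYs, headers=headers)` would take the place of the requests.get line, and the returned string can be handed straight to the parser.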

3. Install bs4 to use BeautifulSoup

pip install bs4

4. Use BeautifulSoup's find or find_all to get the relevant tags from the HTML

from bs4 import BeautifulSoup as bs

# Parse the data
# Build a parsed HTML document from the response in HTML_request
page = bs(HTML_request.text, "html.parser")
# print(page)
# find / find_all search the scraped HTML by tag
dataUl = page.find("ul", class_="html5zoo-slides")
# Within dataUl, find all the img tags
dataLis = dataUl.find_all("img")
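One thing worth noting about the two calls: find returns None when nothing matches, while find_all returns an empty list, so chaining find_all onto a failed find raises AttributeError. The sketch below shows one way to guard against that, run on a small inline HTML string so it needs no network. The function name extract_image_urls and the sample markup are made up for illustration; only the class name html5zoo-slides comes from the notes.

```python
from bs4 import BeautifulSoup


def extract_image_urls(html, list_class="html5zoo-slides"):
    """Return the src of every <img> inside the first <ul class=list_class>."""
    page = BeautifulSoup(html, "html.parser")
    slides = page.find("ul", class_=list_class)
    if slides is None:
        # find returns None when no tag matches, so stop before calling find_all
        return []
    # src=True keeps only the img tags that actually carry a src attribute
    return [img["src"] for img in slides.find_all("img", src=True)]


# Hypothetical sample markup standing in for the real page
sample_html = """
<ul class="html5zoo-slides">
  <li><img src="/img/a.jpg"></li>
  <li><img src="https://www.xxx.com/img/b.png"></li>
  <li><img alt="no src here"></li>
</ul>
"""
print(extract_image_urls(sample_html))
# prints ['/img/a.jpg', 'https://www.xxx.com/img/b.png']
```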

5. Call .get on the tags you found to read the relevant attribute values.

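# Read the src attribute of each img tag to get the image link
for dataLi in dataLis:
    src = dataLi.get("src")
    # Download the image from the link as binary data
    img = requests.get(src).content
    # Take the file name from the link: the part after the last "/"
    img_name = src.split("/")[-1]
    # Save to a file; the with block closes the file automatically on exit
    with open("../img/" + img_name, "wb") as file:
        file.write(img)
# Remember to close the request
HTML_request.close()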

6. Save to a file, and remember to close both the file and the request (a with block closes the file for you)
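Two details in the download step can trip things up: a src may be relative (for example "/img/a.jpg"), which requests.get cannot fetch directly, and a query string such as "?v=2" would end up in the file name when splitting on "/". A rough sketch of handling both with the standard library follows. The helper name resolve_image, the "unnamed" fallback, and the example URLs are assumptions for illustration.

```python
from pathlib import Path
from urllib.parse import urljoin, urlsplit


def resolve_image(page_url, src, out_dir="../img"):
    """Return (absolute_url, local_path) for an img src found on page_url."""
    # urljoin leaves absolute links untouched and resolves relative ones against the page
    absolute_url = urljoin(page_url, src)
    # Use only the path component so a query string does not leak into the file name;
    # "unnamed" is a hypothetical fallback for links that end in "/"
    file_name = urlsplit(absolute_url).path.split("/")[-1] or "unnamed"
    return absolute_url, Path(out_dir) / file_name
```

In the download loop, the returned URL would be passed to requests.get and the returned path to open(..., "wb"). Calling `Path("../img").mkdir(parents=True, exist_ok=True)` beforehand avoids a FileNotFoundError when the target folder does not exist yet.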

Posted on 2026-03-14 23:47 by 深圳男生快快乐乐
