Scraping images from a dynamic web page with selenium + urllib.request

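1. An ordinary crawler only fetches the HTML that the server returns, so it can only scrape static pages. Data on a dynamic page, which is filled in by JavaScript after the page loads, has to be scraped with a crawler built on selenium. selenium launches a real browser on your machine to visit the page, so a selenium crawler behaves much like a human visitor and can reach data that an ordinary crawler cannot.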
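
To make the difference concrete, here is a minimal sketch that uses only the standard library. The page in `SAMPLE_HTML` and the names `ImgCounter` and `count_static_imgs` are made up for illustration and are not taken from the Sogou page. The sample page creates its image from a script, so a parser that reads the raw HTML without running JavaScript should not see any `<img>` tag.

```python
from html.parser import HTMLParser

# Hypothetical page: the image is only created by JavaScript at load time
SAMPLE_HTML = """
<html><body>
<div id="imgid"></div>
<script>
document.getElementById("imgid").innerHTML = '<img src="a.jpg">';
</script>
</body></html>
"""


class ImgCounter(HTMLParser):
    """Counts <img> start tags found in raw HTML (no JavaScript is run)."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.count += 1


def count_static_imgs(html):
    """Return how many <img> tags a static parser sees in the given HTML."""
    parser = ImgCounter()
    parser.feed(html)
    parser.close()
    return parser.count


print(count_static_imgs(SAMPLE_HTML))
```

The text inside `<script>` is treated as script content rather than markup, so the count for the sample page is 0. A browser driven by selenium would run the script first, and the image would then be present in the page it exposes.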

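2. A selenium crawler needs the following: a. the selenium module is installed; b. the driver for your browser has been downloaded and placed in Python's Scripts directory (I am using Chrome here, so the driver is chromedriver).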
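
A quick way to check both conditions before running the crawler might look like the sketch below. The function name `check_prerequisites` is hypothetical, and the check assumes that Python's Scripts directory is on your PATH, since `shutil.which` only searches PATH.

```python
import importlib.util
import shutil


def check_prerequisites(driver_name="chromedriver"):
    """Report whether the selenium module and a browser driver can be found."""
    return {
        # find_spec returns None when the module is not installed
        "selenium": importlib.util.find_spec("selenium") is not None,
        # which returns None when no such executable is on PATH
        "driver": shutil.which(driver_name) is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print("{}: {}".format(name, "found" if found else "missing"))
```

If either line says "missing", install the module with pip or move the driver into a directory that is on PATH before going further.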

3. The code is pasted below:

import os
import urllib.request
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Folder where the images are saved; create it if it does not exist yet
save_dir = r"C:\Users\MyPC\Desktop\图片文件夹"
os.makedirs(save_dir, exist_ok=True)

# Run Chrome in headless mode (no visible window)
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options=chrome_options)
# Page the crawler visits
driver.get("https://pic.sogou.com/pics?query=%C3%C3%D7%D3%CD%BC&p=40230500&st=255&mode=255&policyType=0")
# All <img> elements under the element whose id is "imgid"
# (find_elements_by_xpath is gone from Selenium 4.3+; use find_elements with By.XPATH)
tags = driver.find_elements(By.XPATH, '//*[@id="imgid"]//img')

def download():
    try:
        for i, tag in enumerate(tags, start=1):
            url = tag.get_attribute("src")
            if not url:
                continue  # skip <img> elements that have no src
            print("Downloading image {}......\nLink:\n".format(i), url)
            # req = urllib.request.Request(url, headers=headers)
            data = urllib.request.urlopen(url).read()
            # "wb" overwrites an existing file instead of appending to it
            with open(os.path.join(save_dir, "picture{}.jpg".format(i)), "wb") as f:
                f.write(data)
            print("Image {} downloaded!".format(i))
    finally:
        driver.quit()  # close the browser and stop the driver process

print("Download started!")
download()
print("Download finished!")
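
The commented-out `Request(url, headers=headers)` line in the code hints at sending custom request headers, but `headers` is never defined in the post. Below is a sketch of how that could be filled in. The User-Agent string and the helper names `build_request` and `picture_name` are assumptions for illustration, and whether the image server actually needs such a header has not been checked here.

```python
import urllib.request

# Assumed value for illustration; any realistic browser User-Agent would do
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def build_request(url, headers=None):
    """Wrap an image URL in a Request object that carries custom headers."""
    return urllib.request.Request(url, headers=headers or HEADERS)


def picture_name(index):
    """File name used for the index-th image, for example picture3.jpg."""
    return "picture{}.jpg".format(index)


# Building a Request does not touch the network; example.com is a placeholder
req = build_request("https://example.com/a.jpg")
print(req.full_url, req.get_header("User-agent"), picture_name(3))
```

Inside the download loop, the request object would simply take the place of the bare URL: `data = urllib.request.urlopen(build_request(url)).read()`.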

posted @ 2020-02-13 14:56 Daze_Lu
