Manmanbuy: Tackling the Technical Hurdles

I. Scraping the Manmanbuy website

1. Use Selenium to drive a browser and load the page

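import time  # used for the pauses in the later steps

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains  # used from step 4 onward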

2. Create the driver object (the browser stays visible so the run is easy to watch)

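chrome_options = Options()
# Uncomment the next two lines to run without a visible browser window
# chrome_options.add_argument('--headless')
# chrome_options.add_argument('--disable-gpu')
# Location of the Chrome executable on the author's machine
path = r'C:\Users\ChenBenYuan\AppData\Local\Google\Chrome\Application\chrome.exe'
chrome_options.binary_location = path

# The chrome_options= keyword is deprecated; options= is the supported name
driver = webdriver.Chrome(options=chrome_options)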

3. Maximize the window

driver.maximize_window()

4. Open the page and find the chart button

driver.get("http://s.manmanbuy.com/Default.aspx?key=%CD%B7%E6%DF&btnSearch=%CB%D1%CB%F7")
# Read the link target of the chart button for one product in the result list
button = driver.find_element("xpath", "//div[@class='bjlineSmall singlebj bj_2497475047']/div[@class='cost']/div[@class='p AreaPrice']/span[@class='poptrend']/a").get_attribute("href")
# print(button)
driver.get(button)
Action = ActionChains(driver)
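
The search URL above carries its query as percent-encoded GBK rather than UTF-8 (the btnSearch value decodes to 搜索), which is easy to miss when changing the keyword. A minimal sketch of building such a URL with the standard library; build_search_url is a hypothetical helper, and it assumes only that the site keeps accepting the two parameters visible in the URL above.

```python
from urllib.parse import quote, unquote

BASE_URL = "http://s.manmanbuy.com/Default.aspx"


def build_search_url(keyword: str) -> str:
    """Return a search URL with the keyword percent-encoded as GBK.

    Hypothetical helper: the parameter names are copied from the URL in
    this post; nothing else about the site is assumed.
    """
    key = quote(keyword, encoding="gbk")
    # btnSearch carries the GBK-encoded text "搜索", as in the original URL.
    button = quote("搜索", encoding="gbk")
    return f"{BASE_URL}?key={key}&btnSearch={button}"


# Recover the keyword used in this post from its encoded form.
original_keyword = unquote("%CD%B7%E6%DF", encoding="gbk")
rebuilt_url = build_search_url(original_keyword)
```

Passing a UTF-8 encoded keyword instead would produce a different byte sequence, which this endpoint may not interpret as the intended search term.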

5. The page that opens requires the user to click to pass a verification check

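# Locate the verification box and click it through the action chain
button2 = driver.find_element("xpath", "/html/body/div/div/div/div[position()=1]")
# button2.click()
Action.move_to_element(button2).click().perform()
time.sleep(5)  # give the verification time to finish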

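Pay special attention here: no matter how the click is performed, the verification fails.

The cause is the site's anti-scraping mechanism, which detects that the browser is automated, so the following anti-detection setting is added (it has to run before the page is loaded).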

driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
  "source": """
    Object.defineProperty(navigator, 'webdriver', {
      get: () => undefined
    })
  """
})
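
Page.addScriptToEvaluateOnNewDocument only affects documents loaded after the command is sent, so the order of the two calls matters. A minimal sketch of that ordering, assuming a Chrome driver object; open_stealthily and RecordingDriver are hypothetical names, and the stand-in class exists only to show the call order without a browser.

```python
HIDE_WEBDRIVER_JS = """
Object.defineProperty(navigator, 'webdriver', {
  get: () => undefined
})
"""


def open_stealthily(driver, url):
    """Register the navigator.webdriver override first, then navigate.

    `driver` is assumed to be a selenium Chrome driver (anything exposing
    execute_cdp_cmd and get). The function name is hypothetical.
    """
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": HIDE_WEBDRIVER_JS},
    )
    driver.get(url)


class RecordingDriver:
    """Stand-in that records calls instead of talking to a browser."""

    def __init__(self):
        self.calls = []

    def execute_cdp_cmd(self, cmd, params):
        self.calls.append(("cdp", cmd))

    def get(self, url):
        self.calls.append(("get", url))


demo = RecordingDriver()
open_stealthily(demo, "http://s.manmanbuy.com/")
```

With a real driver, the same function would be called once for each driver.get(...) target that needs the override in place.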

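Sure enough, the green "verification succeeded" message appeared. Just when it looked like the job was done, a slider CAPTCHA showed up after the successful verification (it appears only some of the time). For the scraper to run unattended this has to be handled as well, here with an ActionChains sequence. Because the slider is intermittent, the code is wrapped in a try block.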

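try:
  button3 = driver.find_element("xpath", "/html/body/div/div/div/div[position()=3]/div[position()=1]/div/div[position()=1]/span")
  # Step 1: press and hold the slider handle
  Action.move_to_element(button3).click_and_hold().perform()
  # Step 2: drag it 258 px to the right
  Action.move_by_offset(258, 0)
  # Step 3: release the mouse
  Action.release()
  # Run the queued actions
  Action.perform()
except Exception:
  pass  # the slider is not always shown, so a failed lookup is ignored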
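
If a single 258 px jump is ever rejected, a common workaround is to split the drag into several smaller moves. Whether this site inspects the drag trajectory is not known, so the following is only a sketch; drag_steps and its ease-out profile are illustrative choices, not something taken from the site.

```python
from typing import List


def drag_steps(total: int, count: int = 10) -> List[int]:
    """Split `total` pixels into `count` integer moves that sum to `total`.

    Moves start large and shrink (ease-out), roughly like a hand slowing
    down near the target. Hypothetical helper for illustration only.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    # Cumulative position after each step, following 1 - (1 - t)^2.
    positions = [
        round(total * (1 - (1 - i / count) ** 2)) for i in range(count + 1)
    ]
    # Differences between consecutive positions are the per-step moves.
    return [b - a for a, b in zip(positions, positions[1:])]


steps = drag_steps(258)
```

Each value would then be passed to move_by_offset(step, 0) between click_and_hold() and release().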

6. Scroll the page down to bring the chart into view

bottom = 'document.documentElement.scrollTop=100000'
driver.execute_script(bottom)

7. A chart like this appears

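How to scrape the data behind this chart became the next problem. Inspecting the page with F12 shows that the data is dynamic, so scraping the markup directly returns nothing. The page returns different data depending on where the cursor sits on the chart, and nothing at all when the cursor is off the chart, so the script sweeps the cursor across it.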

# Locate the chart canvas
canvas = driver.find_element("xpath", "/html/body/div[position()=2]/div/div/div[position()=1]/div[position()=1]/div/div/div[position()=2]/div/div[position()=1]")
kw_x = canvas.location.get('x')
print(kw_x)
# Sweep the cursor across the chart, read the tooltip, and append it to a file
for i in range(-540, 580, 36):
  print(i)
  # Start from the center of the canvas, then shift horizontally by i pixels
  Action.move_to_element(canvas).perform()
  Action.move_by_offset(i, 0).perform()
  time.sleep(1)  # wait for the tooltip to update
  text = driver.find_element(
    "xpath",
    "/html/body/div[position()=2]/div/div/div[position()=1]/div[position()=1]/div/div/div[position()=2]/div/div[position()=2]/div").text
  # data = text.replace(" ",",")
  try:
    with open('price.txt', 'a', encoding='utf-8') as f:
      f.write(text + '\n')
  except Exception as err:
    print('write error:', err)
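
The loop above hard-codes the sweep range and appends every tooltip it reads, so neighboring positions that land on the same data point can produce duplicate lines. A minimal sketch of two small helpers, assuming offsets are measured from the element's center (where move_to_element places the cursor); sweep_offsets and drop_repeats are hypothetical names, and the sample strings are made up rather than taken from the site.

```python
from typing import Iterable, List


def sweep_offsets(width: int, step: int, margin: int = 0) -> List[int]:
    """Horizontal offsets from the element center that stay inside it."""
    if step <= 0:
        raise ValueError("step must be positive")
    half = width // 2 - margin
    return list(range(-half, half + 1, step))


def drop_repeats(lines: Iterable[str]) -> List[str]:
    """Drop empty lines and consecutive duplicates, keeping the order."""
    kept: List[str] = []
    for line in lines:
        line = line.strip()
        if line and (not kept or kept[-1] != line):
            kept.append(line)
    return kept


offsets = sweep_offsets(width=1080, step=36)
# Made-up tooltip texts standing in for what the hover loop would read.
sample = ["2021-12-01 35.0", "2021-12-01 35.0", "", "2021-12-02 34.5"]
cleaned = drop_repeats(sample)
```

In the real script the width could come from canvas.size['width'], and the collected texts would be filtered with drop_repeats before being written to price.txt.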

See the GIF for the full simulated run.

posted @ 2021-12-21 22:33  Aplical