• 博客园logo
  • 会员
  • 众包
  • 新闻
  • 博问
  • 闪存
  • 赞助商
  • HarmonyOS
  • Chat2DB
    • 搜索
      所有博客
    • 搜索
      当前博客
  • 写随笔 我的博客 短消息 简洁模式
    用户头像
    我的博客 我的园子 账号设置 会员中心 简洁模式 ... 退出登录
    注册 登录
oooooolr
You can do anything but not everything. ——David Allen
博客园    首页    新随笔    联系   管理    订阅  订阅

chromedriver 全屏 翻页 错误

from selenium import webdriver
from selenium.common.exceptions import TimeoutException, StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from pyquery import PyQuery as pq
browser=webdriver.Chrome()

def search():
    try:
        browser.get('https://www.jd.com/')
        input=WebDriverWait(browser,10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR,'#key'))
        )
        submit=WebDriverWait(browser,10).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR,'#search > div > div.form > button > i'))
        )
        input.send_keys("佩奇")
        submit.click()
        total_pages=WebDriverWait(browser,10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR,'#J_bottomPage > span.p-skip > em:nth-child(1) > b'))
        )
        get_product_media()
        pages=int(total_pages.text)
        return pages
    except TimeoutException:
        search()

def search_page(number):
    try:
        input = WebDriverWait(browser, 20).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, '#J_bottomPage > span.p-skip > input'))
        )
        submit = WebDriverWait(browser, 20).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, '#J_bottomPage > span.p-skip > a'))
        )
        input.clear()
        input.send_keys(number)
        submit.click()
        get_product_media()
        # WebDriverWait(browser, 10).until(
        #     EC.text_to_be_present_in_element((By.CSS_SELECTOR,'#J_bottomPage > span.p-num > a.curr'),str(number))
        # )
    except StaleElementReferenceException:
        search_page(number)

def get_product_media():
    # try:
    WebDriverWait(browser, 10).until(
                    EC.presence_of_element_located((By.CSS_SELECTOR,'#J_goodsList .gl-item .p-img'))
                )
    html=browser.page_source
    doc=pq(html)
    items=doc('#J_goodsList .gl-i-wrap ').items()
    for item in items:
        product={
            'image': item.find('.p-img').attr('src'),
            # 'price': item.find('.p-price').text()
            # 'image': item.find('.p-img a img').attr('data-lazy-img')
        }
        print(product)
        # print(item)






def main():
    pages=search()
    print(type(pages))
    for i in range(2,pages+1):
        search_page(i)





if __name__ == '__main__':
    main()

运行的时候如果弹出的chrome不是全屏模式,翻页会不能运行。。。

另:一直无法解析到正确的src,直到看了https://www.cnblogs.com/airnew/p/10101698.html,发现把html = browser.page_source.replace('xmlns', 'another_attr'),后就可以正确解析了replace('xmlns', 'another_attr')这是什么意思,原作者说要把xmls替换,试了下替换成‘an’也会工作,

CHOOSING A SPECIFIC MATCH

CSS selectors in Selenium allow us to navigate lists with more finess that the above methods. If we have a ul and we want to select its fourth li element without regard to any other elements, we should use nth-child or nth-of-type.

<ul id = "recordlist">

<li>Cat</li>

<li>Dog</li>

<li>Car</li>

<li>Goat</li>

</ul>

If we want to select the fourth li element (Goat) in this list, we can use the nth-of-type, which will find the fourth li in the list.

CSS: #recordlist li:nth-of-type(4)

On the other hand, if we want to get the fourth element only if it is a li element, we can use a filtered nth-child which will select (Car) in this case.

CSS: #recordlist li:nth-child(4)

Note, if you don’t specify a child type for nth-child it will allow you to select the fourth child without regard to type. This may be useful in testing css layout in selenium.

CSS: #recordlist *:nth-child(4)
from :https://saucelabs.com/resources/articles/selenium-tips-css-selectors
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- focus on what you want to be
posted @ 2019-01-29 14:51  oooooolr  阅读(437)  评论(0)    收藏  举报
刷新页面返回顶部
博客园  ©  2004-2025
浙公网安备 33010602011771号 浙ICP备2021040463号-3