Basic Usage of the Python selenium Module

Selenium-Webdriver

Intro:

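Selenium is named after the chemical element selenium (Se). Selenium (Se for short below) started out as nothing more than an automated-testing project, then gradually became independent and grew into several parts: Selenium IDE, Selenium Client API, Selenium WebDriver, Selenium Remote Control, and Selenium Grid. Se works particularly well with Firefox: WebDriver supports Firefox natively, and Selenium IDE is a Firefox add-on. Unfortunately, Firefox 57 (Quantum) overhauled the browser's internals, and at the time of writing Selenium IDE still does not work with the current Firefox release.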

For details, see: https://www.cnblogs.com/yogayan/p/6710119.html

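1. Installation

What gets installed here is the Selenium Client API.

python: pip install selenium

You also need a driver for the browser you want to control:

chromedriver: download from https://sites.google.com/a/chromium.org/chromedriver/downloads (a proxy may be needed where Google sites are blocked). Save it in a directory on your PATH and it can be launched from there.

IEDriver: download from GitHub: https://github.com/SeleniumHQ/selenium/wiki/InternetExplorerDriver . Save it on your PATH in the same way.

Note for IE users: under Internet Options > Security, turn off "Enable Protected Mode" for all four zones.

PhantomJS: just unzip it and add its bin directory to your PATH.

I tried the Opera driver and geckodriver, and neither worked well with the latest Opera and Firefox. Older browser versions can of course be downloaded and used instead. (So every test below was run on IE9.)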

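2. Basic Usage

2.1 Launching the browser

from selenium import webdriver
driver = webdriver.Ie()   # or webdriver.Chrome()
driver.get('https://www.baidu.com')
driver.page_source    # the page's full HTML; may be incomplete if the page has not finished loading yet
driver.close()    # close the current browser window (driver.quit() ends the whole session)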
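The session above leaves a browser process behind if any step raises before the last line. A minimal sketch of one way to guard against that, assuming a hypothetical helper named fetch_page_source (not part of Selenium) that receives an already created driver:

```python
def fetch_page_source(driver, url):
    """Load url and return its HTML, always shutting the browser down.

    fetch_page_source is a hypothetical helper, not a Selenium API.
    `driver` is any WebDriver instance, e.g. webdriver.Ie() or webdriver.Chrome().
    """
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # quit() ends the whole session; close() only closes the current window.
        driver.quit()
```

It could be called as html = fetch_page_source(webdriver.Chrome(), 'https://www.baidu.com').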

2.2 Commonly used imports

from selenium import webdriver
from selenium.webdriver import ActionChains  # simulates mouse actions
from selenium.webdriver.common.by import By  # lookup strategies such as By.ID and By.CSS_SELECTOR
from selenium.webdriver.common.keys import Keys  # simulates keyboard keys
from selenium.webdriver.support import expected_conditions  # conditions to wait for
from selenium.webdriver.support.wait import WebDriverWait  # waits for elements to appear on the page

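2.3 Selectors

     1. find_element_by_id        find by id
     2. find_element_by_link_text        find a link by its exact text; for <a>Hello</a>, use find_element_by_link_text("Hello")
     3. find_element_by_partial_link_text   find a link by part of its text; for <a>Hello</a>, use find_element_by_partial_link_text("Hel")
     4. find_element_by_tag_name        find by tag name
     5. find_element_by_class_name        find by class name
     6. find_element_by_name        find by the name attribute
     7. find_element_by_css_selector        find with a CSS selector
     8. find_element_by_xpath/find_elements_by_xpath        find with an XPath expression (a remarkably powerful lookup)
     9. Every method above can be replaced by the find_element(By.ID, "lala") form. In Selenium 4 the find_element_by_* helpers are deprecated, and from 4.3 on they are removed, so this is the only form that works there.
     P.S. The find_elements_* variants return a list of elements; use an index or a for loop to get individual ones.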
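To make the equivalence in item 9 concrete, here is a small sketch that maps each helper name onto the generic find_element() call. The string values are those of the By constants; the names LEGACY_TO_BY and find are made up for this illustration.

```python
# String values of the By constants in selenium.webdriver.common.by.By,
# keyed by the helper name each one replaces.
LEGACY_TO_BY = {
    "find_element_by_id": "id",                                # By.ID
    "find_element_by_link_text": "link text",                  # By.LINK_TEXT
    "find_element_by_partial_link_text": "partial link text",  # By.PARTIAL_LINK_TEXT
    "find_element_by_tag_name": "tag name",                    # By.TAG_NAME
    "find_element_by_class_name": "class name",                # By.CLASS_NAME
    "find_element_by_name": "name",                            # By.NAME
    "find_element_by_css_selector": "css selector",            # By.CSS_SELECTOR
    "find_element_by_xpath": "xpath",                          # By.XPATH
}


def find(driver, legacy_name, value):
    """Hypothetical shim: look an element up through the generic find_element()."""
    return driver.find_element(LEGACY_TO_BY[legacy_name], value)
```

For example, find(driver, "find_element_by_id", "kw") issues the same lookup as driver.find_element(By.ID, "kw").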

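# As the names suggest, find_element_by_xpath finds a single element and find_elements_by_xpath finds many, returning a list. The notes below mostly use the elements form.

########## Key symbols: // and / ##################
# / steps to direct children (at the start of an expression it starts from the document root); // searches all descendants. A leading // searches the whole document.
driver.find_element_by_xpath("/html")  # with a single leading slash, only the root html tag can follow
driver.find_elements_by_xpath("//h1")  # all h1 tags
driver.find_elements_by_xpath("//div//h1/a")   # divs anywhere in the document, then h1 descendants of those divs, then a tags that are direct children of the h1


########### Indexing #############################
driver.find_elements_by_xpath("//a[1]") # every a tag that is the first a child of its parent; use "(//a)[1]" for the first a tag in the document


########## Finding by attribute #########################
driver.find_elements_by_xpath('//a[@href="image5.html"]')  # a tags whose href is image5.html
driver.find_elements_by_xpath('//a[contains(@href,"image5")]')  # fuzzy match: href contains image5
driver.find_elements_by_xpath("//*[@name='continue'][@type='button']") # all tags whose name attribute is continue and whose type attribute is button
driver.find_element_by_xpath('//a[img/@src="image3_thumb.jpg"]')  # the a tag whose child img has src image3_thumb.jpg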

(Above: a few notes on XPath.)
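The difference between // and /, and what an index such as [1] really selects, can be tried out without launching a browser. The sketch below swaps in the standard library's xml.etree.ElementTree, which implements only a limited subset of XPath (no contains(), and paths are written relative to the root with a leading "."), so it is an approximation of what Selenium evaluates, run on a made-up document.

```python
import xml.etree.ElementTree as ET

# Made-up document for illustration.
doc = ET.fromstring(
    "<html><body>"
    "<div><h1><a href='image1.html'>one</a><a href='image2.html'>two</a></h1></div>"
    "<div><a href='image5.html'>five</a></div>"
    "</body></html>"
)

# //a : every <a> anywhere below the root.
all_links = [a.text for a in doc.findall(".//a")]

# //div//h1/a : <a> elements that are direct children of an <h1> inside a <div>.
h1_links = [a.text for a in doc.findall(".//div//h1/a")]

# //a[1] : every <a> that is the first <a> child of its own parent,
# which is not the same thing as the first <a> in the document.
first_per_parent = [a.text for a in doc.findall(".//a[1]")]

# The first <a> in document order (what "(//a)[1]" selects in full XPath).
first_in_doc = doc.find(".//a").text

# //a[@href="image5.html"] : exact attribute match.
by_href = [a.text for a in doc.findall(".//a[@href='image5.html']")]

print(all_links, h1_links, first_per_parent, first_in_doc, by_href)
```

Here first_per_parent contains two links while first_in_doc is a single one, which is the distinction the index example relies on.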

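2.4 Getting attributes

Tag attribute: tag.get_attribute('class')

Other properties: tag.location >>> x, y coordinates of the element

         tag.size      >>> the element's size (px)

         tag.tag_name  >>> the element's tag name

         tag.id        >>> Selenium's internal ID for the element (not the HTML id attribute; use tag.get_attribute('id') for that)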
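As a quick illustration of how these properties fit together, the sketch below gathers them into one dict. The helper name describe is made up; tag is assumed to be a WebElement returned by one of the find methods, whose tag name is exposed as tag_name and whose location and size are dicts.

```python
def describe(tag):
    """Hypothetical helper: collect the properties listed above into one dict."""
    return {
        "tag_name": tag.tag_name,             # e.g. 'input'
        "class": tag.get_attribute("class"),  # HTML attribute; None if absent
        "x": tag.location["x"],               # top-left corner, in pixels
        "y": tag.location["y"],
        "width": tag.size["width"],           # rendered size, in pixels
        "height": tag.size["height"],
    }
```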

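3. Waiting and Interaction

3.1 Waits

  1. Implicit wait: set it before browser.get('xxx'); it applies to every element lookup.

browser = webdriver.Ie()
browser.implicitly_wait(10)  # 10 is the timeout in seconds; lookups give up after that

  2. Explicit wait: set it after browser.get('xxx'); it applies only to the element you name.

# Explicit wait: wait explicitly for the element with ID content to be loaded
wait = WebDriverWait(browser, 10)  # browser is the browser object, 10 is the timeout in seconds
wait.until(expected_conditions.presence_of_element_located((By.ID, 'content')))   # this is where expected_conditions finally comes in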
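Conceptually, an explicit wait is a polling loop: call a condition repeatedly until it yields something truthy or the timeout runs out. The sketch below is a simplified illustration of that idea in plain Python, not Selenium's implementation; the name poll_until is made up, and the 0.5 second default interval is chosen to mirror WebDriverWait's default poll frequency.

```python
import time


def poll_until(condition, timeout, interval=0.5):
    """Call condition() until it returns a truthy value or `timeout` seconds pass.

    A simplified illustration of the idea behind an explicit wait;
    poll_until is a hypothetical name, not a Selenium API.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # hand back whatever the condition produced
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

With a real browser, the condition would be something like lambda: browser.find_elements(By.ID, 'content'), which returns an empty (falsy) list until the element exists.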

3.2 Input operations

input_tag = browser.find_element_by_id('kw')
input_tag.clear() # clear the input box
input_tag.send_keys('百度')  # type the search term
input_tag.send_keys(Keys.ENTER) # press Enter

3.3 Mouse operations

# click
tag.click()


# automatic drag and drop
actions = ActionChains(browser)   # an action-chain object
actions.drag_and_drop(tag_from, tag_to)  # tag_from is the element being dragged, tag_to is the target element
actions.perform()  # nothing runs until perform() is called

# The calls can be chained: ActionChains(browser).drag_and_drop(tag_from, tag_to).perform()


# more humanlike !
ActionChains(browser).click_and_hold(source).perform()  # press the mouse button and hold it
ActionChains(browser).move_by_offset(xoffset=2, yoffset=0).perform() # move
ActionChains(browser).release().perform() # let go
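The hold, move, release pattern extends naturally to a drag made of many small moves. The following is a rough sketch under stated assumptions: split_offsets and drag_horizontally are made-up names, the drag is horizontal only, and actions is an ActionChains instance such as ActionChains(browser).

```python
def split_offsets(distance, steps):
    """Split an integer pixel distance into `steps` integer moves that sum to it."""
    base, extra = divmod(distance, steps)
    # The first `extra` moves get one more pixel so the total stays exact.
    return [base + 1 if i < extra else base for i in range(steps)]


def drag_horizontally(actions, source, distance, steps=10):
    """Hypothetical helper: hold `source`, move sideways in small steps, release.

    `actions` is an ActionChains instance, e.g. ActionChains(browser).
    """
    actions.click_and_hold(source)
    for dx in split_offsets(distance, steps):
        actions.move_by_offset(dx, 0)  # queue one small horizontal move
    actions.release()
    actions.perform()  # the queued actions only run here
```

Even steps are the simplest choice; varying the step sizes or pausing between moves would be a closer imitation of a hand, at the cost of more code.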

3.4 Brute force (JS)

browser.execute_script('alert("hello world")')  # run JavaScript from Python

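3.5 iframe

When a page contains an iframe, elements inside the child frame cannot be found directly from the parent frame, so you have to switch frames.
Use browser.switch_to.frame('iframeResult') to switch to the frame whose id is iframeResult. The cheeky kid at the back will ask: what if the iframe has no id? switch_to.frame() also accepts the frame's index or the iframe element itself.
Use browser.switch_to.parent_frame() to switch back.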
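Since it is easy to forget to switch back, the two calls can be wrapped together. A minimal sketch, assuming a hypothetical helper named text_in_frame; it reads the element's text while still inside the frame, because an element found in one frame should not be relied on after switching away from it.

```python
def text_in_frame(browser, frame, by, value):
    """Hypothetical helper: read an element's text from inside an iframe.

    `frame` may be the iframe's id or name, its zero-based index, or the
    iframe WebElement itself; switch_to.frame() accepts all three.
    """
    browser.switch_to.frame(frame)
    try:
        return browser.find_element(by, value).text
    finally:
        # Always return to the parent frame, even if the lookup fails.
        browser.switch_to.parent_frame()
```

For an iframe without an id, the call could look like text_in_frame(browser, 0, By.ID, 'droppable'), where 0 is the first frame on the page and 'droppable' is a made-up element id.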

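4. Miscellaneous

4.1 Browser tab operations

browser.window_handles returns the handles of all open tabs
browser.switch_to.window(browser.window_handles[1]) switches to another tab (the older switch_to_window() spelling is deprecated)
To open a new tab, hand the job to JavaScript: window.open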
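Putting the three pieces together, a new tab can be opened through JavaScript and then selected by comparing the handle list before and after. This is a sketch under assumptions: open_in_new_tab is a made-up name, it uses browser.switch_to.window(), and it assumes the new handle is available as soon as window.open returns.

```python
def open_in_new_tab(browser, url):
    """Hypothetical helper: open url in a new tab and switch to it.

    Returns the handle of the new tab.
    """
    before = set(browser.window_handles)
    # arguments[0] is the first extra argument passed to execute_script().
    browser.execute_script("window.open(arguments[0]);", url)
    new_handles = [h for h in browser.window_handles if h not in before]
    browser.switch_to.window(new_handles[0])
    return new_handles[0]
```

Comparing handle sets avoids assuming that the new tab is always the last entry in window_handles.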

4.2 Exception handling

　　from selenium.common.exceptions import TimeoutException,NoSuchElementException,NoSuchFrameException

4.3 Forward and back

browser.forward()  go forward

browser.back()  go back

4.4 cookies

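browser.get_cookies()  # returns all cookies as a list of dicts

browser.add_cookie({"name": "key", "value": "value"})  # set a cookie, one call per cookie (unless you can throw AI at it, this is the only way past a CAPTCHA)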
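Cookies are set through add_cookie(), which takes one dict per cookie with "name" and "value" keys rather than a single key-to-value mapping. A small sketch of the conversion, with made-up helper names to_cookie_dicts and load_cookies:

```python
def to_cookie_dicts(pairs):
    """Turn {'name': 'value', ...} into the dicts that add_cookie() expects."""
    return [{"name": name, "value": value} for name, value in pairs.items()]


def load_cookies(browser, pairs):
    """Hypothetical helper: add each cookie to the current session.

    The browser must already be on a page of the cookies' domain.
    """
    for cookie in to_cookie_dicts(pairs):
        browser.add_cookie(cookie)
```

A typical flow would be to call browser.get() on the target site first, then load_cookies(browser, {"key": "value", "key2": "value2"}), then reload the page.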

Posted on 2018-01-11 20:07 by 檐夏
