Selenium基础
What is selenium?
selenium 英 /sɪ'liːnɪəm/最初是一个自动化测试工具,而爬虫中使用它主要是为了解决requests无法直接执行JavaScript代码的问题 selenium本质是通过驱动浏览器,完全模拟浏览器的操作,比如跳转、输入、点击、下拉等,来拿到网页渲染之后的结果,可支持多种浏览器
环境准备
- pip install selenium
- 下载对应浏览器的驱动程序
编码流程
- 导包:from selenium import webdriver
- 实例化某一款浏览器对象,需要利用浏览器的驱动程序
- 制定相关的行为动作,包含标签定位;节点交互
Selenium有什么作用?
用来完成浏览器自动化相关的操作,可以通过代码的形式指定基于浏览器自动化的相关操作(行为动作),当代码执行后,浏览器就会自动触发相关的事件
Selenium和爬虫的关联
1.使用selenium可以完成模拟登陆,
2.可以获取动态加载的数据
简单实用

from selenium import webdriver
from time import sleep
# 后面是你的浏览器驱动位置,记得前面加r'','r'是防止字符转义的
driver = webdriver.Chrome(r'驱动程序路径')
# 用get打开百度页面
driver.get("http://www.baidu.com")
# 查找页面的“设置”选项,并进行点击
driver.find_elements_by_link_text('设置')[0].click()
sleep(2)
# # 打开设置后找到“搜索设置”选项,设置为每页显示50条
driver.find_elements_by_link_text('搜索设置')[0].click()
sleep(2)
# 选中每页显示50条
m = driver.find_element_by_id('nr')
sleep(2)
m.find_element_by_xpath('//*[@id="nr"]/option[3]').click()
m.find_element_by_xpath('.//option[3]').click()
sleep(2)
# 点击保存设置
driver.find_elements_by_class_name("prefpanelgo")[0].click()
sleep(2)
# 处理弹出的警告页面 确定accept() 和 取消dismiss()
driver.switch_to_alert().accept()
sleep(2)
# 找到百度的输入框,并输入 美女
driver.find_element_by_id('kw').send_keys('美女')
sleep(2)
# 点击搜索按钮
driver.find_element_by_id('su').click()
sleep(2)
# 在打开的页面中找到“Selenium - 开源中国社区”,并打开这个页面
driver.find_elements_by_link_text('美女_百度图片')[0].click()
sleep(3)
# 关闭浏览器
driver.quit()
简单实用和对应效果

#谷歌浏览器
from selenium import webdriver
from time import sleep
bro = webdriver.Chrome(executable_path='./chromedriver.exe')
bro.get('https://www.baidu.com')
sleep(2)
#标签定位,一般通过id或者xpath定位精准
tag_input = bro.find_element_by_id('kw')
tag_input.send_keys('人民币')
sleep(2)
btn = bro.find_element_by_id('su')
btn.click()
sleep(2)
#关闭浏览器
bro.quit()
demo1
Selenium具体应用
创建浏览器
Selenium支持非常多的浏览器,如Chrome、Firefox、Edge等,还有Android、BlackBerry等手机端的浏览器。另外,也支持无界面浏览器PhantomJS。常用的还是Chrome
from selenium import webdriver
browser = webdriver.Chrome()
browser = webdriver.Firefox()
browser = webdriver.Edge()
browser = webdriver.PhantomJS()
browser = webdriver.Safari()
标签和元素定位
# webdriver 提供的常用的标签定位方法,以下比较常用
find_element_by_id()
find_element_by_name()
find_element_by_class_name()
find_element_by_tag_name()
find_element_by_link_text()
find_element_by_partial_link_text()
find_element_by_xpath()
find_element_by_css_selector()
"""
注意事项:
1、find_element_by_xxx找的是第一个符合条件的标签,find_elements_by_xxx找的是所有符合条件的标签。
2、根据ID、CSS选择器和XPath获取,它们返回的结果完全一致。
3、另外,Selenium还提供了通用方法find_element(),它需要传入两个参数:查找方式By和值。实际上,它就是find_element_by_id()这种方法的通用函数版本,比如find_element_by_id(id)就等价于find_element(By.ID, id),二者得到的结果完全一致。
"""
节点交互
Selenium可以驱动浏览器来执行一些操作,也就是说可以让浏览器模拟执行一些动作。比较常见的用法有:输入文字时用send_keys()方法,清空文字时用clear()方法,点击按钮时用click()方法。示例如下:
from selenium import webdriver
import time
browser = webdriver.Chrome()
browser.get('https://www.qiushibaike.com')
input = browser.find_element_by_id('qb-input')
input.send_keys('搞笑段子')
time.sleep(1)
input.clear()
input.send_keys('IPhone')
button = browser.find_element_by_class_name('btn-search')
button.click()
browser.quit()
执行JS代码
对于某些操作,Selenium API并没有提供。比如,下拉进度条,它可以直接模拟运行JavaScript,此时使用execute_script()方法即可实现,代码如下:
from selenium import webdriver
browser = webdriver.Chrome()
browser.get('https://www.dingdeng.com/')
browser.execute_script('window.scrollTo(0, document.body.scrollHeight)')
browser.execute_script('alert("叮当")')

#雪球网
from selenium import webdriver
from time import sleep
bro = webdriver.Chrome(executable_path='./chromedriver.exe')
bro.get('https://xueqiu.com/')
sleep(5)
#执行js实现滚轮向下滑动
js = 'window.scrollTo(0,document.body.scrollHeight)'
bro.execute_script(js)
sleep(2)
bro.execute_script(js)
sleep(2)
bro.execute_script(js)
sleep(2)
bro.execute_script(js)
sleep(2)
a_tag = bro.find_element_by_xpath('//*[@id="app"]/div[3]/div/div[1]/div[2]/div[2]/a')
a_tag.click()
sleep(5)
#获取当前浏览器页面数据(动态)*****
print(bro.page_source)
bro.quit()
demo2 示例2 雪球网滚动条下拉获取data,其中实例化的对象 bro.execute_script(js)
获取页面源码数据
获取页面源码数据 ,动态加载的数据 page_source 属性可实现
注: 当遇到iframe标签 定位写法 ---> brower.switch_to.frame(id) 处理子页面
前进和后退
#前进和后退
from selenium import webdriver
from time import sleep
bro = webdriver.Chrome(executable_path='./chromedriver.exe')
bro.get('https://www.baidu.com')
sleep(1)
bro.get('http://www.goubanjia.com/')
sleep(1)
bro.get('https://www.taobao.com')
sleep(1)
#后退
bro.back()
sleep(1)
#前进
bro.forward()
sleep(1)
print(bro.page_source)
bro.quit()
phantomJS
PhantomJS是一款无界面的浏览器,其自动化操作流程和上述操作谷歌浏览器是一致的。由于是无界面的,为了能够展示自动化操作流程,PhantomJS为用户提供了一个截屏的功能,使用save_screenshot函数实现。不过目前已弃用

from selenium import webdriver
import time
# phantomjs路径
path = r'PhantomJS驱动路径'
browser = webdriver.PhantomJS(path)
# 打开百度
url = 'http://www.baidu.com/'
browser.get(url)
time.sleep(3)
browser.save_screenshot(r'phantomjs\baidu.png')
# 查找input输入框
my_input = browser.find_element_by_id('kw')
# 往框里面写文字
my_input.send_keys('美女')
time.sleep(3)
#截屏
browser.save_screenshot(r'phantomjs\meinv.png')
# 查找搜索按钮
button = browser.find_elements_by_class_name('s_btn')[0]
button.click()
time.sleep(3)
browser.save_screenshot(r'phantomjs\show.png')
time.sleep(3)
browser.quit()
简单用法
谷歌无头浏览器
由于PhantomJ基本弃用,所以推荐大家可以使用谷歌的无头浏览器,是一款无界面的谷歌浏览器。(无界面防止爬取数据被数据吓着,哈哈)
#谷歌无头浏览器用法
from selenium import webdriver
from time import sleep
from selenium.webdriver.chrome.options import Options
# 创建一个参数对象,用来控制chrome以无界面模式打开
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
bro = webdriver.Chrome(executable_path='./chromedriver.exe',options=chrome_options)
bro.get('https://www.baidu.com')
sleep(2)
bro.save_screenshot('1.png')
#标签定位
tag_input = bro.find_element_by_id('kw')
tag_input.send_keys('人民币')
sleep(2)
btn = bro.find_element_by_id('su')
btn.click()
sleep(2)
print(bro.page_source)
bro.quit()
动作链
通过在上面的一些实例中,一些交互动作都是针对某个节点执行的。比如,对于输入框,我们就调用它的输入文字和清空文字方法;对于按钮,就调用它的点击方法。其实,还有另外一些操作,它们没有特定的执行对象,比如鼠标拖曳、键盘按键等,这些动作用另一种方式来执行,那就是动作链。
示例1 动作链 菜鸟教程自动拖拽
#菜鸟教程拖拽示例,drag ..
#动作链
from selenium import webdriver
from time import sleep
from selenium.webdriver import ChromeOptions
from selenium.webdriver import ActionChains
option = ChromeOptions()
option.add_experimental_option('excludeSwitches', ['enable-automation'])
bro = webdriver.Chrome(executable_path='./chromedriver.exe',options=option)
url = 'https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable'
bro.get(url=url)
#如果定位的标签存在于iframe标签之中,则必须经过switch_to操作在进行标签定位
bro.switch_to.frame('iframeResult')
source_tag = bro.find_element_by_id('draggable')
taget_tag = bro.find_element_by_id('droppable')
#创建一个动作连的对象
action = ActionChains(bro)
# 将..拖拽到目标
action.drag_and_drop(source_tag,taget_tag)
# 执行动作
action.perform()
sleep(3)
# bro.quit()
示例5-2 菜鸟教程 手动拖拽
from selenium import webdriver
from time import sleep
from selenium.webdriver import ActionChains
bro = webdriver.Chrome(executable_path='./chromedriver.exe')
url = 'https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable'
bro.get(url=url)
#如果定位的标签存在于iframe标签之中,则必须经过switch_to操作在进行标签定位
bro.switch_to.frame('iframeResult')
source_tag = bro.find_element_by_id('draggable')
#创建一个动作连的对象
action = ActionChains(bro)
#点住并拖动
action.click_and_hold(source_tag)
for i in range(4):
#perform表示开始执行动作链
action.move_by_offset(20,0).perform()
sleep(1)
bro.quit()
Selenium---> Cookie处理
使用Selenium,还可以方便地对Cookies进行操作,例如获取、添加、删除Cookies等。示例如下:
from selenium import webdriver
browser = webdriver.Chrome()
browser.get('https://www.zhihu.com/explore')
print(browser.get_cookies())
browser.add_cookie({'name': 'name', 'domain': 'www.zhihu.com', 'value': 'germey'})
print(browser.get_cookies())
browser.delete_all_cookies()
print(browser.get_cookies())
异常处理
from selenium import webdriver
from selenium.common.exceptions import TimeoutException,NoSuchElementException,NoSuchFrameException
try:
browser=webdriver.Chrome()
browser.get('http://www.runoob.com/try/try.php?filename=jqueryui-api-droppable')
browser.switch_to.frame('iframssseResult')
except TimeoutException as e:
print(e)
except NoSuchFrameException as e:
print(e)
finally:
browser.close()
Selenium规避被检测识别
现在不少大网站有对selenium采取了监测机制。比如正常情况下我们用浏览器访问淘宝等网站的 window.navigator.webdriver的值为
undefined。而使用selenium访问则该值为true。那么如何解决这个问题呢?
只需要设置Chromedriver的启动参数即可解决问题。在启动Chromedriver之前,为Chrome开启实验性功能参数excludeSwitches,它的值为['enable-automation'],完整代码如下:

from selenium.webdriver import Chrome
from selenium.webdriver import ChromeOptions
option = ChromeOptions()
option.add_experimental_option('excludeSwitches', ['enable-automation'])
driver = Chrome(options=option)
具体代码
...
CrazyShenldon
posted @
2019-05-08 11:28
FindSoul
阅读(
532)
评论()
收藏
举报
var RENDERER = {
POINT_INTERVAL : 5,
FISH_COUNT : 3,
MAX_INTERVAL_COUNT : 50,
INIT_HEIGHT_RATE : 0.5,
THRESHOLD : 50,
init : function(){
this.setParameters();
this.reconstructMethods();
this.setup();
this.bindEvent();
this.render();
},
setParameters : function(){
this.$window = $(window);
this.$container = $('#jsi-flying-fish-container');
this.$canvas = $('');
this.context = this.$canvas.appendTo(this.$container).get(0).getContext('2d');
this.points = [];
this.fishes = [];
this.watchIds = [];
},
createSurfacePoints : function(){
var count = Math.round(this.width / this.POINT_INTERVAL);
this.pointInterval = this.width / (count - 1);
this.points.push(new SURFACE_POINT(this, 0));
for(var i = 1; i < count; i++){
var point = new SURFACE_POINT(this, i * this.pointInterval),
previous = this.points[i - 1];
point.setPreviousPoint(previous);
previous.setNextPoint(point);
this.points.push(point);
}
},
reconstructMethods : function(){
this.watchWindowSize = this.watchWindowSize.bind(this);
this.jdugeToStopResize = this.jdugeToStopResize.bind(this);
this.startEpicenter = this.startEpicenter.bind(this);
this.moveEpicenter = this.moveEpicenter.bind(this);
this.reverseVertical = this.reverseVertical.bind(this);
this.render = this.render.bind(this);
},
setup : function(){
this.points.length = 0;
this.fishes.length = 0;
this.watchIds.length = 0;
this.intervalCount = this.MAX_INTERVAL_COUNT;
this.width = this.$container.width();
this.height = this.$container.height();
this.fishCount = this.FISH_COUNT * this.width / 500 * this.height / 500;
this.$canvas.attr({width : this.width, height : this.height});
this.reverse = false;
this.fishes.push(new FISH(this));
this.createSurfacePoints();
},
watchWindowSize : function(){
this.clearTimer();
this.tmpWidth = this.$window.width();
this.tmpHeight = this.$window.height();
this.watchIds.push(setTimeout(this.jdugeToStopResize, this.WATCH_INTERVAL));
},
clearTimer : function(){
while(this.watchIds.length > 0){
clearTimeout(this.watchIds.pop());
}
},
jdugeToStopResize : function(){
var width = this.$window.width(),
height = this.$window.height(),
stopped = (width == this.tmpWidth && height == this.tmpHeight);
this.tmpWidth = width;
this.tmpHeight = height;
if(stopped){
this.setup();
}
},
bindEvent : function(){
this.$window.on('resize', this.watchWindowSize);
this.$container.on('mouseenter', this.startEpicenter);
this.$container.on('mousemove', this.moveEpicenter);
this.$container.on('click', this.reverseVertical);
},
getAxis : function(event){
var offset = this.$container.offset();
return {
x : event.clientX - offset.left + this.$window.scrollLeft(),
y : event.clientY - offset.top + this.$window.scrollTop()
};
},
startEpicenter : function(event){
this.axis = this.getAxis(event);
},
moveEpicenter : function(event){
var axis = this.getAxis(event);
if(!this.axis){
this.axis = axis;
}
this.generateEpicenter(axis.x, axis.y, axis.y - this.axis.y);
this.axis = axis;
},
generateEpicenter : function(x, y, velocity){
if(y < this.height / 2 - this.THRESHOLD || y > this.height / 2 + this.THRESHOLD){
return;
}
var index = Math.round(x / this.pointInterval);
if(index < 0 || index >= this.points.length){
return;
}
this.points[index].interfere(y, velocity);
},
reverseVertical : function(){
this.reverse = !this.reverse;
for(var i = 0, count = this.fishes.length; i < count; i++){
this.fishes[i].reverseVertical();
}
},
controlStatus : function(){
for(var i = 0, count = this.points.length; i < count; i++){
this.points[i].updateSelf();
}
for(var i = 0, count = this.points.length; i < count; i++){
this.points[i].updateNeighbors();
}
if(this.fishes.length < this.fishCount){
if(--this.intervalCount == 0){
this.intervalCount = this.MAX_INTERVAL_COUNT;
this.fishes.push(new FISH(this));
}
}
},
render : function(){
requestAnimationFrame(this.render);
this.controlStatus();
this.context.clearRect(0, 0, this.width, this.height);
this.context.fillStyle = 'hsl(0, 0%, 95%)';
for(var i = 0, count = this.fishes.length; i < count; i++){
this.fishes[i].render(this.context);
}
this.context.save();
this.context.globalCompositeOperation = 'xor';
this.context.beginPath();
this.context.moveTo(0, this.reverse ? 0 : this.height);
for(var i = 0, count = this.points.length; i < count; i++){
this.points[i].render(this.context);
}
this.context.lineTo(this.width, this.reverse ? 0 : this.height);
this.context.closePath();
this.context.fill();
this.context.restore();
}
};
var SURFACE_POINT = function(renderer, x){
this.renderer = renderer;
this.x = x;
this.init();
};
SURFACE_POINT.prototype = {
SPRING_CONSTANT : 0.03,
SPRING_FRICTION : 0.9,
WAVE_SPREAD : 0.3,
ACCELARATION_RATE : 0.01,
init : function(){
this.initHeight = this.renderer.height * this.renderer.INIT_HEIGHT_RATE;
this.height = this.initHeight;
this.fy = 0;
this.force = {previous : 0, next : 0};
},
setPreviousPoint : function(previous){
this.previous = previous;
},
setNextPoint : function(next){
this.next = next;
},
interfere : function(y, velocity){
this.fy = this.renderer.height * this.ACCELARATION_RATE * ((this.renderer.height - this.height - y) >= 0 ? -1 : 1) * Math.abs(velocity);
},
updateSelf : function(){
this.fy += this.SPRING_CONSTANT * (this.initHeight - this.height);
this.fy *= this.SPRING_FRICTION;
this.height += this.fy;
},
updateNeighbors : function(){
if(this.previous){
this.force.previous = this.WAVE_SPREAD * (this.height - this.previous.height);
}
if(this.next){
this.force.next = this.WAVE_SPREAD * (this.height - this.next.height);
}
},
render : function(context){
if(this.previous){
this.previous.height += this.force.previous;
this.previous.fy += this.force.previous;
}
if(this.next){
this.next.height += this.force.next;
this.next.fy += this.force.next;
}
context.lineTo(this.x, this.renderer.height - this.height);
}
};
var FISH = function(renderer){
this.renderer = renderer;
this.init();
};
FISH.prototype = {
GRAVITY : 0.4,
init : function(){
this.direction = Math.random() < 0.5;
this.x = this.direction ? (this.renderer.width + this.renderer.THRESHOLD) : -this.renderer.THRESHOLD;
this.previousY = this.y;
this.vx = this.getRandomValue(4, 10) * (this.direction ? -1 : 1);
if(this.renderer.reverse){
this.y = this.getRandomValue(this.renderer.height * 1 / 10, this.renderer.height * 4 / 10);
this.vy = this.getRandomValue(2, 5);
this.ay = this.getRandomValue(0.05, 0.2);
}else{
this.y = this.getRandomValue(this.renderer.height * 6 / 10, this.renderer.height * 9 / 10);
this.vy = this.getRandomValue(-5, -2);
this.ay = this.getRandomValue(-0.2, -0.05);
}
this.isOut = false;
this.theta = 0;
this.phi = 0;
},
getRandomValue : function(min, max){
return min + (max - min) * Math.random();
},
reverseVertical : function(){
this.isOut = !this.isOut;
this.ay *= -1;
},
controlStatus : function(context){
this.previousY = this.y;
this.x += this.vx;
this.y += this.vy;
this.vy += this.ay;
if(this.renderer.reverse){
if(this.y > this.renderer.height * this.renderer.INIT_HEIGHT_RATE){
this.vy -= this.GRAVITY;
this.isOut = true;
}else{
if(this.isOut){
this.ay = this.getRandomValue(0.05, 0.2);
}
this.isOut = false;
}
}else{
if(this.y < this.renderer.height * this.renderer.INIT_HEIGHT_RATE){
this.vy += this.GRAVITY;
this.isOut = true;
}else{
if(this.isOut){
this.ay = this.getRandomValue(-0.2, -0.05);
}
this.isOut = false;
}
}
if(!this.isOut){
this.theta += Math.PI / 20;
this.theta %= Math.PI * 2;
this.phi += Math.PI / 30;
this.phi %= Math.PI * 2;
}
this.renderer.generateEpicenter(this.x + (this.direction ? -1 : 1) * this.renderer.THRESHOLD, this.y, this.y - this.previousY);
if(this.vx > 0 && this.x > this.renderer.width + this.renderer.THRESHOLD || this.vx < 0 && this.x < -this.renderer.THRESHOLD){
this.init();
}
},
render : function(context){
context.save();
context.translate(this.x, this.y);
context.rotate(Math.PI + Math.atan2(this.vy, this.vx));
context.scale(1, this.direction ? 1 : -1);
context.beginPath();
context.moveTo(-30, 0);
context.bezierCurveTo(-20, 15, 15, 10, 40, 0);
context.bezierCurveTo(15, -10, -20, -15, -30, 0);
context.fill();
context.save();
context.translate(40, 0);
context.scale(0.9 + 0.2 * Math.sin(this.theta), 1);
context.beginPath();
context.moveTo(0, 0);
context.quadraticCurveTo(5, 10, 20, 8);
context.quadraticCurveTo(12, 5, 10, 0);
context.quadraticCurveTo(12, -5, 20, -8);
context.quadraticCurveTo(5, -10, 0, 0);
context.fill();
context.restore();
context.save();
context.translate(-3, 0);
context.rotate((Math.PI / 3 + Math.PI / 10 * Math.sin(this.phi)) * (this.renderer.reverse ? -1 : 1));
context.beginPath();
if(this.renderer.reverse){
context.moveTo(5, 0);
context.bezierCurveTo(10, 10, 10, 30, 0, 40);
context.bezierCurveTo(-12, 25, -8, 10, 0, 0);
}else{
context.moveTo(-5, 0);
context.bezierCurveTo(-10, -10, -10, -30, 0, -40);
context.bezierCurveTo(12, -25, 8, -10, 0, 0);
}
context.closePath();
context.fill();
context.restore();
context.restore();
this.controlStatus(context);
}
};
$(function(){
RENDERER.init();
});