Web Scraping - Blog Category - 知行Lee

requests.Session: set trust_env to disable environment searches for proxies

Summary: import requests; s = requests.Session(); s.trust_env = False. This prevents requests from getting any information from its environment: specifically, it'l…

posted @ 2017-03-26 15:17 知行Lee Views(938) Comments(0) Recommended(0)
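Here is a minimal sketch of what the flag changes. It assumes requests is installed. Session.request() uses the documented Session.merge_environment_settings() method to fold environment variables into each request, so comparing its output with trust_env on and off shows the effect. The proxy address is a made-up placeholder. The environment is temporarily cleared so that real proxy settings on your machine do not interfere.

```python
import os
from unittest import mock

import requests

URL = "http://example.com/"
FAKE_PROXY = "http://10.0.0.1:3128"  # hypothetical placeholder proxy


def proxies_for(trust_env):
    """Return the proxies a Session would apply to URL for a given trust_env."""
    s = requests.Session()
    s.trust_env = trust_env
    # Session.request() calls this to merge env settings (proxies, CA bundle).
    settings = s.merge_environment_settings(URL, {}, None, None, None)
    return dict(settings["proxies"])


# Replace the whole environment for the demo so only our fake proxy is visible.
with mock.patch.dict(os.environ, {"http_proxy": FAKE_PROXY}, clear=True):
    with_env = proxies_for(True)      # picks up http_proxy from the environment
    without_env = proxies_for(False)  # environment ignored entirely

print(with_env)     # {'http': 'http://10.0.0.1:3128'}
print(without_env)  # {}
```

With trust_env = False, any proxy you still need must be passed explicitly, for example via s.proxies or the proxies= argument.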

Python: calling JavaScript from Selenium (execute_script)

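Summary: Reposted from http://www.cnblogs.com/fnng/p/3230768.html. Focus of this section: calling JavaScript with execute_script(script, *args), which synchronously executes JavaScript in the current window/frame. script: the JavaScript to execute. *args: any applicable JavaScript…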

posted @ 2017-03-19 15:36 知行Lee Views(54365) Comments(0) Recommended(0)
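This minimal sketch shows the two behaviours the summary describes. A value returned from the script comes back to Python. Extra positional arguments appear inside the script as arguments[0], arguments[1], and so on. The helper names are hypothetical. demo() assumes Selenium, Firefox and geckodriver are installed, but the helpers only need an object with an execute_script method.

```python
def page_title(driver):
    # Without "return" in the script, execute_script gives back None.
    return driver.execute_script("return document.title;")


def js_add(driver, a, b):
    # Positional args after the script are visible to it as arguments[0], arguments[1], ...
    return driver.execute_script("return arguments[0] + arguments[1];", a, b)


def highlight(driver, element):
    # WebElements can be passed too; inside the script they are DOM nodes.
    driver.execute_script("arguments[0].style.border = '2px solid red';", element)


def demo(url="https://www.example.com/"):
    from selenium import webdriver  # imported lazily so the helpers need no browser

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        print(page_title(driver))
        print(js_add(driver, 2, 3))
    finally:
        driver.quit()
```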

Running and debugging Scrapy in PyCharm

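Summary: Scrapy is a crawling framework and PyCharm is a powerful Python IDE, so for convenience it helps to debug Scrapy programs inside PyCharm. Tags: python, PyCharm, Scrapy. The scrapy command is actually just a Python script. When running the scrapy library…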

posted @ 2017-01-21 16:27 知行Lee Views(5539) Comments(0) Recommended(0)
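Because the scrapy command is just a Python entry point, one common way to debug it in PyCharm is a small launcher script. The script calls scrapy.cmdline.execute() with the same arguments you would type in a terminal. You then point a PyCharm Run/Debug configuration at that file, with the working directory set to the project root, and set breakpoints in your spider as usual. This is a minimal sketch. The file name run.py and the spider name "myspider" are hypothetical.

```python
# run.py: place next to scrapy.cfg so Scrapy can find the project settings.
import sys


def crawl_argv(spider_name, *extra):
    """Build the argv that `scrapy crawl <spider>` would receive."""
    return ["scrapy", "crawl", spider_name, *extra]


def main(spider_name="myspider"):  # hypothetical spider name
    from scrapy import cmdline  # requires Scrapy to be installed

    # execute() parses argv like the command-line tool, runs the crawl, then exits.
    cmdline.execute(crawl_argv(spider_name, *sys.argv[1:]))
```

To launch it from PyCharm, add `if __name__ == "__main__": main()` at the bottom. Keeping the crawl inside main() also lets the helper be imported without starting a crawl.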

Fixing version incompatibility between Selenium and Firefox

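Summary: For comparison, my personal Python setup: 32-bit environment, Python 2.7.12, Selenium 2.53.6, Firefox 47.01. Selenium can be installed at the matching version with pip (see another tutorial). The problem came up because opening the Firefox browser from Java raised the error org.openqa.selenium.firefox…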

posted @ 2017-01-21 16:24 知行Lee Views(425) Comments(0) Recommended(0)

Using ChromeDriver with Selenium

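Summary: What is ChromeDriver? ChromeDriver is developed and maintained by the Chromium team. It is a standalone server that implements the WebDriver wire protocol. ChromeDriver controls the browser through Chrome's automation proxy framework and is only compatible with Chrome 12.0.712.0 and later.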

posted @ 2017-01-21 16:10 知行Lee Views(1885) Comments(0) Recommended(0)
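This sketch makes the "standalone server" point concrete. Once the chromedriver executable is running, it listens for HTTP requests, on port 9515 by default. Selenium's webdriver.Chrome is essentially a client for that server. The code builds the new-session request by hand using the W3C WebDriver protocol that current ChromeDriver releases speak; older releases used the JSON wire protocol mentioned above. It assumes chromedriver has already been started locally, and the helper names are hypothetical.

```python
import json
import urllib.request

CHROMEDRIVER_URL = "http://localhost:9515"  # chromedriver's default port


def new_session_request(base_url=CHROMEDRIVER_URL, headless=True):
    """Build the W3C 'New Session' request that asks chromedriver to launch Chrome."""
    caps = {"browserName": "chrome"}
    if headless:
        # Chrome-specific options go under the vendor-prefixed "goog:chromeOptions" key.
        caps["goog:chromeOptions"] = {"args": ["--headless"]}
    body = json.dumps({"capabilities": {"alwaysMatch": caps}}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/session",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_session(request):
    """Send the request (needs a running chromedriver); return the new session id."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["value"]["sessionId"]
```

In everyday use, webdriver.Chrome() does all of this for you. The raw request only shows what travels between Selenium and ChromeDriver.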

Pages loaded by JavaScript: using execute_script to choose the load position

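Summary: # Because JavaScript loads the page incrementally, the source of content that has not been displayed yet cannot be retrieved. from selenium import webdriver; driver = webdriver.Firefox(); init_element = driver.find_element_by_xpath('//a[@href="#" and…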

posted @ 2017-01-21 15:46 知行Lee Views(2719) Comments(0) Recommended(0)
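A common pattern for pages that load content as you scroll is to keep scrolling to the bottom with execute_script until the page height stops changing, then read page_source. This sketch assumes the page's document.body.scrollHeight grows as new content arrives. The function name, pause and round limit are hypothetical choices. To jump to one particular element instead, driver.execute_script("arguments[0].scrollIntoView();", element) scrolls that element into view.

```python
import time


def scroll_until_loaded(driver, pause=1.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the page source once no new content appears (or max_rounds is hit).
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page's JavaScript time to fetch more content
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return driver.page_source
```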

Setting a proxy in Scrapy

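Summary: When crawling websites, the most common problem is that the site restricts IPs and has anti-scraping protection. The best remedy is to rotate IPs while crawling (that is, use proxies). Below is how to configure proxies in Scrapy for crawling. 1. Create "middlewares.py" in the Scrapy project. # Importing base64 library because we'l…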

posted @ 2017-01-21 15:42 知行Lee Views(875) Comments(0) Recommended(0)

Setting a proxy with urllib2

Summary: #coding=utf-8 # On the company network this module only works when connected through the VPN jump host. import urllib2; proxy_handler = urllib2.ProxyHandler({'http': 'http://username:password@proxyhk.huawei.com:8080', 'https': 'https://username:password@proxyhk.huawei....

posted @ 2017-01-21 15:41 知行Lee Views(3606) Comments(0) Recommended(0)
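urllib2 exists only in Python 2; in Python 3 the same classes live in urllib.request. Here is a minimal sketch of the equivalent setup, using a placeholder proxy address and credentials rather than the company proxy above. ProxyHandler maps URL schemes to proxy URLs. build_opener() wraps the handler in an opener. install_opener() makes plain urlopen() calls go through it.

```python
import urllib.request

# Placeholder proxy; user:password in the URL is sent as a Proxy-Authorization header.
PROXY = "http://username:password@proxy.example.com:8080"

proxy_handler = urllib.request.ProxyHandler({"http": PROXY, "https": PROXY})
opener = urllib.request.build_opener(proxy_handler)

# Option 1: use the opener directly, e.g. opener.open("http://example.com/")
# Option 2: install it globally so urllib.request.urlopen() uses the proxy too.
urllib.request.install_opener(opener)
```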

Oracle: using PLSQL with the install-free Instant Client

Summary: 2. Download Oracle Instant Client (32-bit). You only need instantclient-basic-nt-11.2.0.3.0.zip; the other packages are extensions for specific needs. Download address: http://www.oracle.com/technetwork/topics/wi…

posted @ 2017-01-21 15:26 知行Lee Views(6025) Comments(0) Recommended(0)

Installing Scrapy

Summary: I. Introduction to Scrapy. Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their page…

posted @ 2017-01-21 15:15 知行Lee Views(322) Comments(0) Recommended(0)

Parsing JavaScript with Selenium + PhantomJS

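Summary: Background: PhantomJS is a WebKit-based server-side JavaScript API. It fully supports the web without needing a browser, is fast, and natively supports various web standards: DOM handling, CSS selectors, JSON, Canvas, and SVG. PhantomJS can be used for page automation, network monitoring, web page screenshots, and…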

posted @ 2017-01-21 14:57 知行Lee Views(3759) Comments(0) Recommended(0)
