Python Web Scraping - Post Category - 凯帅

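The Python pyautogui module

Summary: 1. Preliminary setup. Fail-safe: pyautogui.FAILSAFE = False # Defaults to True. When enabled, moving the mouse pointer to the top-left corner of the screen makes the program raise an exception; the purpose is to give you a way to stop a script that could otherwise not be interrupted. Pause: pyautogui.PAUSE = 1 # Every pyautogui call pauses for one second; other statements are not delayed. …

posted @ 2021-12-16 16:53 凯帅 Views (685) Comments (0) Recommendations (0)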

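Python requests: fixing garbled text in data downloaded by a crawler

Summary: 1. Downloaded HTML is garbled: inspect the original page in a browser to find its encoding, then set the downloaded response to that encoding: result.encoding = "the page's encoding". 2. JSON fetched directly from an API is garbled: in recent years a new compression method, br, has come into use for web transfers; if the Accept-Encoding entry in your request headers …

posted @ 2021-02-24 22:06 凯帅 Views (448) Comments (1) Recommendations (1)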
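
The first point in the summary above, decoding with the encoding the page declares, can be illustrated without any network access. This is a minimal sketch using only the standard library; the sample text and the choice of gbk are assumptions made for illustration and do not come from the original post.

```python
# Hypothetical sample: the bytes a server might send for a GBK-encoded page.
raw = "爬虫下载的数据".encode("gbk")

# Decoding with the wrong codec does not fail here, but the text is garbled.
garbled = raw.decode("latin-1")

# Decoding with the encoding the page actually uses recovers the text.
fixed = raw.decode("gbk")

print(garbled == fixed)
print(fixed)
```

With requests, assigning result.encoding before reading result.text plays the same role as choosing the codec in raw.decode here.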

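Installing NodeJS and NPM on Linux and Windows

Summary: Installing on Linux: node.js. Clicking the link starts a download; here, copy the link instead and download it with wget: wget https://npm.taobao.org/mirrors/node/v12.14.1/node-v12.14.1-linux-x64.tar.xz You can see that the download has finished; the downloaded file is a tar.xz archive …

posted @ 2021-02-09 02:22 凯帅 Views (364) Comments (0) Recommendations (0)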

Combining selenium with requests to keep a session

Summary: Keeping a session with requests: import requests session = requests.session() def cookie_to_cookiejar(cookies): if not hasattr(cookies, "startswith"): rai …

posted @ 2021-01-30 13:45 凯帅 Views (989) Comments (0) Recommendations (0)
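
The summary above is cut off inside cookie_to_cookiejar, so the author's version is not shown here. As a rough illustration of the underlying idea, the sketch below turns a Cookie header string into a plain dictionary using only the standard library. The helper name cookie_string_to_dict and the sample cookie values are hypothetical.

```python
def cookie_string_to_dict(cookie_string):
    """Parse 'name=value; name2=value2' into a dict (illustrative helper)."""
    cookies = {}
    for pair in cookie_string.split(";"):
        # partition splits on the first '=' only, so values may contain '='.
        name, sep, value = pair.strip().partition("=")
        if sep:
            cookies[name] = value
    return cookies


# Hypothetical cookie string, as it might be copied from a browser session.
parsed = cookie_string_to_dict("JSESSIONID=abc123; theme=dark")
print(parsed)
```

A dictionary of this shape can then be converted for a requests session, for example with requests.utils.cookiejar_from_dict.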

selenium: page refresh issues and getting the window handle when switching pages

Summary: from selenium import webdriver headers = { "Cookie": "JSESSIONID=xxxxxxxx", } def login(): url = "www.baidu.com" browser = webdriver.Chrome() browser. …

posted @ 2021-01-01 15:42 凯帅 Views (294) Comments (0) Recommendations (0)

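Using XPath in special cases

Summary: 1. Selecting a tag that does not have a given attribute value: size_list = li.xpath('.//ul[@class="sizes"]/li[not(@class="noStock")]/text()').extract() 2. Selecting a sibling of a tag: next_url = data.xpath('//div[@cl …

posted @ 2020-12-03 13:17 凯帅 Views (139) Comments (0) Recommendations (0)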

Common date and time operations in Python, and looping over dates

Summary: Today, yesterday, tomorrow: import datetime today = datetime.date.today() # today yesterday = today - datetime.timedelta(days=1) # yesterday tomorrow = today + datetime.timedelta( …

posted @ 2020-10-17 18:08 凯帅 Views (6109) Comments (0) Recommendations (0)
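
The title of the entry above also mentions looping over dates, which the truncated summary does not reach. One possible way to do it is sketched below. The date_range helper and the fixed example dates are assumptions for illustration, not the author's code.

```python
import datetime


def date_range(start, end):
    """Yield every date from start to end, inclusive (illustrative helper)."""
    current = start
    while current <= end:
        yield current
        current += datetime.timedelta(days=1)


today = datetime.date.today()
yesterday = today - datetime.timedelta(days=1)
tomorrow = today + datetime.timedelta(days=1)

# Fixed dates are used here so the result does not depend on the current day.
days = list(date_range(datetime.date(2020, 10, 15), datetime.date(2020, 10, 17)))
print([d.isoformat() for d in days])  # ['2020-10-15', '2020-10-16', '2020-10-17']
```

A generator keeps the loop lazy, which matters little for three days but helps when the range spans years.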

scrapy: passing arguments on the command line and sending POST requests with a payload

Summary: class SciencedirectspiderSpider(scrapy.Spider): name = 'sciencedirectspider' allowed_domains = ['sciencedirect.com'] start_urls = ['https://www.scienc …

posted @ 2020-07-15 17:45 凯帅 Views (652) Comments (0) Recommendations (0)

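Using the scrapy framework with selenium

Summary: Using the scrapy framework with selenium. 1. Use case: when crawling some sites with scrapy, you often run into pages that load data dynamically. If you request such a URL directly with scrapy, the response does not contain the dynamically loaded data. On closer inspection, however, a request for the same URL sent through a browser does load the corresponding dynamic …

posted @ 2020-07-12 13:58 凯帅 Views (1058) Comments (0) Recommendations (0)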

Building a GUI in Python, with checkboxes and radio buttons

Summary: import os import tkinter as tk from tkinter import filedialog from tkinter.scrolledtext import ScrolledText window = tk.Tk() window.title('华润万家门店导出') # …

posted @ 2020-06-09 18:00 凯帅 Views (1522) Comments (0) Recommendations (0)

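Python image recognition: the artificial intelligence part

Summary: 2. Install the tesseract-ocr recognition engine. 1. Install pytesseract and PIL. PIL stands for Python Imaging Library, an image processing library for Python that supports many file formats and provides powerful image and graphics processing capabilities. Because PIL only supports Python up to 2.7, a library built on top of PIL was created: Pi …

posted @ 2020-05-28 17:49 凯帅 Views (3200) Comments (0) Recommendations (1)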

Retrying crawler requests on timeout

Summary: from retrying import retry def is_request_exception(e): return issubclass(type(e),RequestException) @retry(retry_on_exception=is_request_exception,wra …

posted @ 2020-05-26 15:14 凯帅 Views (240) Comments (0) Recommendations (0)
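
The summary above relies on the third-party retrying package and is cut off before the decorator's arguments are complete. To show the general retry-on-exception idea without that dependency, here is a standard-library stand-in. It is not the retrying API: retry_on, RequestTimeout, and flaky_fetch are hypothetical names introduced only for this sketch.

```python
import functools
import time


class RequestTimeout(Exception):
    """Hypothetical stand-in for a request library's timeout exception."""


def retry_on(exc_type, attempts=3, delay=0.0):
    """Retry the wrapped function when it raises exc_type (illustrative helper)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == attempts:
                        raise  # out of attempts: let the caller see the error
                    time.sleep(delay)
        return wrapper
    return decorator


calls = []


@retry_on(RequestTimeout, attempts=3)
def flaky_fetch():
    """Fail twice, then succeed, to simulate a request that times out."""
    calls.append(1)
    if len(calls) < 3:
        raise RequestTimeout("simulated timeout")
    return "ok"


result = flaky_fetch()
print(result, len(calls))  # ok 3
```

Only the named exception type triggers a retry; any other error propagates immediately, which mirrors the intent of the retry_on_exception filter in the summary.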

Setting a page refresh interval on the front end

Summary: function myrefresh() { window.location.reload(); } setTimeout('myrefresh()', 1000); // refresh once every 1 second </script> …

posted @ 2020-05-25 21:47 凯帅 Views (439) Comments (0) Recommendations (0)

requests POST requests with sessions, and URL encoding issues

Summary: import requests from urllib.parse import urlencode from openpyxl import Workbook requests = requests.session() login_url = "https://passport.simuwang. …

posted @ 2020-05-13 15:42 凯帅 Views (734) Comments (0) Recommendations (0)
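
The summary above imports urlencode but is truncated before it is used. As a small illustration of what URL encoding does to form data, the following sketch encodes a dictionary and decodes it again. The field names and values are hypothetical.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields, for illustration only.
form = {"username": "demo user", "page": 1}

# urlencode escapes each key and value and joins the pairs with '&'.
body = urlencode(form)
print(body)  # username=demo+user&page=1

# parse_qs reverses the encoding; every value comes back as a list of strings.
decoded = parse_qs(body)
print(decoded)
```

When a dictionary is passed to requests as data=, requests performs this form encoding itself, so calling urlencode by hand is mainly useful when the body has to be built or inspected as a string.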

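Common MySQL operations for crawlers

Summary: 1. Exporting data: mysqldump -u root -p123456 tiantian > C:\Users\ASUS\Desktop\shangduogou.sql (then enter the password) mysqldump -u dbuser -p dbname > dbname.sql 2. Working with the database: import p …

posted @ 2020-05-08 22:21 凯帅 Views (253) Comments (0) Recommendations (0)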

Working with Excel, Word, and PDF documents in Python

Summary: 1. Reading Excel: import xlrd # open the Excel file data = xlrd.open_workbook("Gitee.xlsx") table = data.sheet_by_name("程序开发") # # the selected sheet # print(table.nrows) # number of rows # prin …

posted @ 2020-05-01 19:49 凯帅 Views (587) Comments (0) Recommendations (0)

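Regular expressions commonly used in crawlers

Summary: 1. Matching between a given start and a given end: str1 = "background-image: url(https://image2.pearvideo.com/cont/20200428/cont-1671582-12370181.png);" # \b<start characters>.*?<end characters>\b res = re.search(r" …

posted @ 2020-04-28 18:45 凯帅 Views (809) Comments (0) Recommendations (0)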
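
The pattern in the summary above is cut off, so the author's exact expression is unknown. One way to match between a fixed start and a fixed end on the same sample string is sketched below. The pattern here is an assumption written for this illustration.

```python
import re

str1 = "background-image: url(https://image2.pearvideo.com/cont/20200428/cont-1671582-12370181.png);"

# Literal 'url(' as the start, literal ')' as the end; the lazy group (.*?)
# captures as little as possible between them.
res = re.search(r"url\((.*?)\)", str1)

image_url = res.group(1) if res else None
print(image_url)  # https://image2.pearvideo.com/cont/20200428/cont-1671582-12370181.png
```

The non-greedy quantifier matters when a line contains more than one closing parenthesis, because a greedy .* would run to the last one.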

selenium: headless mode and avoiding detection

Summary: from selenium import webdriver from selenium.webdriver.chrome.options import Options # => import Chrome's options import time # configuration ch_options = Options() ch_optio …

posted @ 2020-04-25 16:07 凯帅 Views (5278) Comments (0) Recommendations (0)

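Locating drop-down lists with Select

Summary: selenium conveniently provides a Select class for handling drop-down menus. First import it with from selenium.webdriver.support.select import Select. Select offers several methods for locating an option; the specific methods are described next …

posted @ 2020-04-23 17:35 凯帅 Views (890) Comments (0) Recommendations (0)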

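Installing JDK1.8.0_181 and configuring the environment

Summary: I. Installing the JDK. 1. Double-click the JDK installer. 2. Click "Next". 3. Click "Change" to choose a custom installation path (copy the path, because the environment configuration needs it), then click OK. 4. Wait for the installation to finish, then click Close to complete it. II. Configuring the environment. 1. In Control Panel -> System, click Advanced settings. 2 …

posted @ 2020-04-11 16:17 凯帅 Views (4618) Comments (0) Recommendations (0)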
