Python Web Scraping: CAPTCHA Recognition

Why CAPTCHA recognition is needed

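A CAPTCHA is one of the anti-scraping measures websites use. Sometimes the data we want is only available to a logged-in user, so a crawler has to get past the CAPTCHA on the login page.
Ways to recognize a CAPTCHA: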

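Manual recognition by eye (not recommended)
Automatic recognition through a third-party service
- Yundama (云打码; no longer available)
- Chaojiying (超级鹰): http://www.chaojiying.com
- Ttshitu (图鉴): http://www.ttshitu.com/login.html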

Chaojiying tutorial

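1. Register an account. Binding WeChat earns free points, or you can top up (two yuan is enough for a whole day of use).

2. Go to User Center -> Software ID and generate a software ID (note down the software ID and its key).

3. Go to Development Docs -> Python demo download and download the sample code.

4. Unpack the download into your project (fix the errors in it first, for example by adding parentheses after print).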

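The demo contains a class whose constructor takes (self, username, password, soft_id). When creating the client later, pass your own username, password, and software ID in that order.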
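As a minimal sketch of supplying those three constructor arguments without hard-coding them in the script, the credentials could be read from environment variables. The helper load_credentials and the variable names CHAOJIYING_USER, CHAOJIYING_PASS, and CHAOJIYING_SOFT_ID are hypothetical and not part of the Chaojiying demo.

```python
import os


def load_credentials(env=None):
    """Return (username, password, soft_id) read from environment variables.

    The variable names are hypothetical; choose whatever suits your setup.
    """
    env = os.environ if env is None else env
    keys = ("CHAOJIYING_USER", "CHAOJIYING_PASS", "CHAOJIYING_SOFT_ID")
    # Collect every missing or empty variable so the error names all of them.
    missing = [k for k in keys if not env.get(k)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return tuple(env[k] for k in keys)


# Usage, assuming the demo was unpacked as in step 4:
# from chaojiying_Python.chaojiying import Chaojiying_Client
# cj = Chaojiying_Client(*load_credentials())
```

The tuple is returned in the same order as the constructor parameters, so it can be unpacked straight into the call.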

Recognizing the gushiwen.cn (古诗文网) CAPTCHA with a crawler and Chaojiying

"""
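CAPTCHA recognition example: the CAPTCHA on the gushiwen.cn login page
Workflow:
    - Save the CAPTCHA image locally
    - Call the platform's code to recognize the image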
"""

from chaojiying_Python.chaojiying import Chaojiying_Client
import requests
from lxml import etree

if __name__ == '__main__':
    # Create the Chaojiying client (username, password, software ID)
    cj = Chaojiying_Client('xxxxx', 'xxxxx', 'xxxxx')

    # Login page URL
    login_url = 'https://so.gushiwen.cn/user/login.aspx?from=http://so.gushiwen.cn/user/collect.aspx'
    login_page = requests.get(url=login_url, headers=cj.headers)
    tree = etree.HTML(login_page.text)
    # Build the CAPTCHA image URL from the src of <img id="imgCode">
    img_url = 'https://so.gushiwen.cn' + tree.xpath('//img[@id="imgCode"]/@src')[0]
    print(img_url)
    img_data = requests.get(url=img_url, headers=cj.headers).content
    # Save the image to disk (the target directory must already exist)
    img_path = '../data3/VerificationCode/code01.jpg'
    with open(img_path, 'wb') as fp:
        fp.write(img_data)
    print("CAPTCHA image saved!")

    # Recognize the CAPTCHA
    with open(img_path, 'rb') as fp:  # read the saved image back as bytes
        im = fp.read()
    json_data = cj.PostPic(im, 1902)  # 1902 is the CAPTCHA type code; see the price list on the official site
    code = json_data['pic_str']
    print(code)
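
The script above reads json_data['pic_str'] directly. A slightly more defensive variant checks for an error first. The sketch below assumes the dictionary returned by PostPic also carries err_no (0 on success) and err_str fields. Only pic_str is confirmed by the script itself, so check the field names against the Chaojiying documentation before relying on them. extract_code is a hypothetical helper, not part of the demo.

```python
def extract_code(result):
    """Return the recognized text from a PostPic-style result dict.

    Assumes the keys 'err_no', 'err_str' and 'pic_str'; only 'pic_str'
    appears in the original script.
    """
    err_no = result.get("err_no", 0)
    if err_no != 0:
        # Surface the service's own error message instead of a bare KeyError.
        raise RuntimeError(f"recognition failed ({err_no}): {result.get('err_str', '')}")
    code = result.get("pic_str", "")
    if not code:
        raise RuntimeError("recognition returned an empty result")
    return code


# Usage with the objects from the script above:
# code = extract_code(cj.PostPic(im, 1902))
```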

Posted on 2022-03-13 15:43 by S++
