Python image recognition: captchas

1. pip3 install pytesseract

2. pip3 install pillow (or easy_install Pillow)

3. Install tesseract-ocr: http://jaist.dl.sourceforge.net/project/tesseract-ocr-alt/tesseract-ocr-setup-3.02.02.exe, and install it under C:\Program Files\

4. Python must be installed in its default location on the C: drive

5. Locate pytesseract.py and change the setting to tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'
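Editing the installed pytesseract.py works, but the same setting can usually be applied from your own script by assigning to `pytesseract.pytesseract.tesseract_cmd`. The sketch below is one way to do that, assuming the default install path from step 3. The helper name `find_tesseract` is made up for illustration and is not part of pytesseract.

```python
import os
import shutil

# Assumed default install location from step 3; adjust if you installed elsewhere.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidates=(DEFAULT_WINDOWS_PATH,)):
    """Return the path of a tesseract executable, or None if none is found.

    Hypothetical helper: checks PATH first, then the candidate paths in order.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract executable not found")
    else:
        try:
            import pytesseract
        except ImportError:
            print("pytesseract is not installed")
        else:
            # Point pytesseract at the executable without editing pytesseract.py.
            pytesseract.pytesseract.tesseract_cmd = path
            print("using", path)
```

Setting the path at runtime keeps the change in your own code, so it survives a reinstall or upgrade of pytesseract.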

Code:

#!/usr/bin/python3.4
# -*- coding: utf-8 -*-

import pytesseract
from PIL import Image

# Open the captcha image and run Tesseract OCR on it
image = Image.open('../jpg/code.png')
code = pytesseract.image_to_string(image)
print(code)

If the following error occurs:

'str' does not support the buffer interface

replace the statements below in `pytesseract.py`:

lines = error_string.splitlines()
#error_lines = tuple(line for line in lines if line.find('Error') >= 0)
error_lines = tuple(line.decode('utf-8') for line in lines if line.find(b'Error') >= 0)
if len(error_lines) > 0:
    return '\n'.join(error_lines)
else:
    return error_string.strip()
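The error comes from Python 3's split between `str` and `bytes`: the Tesseract error output reaches this code as bytes, so searching it for the text string 'Error' raises a TypeError (worded as above on Python 3.4; newer versions phrase it differently). The following standalone sketch mirrors the patched logic so it can be tried outside the library. The function name `extract_error_text` is hypothetical, and unlike the patch it also decodes the fallback branch so that a `str` is always returned.

```python
def extract_error_text(error_bytes):
    """Return the lines containing b'Error' as text, else the whole message.

    Hypothetical stand-in for the patched snippet; expects bytes input.
    """
    lines = error_bytes.splitlines()
    # Compare bytes with bytes (b'Error'), then decode each match to str.
    error_lines = tuple(
        line.decode('utf-8') for line in lines if line.find(b'Error') >= 0
    )
    if len(error_lines) > 0:
        return '\n'.join(error_lines)
    # Assumption: decode here too, so callers always receive str.
    return error_bytes.decode('utf-8').strip()


sample = b"Tesseract Open Source OCR Engine\nError opening data file\n"
print(extract_error_text(sample))

# Mixing the two types is what triggers the original failure.
try:
    b"Error opening data file".find('Error')
except TypeError as exc:
    print("TypeError:", exc)
```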

To recognize text in more languages, select all language packs when installing tesseract-ocr; the full installation takes about 1.3 GB.
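Once the extra language data is installed, the language can be chosen per call through the `lang` argument of `pytesseract.image_to_string`. A minimal sketch, assuming the `eng` and `chi_sim` (Simplified Chinese) language packs are installed; the helper names `build_lang` and `read_text` are made up for this example. Tesseract accepts several languages joined with `+`.

```python
def build_lang(codes):
    """Join Tesseract language codes into one spec, e.g. ['eng', 'chi_sim']."""
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def read_text(path, codes=("eng",)):
    """Run OCR on the image at `path` using the given language codes."""
    # Imported lazily so build_lang can be used without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang=build_lang(codes))


print(build_lang(["eng", "chi_sim"]))
```

With the packs in place, something like `read_text('../jpg/code.png', ["eng", "chi_sim"])` would try both languages on the same image.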

Recognition accuracy is not very high. Then again, some of today's captchas are so distorted that even a human cannot read them.
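Accuracy on noisy captchas often improves if the image is cleaned up before it is handed to Tesseract. One common preprocessing step (not part of the original script) is to convert to grayscale and binarize with a fixed threshold. The sketch below assumes a threshold of 128, which is only a starting point and usually needs tuning per captcha style. The function names are hypothetical.

```python
def make_threshold_table(threshold=128):
    """Build a 256-entry lookup table: values >= threshold map to 255, others to 0."""
    return [255 if value >= threshold else 0 for value in range(256)]


def binarize(image, threshold=128):
    """Return a black-and-white copy of a PIL image.

    The image is converted to 8-bit grayscale ('L'), then every pixel is
    mapped through the lookup table via Image.point.
    """
    return image.convert("L").point(make_threshold_table(threshold))


def recognize(path, threshold=128):
    """Open the image at `path`, binarize it, and return the OCR text."""
    # Imported lazily so the helpers above work without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        cleaned = binarize(image, threshold)
        return pytesseract.image_to_string(cleaned).strip()
```

Calling `recognize('../jpg/code.png')` would then replace the direct `image_to_string` call; trying a few threshold values on sample captchas is a reasonable way to pick one.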

Recommended reading on recognizing captchas with machine learning: http://www.cnblogs.com/beer/p/5672678.html

posted on 2016-10-25 15:25 by TTyb
