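Integrating Tesseract with iOS
I recently started on an image text recognition (OCR) project, and the team decided to use Tesseract. A quick search suggested that Tesseract's iOS support is not very good, so below is an Objective-C wrapper for Tesseract.
Original article: https://github.com/ldiqual/tesseract-ios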
Tesseract for iOS
tesseract-ios is not actively maintained anymore. I encourage you to use gali8's Tesseract-OCR-iOS instead.
About
Tesseract-ios is an Objective-C wrapper for Tesseract OCR.
This project couldn't exist without Ângelo Suzuki's blog post. A lot of code came from his article.
Requirements
- iOS SDK 6.0, iOS 5.0+ (there is no support for armv6)
- Tesseract and Leptonica libraries from the tesseract-ios-lib repo.
Installation
- Clone this repo from your project folder.
- Download the appropriate Tesseract trained data for your language from https://code.google.com/p/tesseract-ocr/downloads/list and put it in your project folder.
- You should have the following folder structure:
- Add tesseract-ios to your project as a group, and tessdata as a folder reference.
- Go to your project settings and make sure that C++ Standard Library is set to libstdc++.
Usage
Here is the default workflow to extract text from an image:
- Instantiate Tesseract with data path and language
- Set variables (character set, …)
- Set the image to analyze
- Start recognition
- Get recognized text
- Clear
Code Sample
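#import "Tesseract.h"

// "tessdata" is the folder (relative to the app bundle) that contains eng.traineddata
Tesseract* tesseract = [[Tesseract alloc] initWithDataPath:@"tessdata" language:@"eng"];
// Only recognize digits
[tesseract setVariableValue:@"0123456789" forKey:@"tessedit_char_whitelist"];
[tesseract setImage:[UIImage imageNamed:@"image_sample.jpg"]];
[tesseract recognize];
NSLog(@"%@", [tesseract recognizedText]);
// Free Tesseract's internal data once the text has been read
[tesseract clear];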
Method reference
-initWithDataPath:language:
- (id)initWithDataPath:(NSString *)dataPath language:(NSString *)language
Initialize a new Tesseract instance.
dataPath: a path, relative to the application bundle, to the .traineddata files. You can find these files in the Tesseract downloads section.
language: the language used for recognition, e.g. eng. Tesseract will look for an eng.traineddata file in the dataPath directory.
Returns nil if instantiation failed.
-setVariableValue:forKey:
- (void)setVariableValue:(NSString *)value forKey:(NSString *)key
Set Tesseract variable key to value. See http://www.sk-spell.sk.cx/tesseract-ocr-en-variables for a complete (but not up-to-date) list.
For instance, use tessedit_char_whitelist to restrict characters to a specific set.
-setImage:
- (void)setImage:(UIImage *)image
Set the image to recognize.
-setLanguage:
- (BOOL)setLanguage:(NSString *)language
Override the language defined with -initWithDataPath:language:.
-recognize
- (BOOL)recognize
Start text recognition. You might want to run this process in the background with NSObject's -performSelectorInBackground:withObject:.
-recognizedText
- (NSString *)recognizedText
Get the text extracted from the image.
-clear
- (void)clear
Clears the Tesseract object after text has been recognized from the image, to prevent memory leaks.
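Note: a blog post I read mentioned that, to improve recognition, images should be preprocessed (binarization, grayscale conversion, skew correction and cropping). Skew correction and cropping can be handled with the OpenCV library.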
Blog post: http://www.cocoachina.com/bbs/read.php?tid=123463 (the post includes a demo, so I won't link it separately here)
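The grayscale and binarization steps from the note above can be sketched in plain C, which also compiles unchanged inside an Objective-C .m file. This is a minimal sketch that assumes you have already drawn the UIImage into an RGBA8888 buffer (4 bytes per pixel, red first, alpha ignored), for example with a CGBitmapContext. The function names rgba_to_gray, otsu_threshold and binarize are hypothetical helpers and are not part of tesseract-ios. The referenced post does not say which binarization method it uses, so Otsu's global threshold is my own choice here. Skew correction and cropping are left to OpenCV, as the note suggests.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert an RGBA8888 buffer (4 bytes per pixel, R first) to 8-bit grayscale
   using the ITU-R BT.601 luma weights 0.299/0.587/0.114, scaled to integers
   that sum to 256 so the result fits in 0..255 without floating point.
   The alpha byte is ignored (hypothetical helper, not part of tesseract-ios). */
void rgba_to_gray(const uint8_t *rgba, uint8_t *gray, size_t pixel_count) {
    for (size_t i = 0; i < pixel_count; i++) {
        const uint8_t *p = rgba + 4 * i;
        gray[i] = (uint8_t)((77 * p[0] + 150 * p[1] + 29 * p[2]) >> 8);
    }
}

/* Pick a global threshold with Otsu's method: the gray level t that maximizes
   the between-class variance of pixels <= t versus pixels > t. */
uint8_t otsu_threshold(const uint8_t *gray, size_t pixel_count) {
    size_t hist[256] = {0};
    for (size_t i = 0; i < pixel_count; i++) hist[gray[i]]++;

    double sum_all = 0.0;
    for (int v = 0; v < 256; v++) sum_all += (double)v * hist[v];

    double sum_bg = 0.0, best_var = -1.0;
    size_t w_bg = 0;
    uint8_t best_t = 0;
    for (int t = 0; t < 256; t++) {
        w_bg += hist[t];              /* pixels with level <= t */
        if (w_bg == 0) continue;
        size_t w_fg = pixel_count - w_bg;
        if (w_fg == 0) break;         /* nothing left above t */
        sum_bg += (double)t * hist[t];
        double mean_bg = sum_bg / w_bg;
        double mean_fg = (sum_all - sum_bg) / w_fg;
        double diff = mean_bg - mean_fg;
        /* Unnormalized between-class variance; fine for picking the maximum. */
        double between = (double)w_bg * (double)w_fg * diff * diff;
        if (between > best_var) {
            best_var = between;
            best_t = (uint8_t)t;
        }
    }
    return best_t;
}

/* In place: pixels <= threshold become black (0), the rest white (255). */
void binarize(uint8_t *gray, size_t pixel_count, uint8_t threshold) {
    for (size_t i = 0; i < pixel_count; i++)
        gray[i] = gray[i] <= threshold ? 0 : 255;
}
```

For dark text on a light background, the buffer then contains only 0 (text) and 255 (background). You would wrap it back into a UIImage before passing it to -setImage:.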
Useful links:
- FAQ: http://code.google.com/p/tesseract-ocr/wiki/FAQ
- Command-line program help: http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html
- Software built on Tesseract: http://code.google.com/p/tesseract-ocr/wiki/3rdParty
- Tools and APIs for various programming languages provided by Tesseract: http://code.google.com/p/tesseract-ocr/wiki/AddOns



