CAPTCHA Recognition with R and Tesseract

I. Environment Setup

1. Install Tesseract
Windows/macOS: download it via https://github.com/tesseract-ocr/tesseract

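Linux (Debian/Ubuntu), including the English language data and the development headers that the R packages need when they are built from source:

sudo apt install tesseract-ocr tesseract-ocr-eng libtesseract-dev libleptonica-dev libmagick++-dev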

2. Install the R packages

install.packages("tesseract")
install.packages("magick")
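Before writing the script, it can help to confirm that the R package sees a working engine and the English training data. The check below is a minimal sketch, assuming the default "eng" model is the one the script will use:

```r
library(tesseract)

# tesseract_info() reports the engine version, the training-data path,
# and the languages available to this R installation.
info <- tesseract_info()
cat("Tesseract version:", info$version, "\n")
cat("Available languages:", paste(info$available, collapse = ", "), "\n")

# The script in this post relies on the default English model ("eng").
if (!"eng" %in% info$available) {
  warning("English training data not found; install tesseract-ocr-eng on ",
          "Linux, or call tesseract_download(\"eng\") on Windows/macOS")
}
```

If "eng" is missing from the list, OCR calls that use the default engine are likely to fail until the language data is installed.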

II. The R Recognition Script
Save the following as captcha_ocr.R:

library(tesseract)
library(magick)

# Restrict recognition to the CAPTCHA character set (uppercase letters and digits only)
engine <- tesseract(options = list(tessedit_char_whitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"))

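recognize_captcha <- function(image_path) {
  # Bail out early if the image does not exist
  if (!file.exists(image_path)) {
    cat("File not found:", image_path, "\n")
    return(NULL)
  }

  # Load the image, convert it to grayscale, and push light pixels to white
  img <- image_read(image_path)
  img <- image_convert(img, colorspace = "gray")
  img <- image_threshold(img, type = "white", threshold = "60%")

  # Run OCR, then drop anything outside A-Z and 0-9 (whitespace, newlines)
  text <- ocr(img, engine = engine)
  clean_text <- gsub("[^A-Z0-9]", "", text)

  cat("Recognized text:", clean_text, "\n")
  invisible(clean_text)
}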

recognize_captcha("code1.png")
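
The preprocessing inside recognize_captcha() is deliberately simple. For small or noisy images, a slightly stronger pipeline sometimes helps. The sketch below is an illustration rather than part of the original script: the helper name preprocess_captcha and the default 200% scale and 60% threshold are assumptions that would need tuning for each CAPTCHA style.

```r
library(magick)

# Hypothetical helper: takes a magick image and returns an upscaled,
# binarized copy that is usually easier for OCR to read.
preprocess_captcha <- function(img, scale = "200%", threshold = "60%") {
  img <- image_convert(img, colorspace = "gray")
  # Enlarge first: Tesseract tends to struggle with very small glyphs.
  img <- image_resize(img, scale)
  # Two-sided threshold: pixels darker than the cutoff become black,
  # pixels brighter than it become white, leaving a two-tone image.
  img <- image_threshold(img, type = "black", threshold = threshold)
  img <- image_threshold(img, type = "white", threshold = threshold)
  img
}
```

It could be used as img <- preprocess_captcha(image_read(image_path)) in place of the three preprocessing lines, before the call to ocr().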

III. Running the Script
Run the following in R or RStudio:

source("captcha_ocr.R")

Sample output:

Recognized text: 7GHK
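
OCR on CAPTCHAs fails often, so it is worth checking a result before using it. The following is a minimal sketch in base R, assuming a fixed-length CAPTCHA drawn from uppercase letters and digits. The function name is_plausible_captcha and the default length of 4 (taken from the sample output) are assumptions, not part of the original script.

```r
# Hypothetical check: TRUE only when `text` is a single string of exactly
# `expected_length` characters, all uppercase letters or digits.
is_plausible_captcha <- function(text, expected_length = 4) {
  is.character(text) &&
    length(text) == 1 &&
    !is.na(text) &&
    nchar(text) == expected_length &&
    grepl("^[A-Z0-9]+$", text)
}

is_plausible_captcha("7GHK")  # TRUE
is_plausible_captcha("7GH")   # FALSE: too short
```

A result that fails the check can be retried with different preprocessing or with a fresh CAPTCHA image.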

Posted 2025-06-26 12:28 by ttocr、com
