Android (Kotlin) + ML Kit: Recognizing Alphanumeric CAPTCHAs on Mobile in Practice
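1 Overview and use cases
Recognizing alphanumeric CAPTCHAs (English letters and digits) directly on the device, from a screenshot or a photo, is useful for automated testing, accessibility tooling, and internal utilities. Google ML Kit's Text Recognition runs on-device and works offline, which avoids the latency of a server round trip. To improve the recognition rate, we preprocess the image on the client (grayscale conversion, binarization, denoising, and upscaling) before handing it to OCR.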
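To make the binarization step concrete, here is a minimal standalone sketch of Otsu's method on a plain array of gray values (0 to 255), with no Android classes involved. The names otsuThreshold and binarize are hypothetical and chosen only for this example; the MainActivity code in section 5 applies the same idea to Bitmap pixels.

```kotlin
// Otsu's method: pick the threshold that maximizes the between-class variance
// of "background" (values <= t) and "foreground" (values > t).
fun otsuThreshold(gray: IntArray): Int {
    val hist = IntArray(256)
    for (v in gray) hist[v.coerceIn(0, 255)]++

    val total = gray.size.toLong()
    var sumAll = 0L // Long, so large images cannot overflow the accumulator
    for (t in 0..255) sumAll += t.toLong() * hist[t]

    var sumBack = 0L
    var weightBack = 0L
    var bestVariance = -1.0
    var best = 0
    for (t in 0..255) {
        weightBack += hist[t]
        if (weightBack == 0L) continue
        val weightFore = total - weightBack
        if (weightFore == 0L) break
        sumBack += t.toLong() * hist[t]
        val meanBack = sumBack.toDouble() / weightBack
        val meanFore = (sumAll - sumBack).toDouble() / weightFore
        val diff = meanBack - meanFore
        val between = weightBack.toDouble() * weightFore.toDouble() * diff * diff
        if (between > bestVariance) {
            bestVariance = between
            best = t
        }
    }
    return best
}

// true = lighter than the threshold (would be drawn white), false = dark (black).
fun binarize(gray: IntArray, threshold: Int): BooleanArray =
    BooleanArray(gray.size) { gray[it] > threshold }
```

For a clearly bimodal input, such as dark glyph pixels around 10 and light background pixels around 200, the returned threshold falls between the two groups, so binarize separates them cleanly.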
2 Environment and dependencies
Android Studio Arctic Fox or later
Kotlin 1.5+
AndroidX
ML Kit Text Recognition (on-device API)
Add the dependencies to app/build.gradle (the module-level file); adjust the versions to match your Android Studio / Kotlin setup:
dependencies {
implementation "androidx.appcompat:appcompat:1.4.0"
implementation "com.google.mlkit:text-recognition:16.0.0" // ML Kit on-device
implementation "com.google.android.material:material:1.4.0"
implementation "androidx.constraintlayout:constraintlayout:2.1.2"
}
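(Note: ML Kit provides separate models for other scripts such as Chinese. This article uses only the default Latin-script recognizer, which covers English letters and digits.)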
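3 Android permissions and manifest
Add the camera permission to AndroidManifest.xml (only needed if photo capture is enabled):
<uses-permission android:name="android.permission.CAMERA" />
And in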
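4 Simple UI (activity_main.xml)
Create a minimal layout containing Select and Capture buttons, an ImageView that shows the processed image, a Process+OCR button, and a TextView for the result.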
<androidx.constraintlayout.widget.ConstraintLayout
xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:app="http://schemas.android.com/apk/res-auto"
android:layout_width="match_parent"
android:layout_height="match_parent">
<ImageView
android:id="@+id/imageView"
android:layout_width="0dp"
android:layout_height="0dp"
android:contentDescription="captcha"
app:layout_constraintTop_toTopOf="parent"
app:layout_constraintBottom_toTopOf="@+id/buttonRow"
app:layout_constraintLeft_toLeftOf="parent"
app:layout_constraintRight_toRightOf="parent"
android:scaleType="fitCenter"
android:adjustViewBounds="true"
android:background="#EEE"/>
<LinearLayout
android:id="@+id/buttonRow"
android:layout_width="0dp"
android:layout_height="wrap_content"
app:layout_constraintBottom_toTopOf="@+id/resultText"
app:layout_constraintLeft_toLeftOf="parent"
app:layout_constraintRight_toRightOf="parent"
android:gravity="center"
android:orientation="horizontal"
android:padding="8dp">
<Button
android:id="@+id/btnSelect"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:text="Select" />
<Button
android:id="@+id/btnCapture"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:text="Capture"
android:layout_marginStart="12dp"/>
<Button
android:id="@+id/btnProcess"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:text="Process+OCR"
android:layout_marginStart="12dp"/>
</LinearLayout>
<TextView
android:id="@+id/resultText"
android:layout_width="0dp"
android:layout_height="wrap_content"
app:layout_constraintBottom_toBottomOf="parent"
app:layout_constraintLeft_toLeftOf="parent"
app:layout_constraintRight_toRightOf="parent"
android:padding="12dp"
android:textSize="18sp"
android:textColor="#111"/>
</androidx.constraintlayout.widget.ConstraintLayout>
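5 Kotlin main Activity (core logic)
Below is a complete, runnable skeleton of MainActivity.kt. It covers image selection and photo capture, the preprocessing functions (grayscale, upscaling, blur-based denoising, binarization), the call to the ML Kit TextRecognizer, whitelist filtering, and result display.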
// MainActivity.kt
package com.example.captchaocr
import android.Manifest
import android.app.Activity
import android.content.Intent
import android.graphics.*
import android.net.Uri
import android.os.Bundle
import android.provider.MediaStore
import android.widget.Button
import android.widget.ImageView
import android.widget.TextView
import androidx.activity.result.contract.ActivityResultContracts
import androidx.appcompat.app.AppCompatActivity
import androidx.core.app.ActivityCompat
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import java.io.IOException
import java.util.regex.Pattern
class MainActivity : AppCompatActivity() {
private lateinit var imageView: ImageView
private lateinit var resultText: TextView
private var currentBitmap: Bitmap? = null
private val pickImageLauncher =
registerForActivityResult(ActivityResultContracts.StartActivityForResult()) { ar ->
if (ar.resultCode == Activity.RESULT_OK) {
val data = ar.data
val uri = data?.data
uri?.let { loadBitmapFromUri(it) }
}
}
private val takePhotoLauncher =
registerForActivityResult(ActivityResultContracts.StartActivityForResult()) { ar ->
if (ar.resultCode == Activity.RESULT_OK) {
val bitmap = ar.data?.extras?.get("data") as? Bitmap
bitmap?.let {
currentBitmap = it
imageView.setImageBitmap(it)
}
}
}
override fun onCreate(savedInstanceState: Bundle?) {
super.onCreate(savedInstanceState)
ActivityCompat.requestPermissions(this, arrayOf(Manifest.permission.CAMERA), 0)
setContentView(R.layout.activity_main)
imageView = findViewById(R.id.imageView)
resultText = findViewById(R.id.resultText)
findViewById<Button>(R.id.btnSelect).setOnClickListener {
val intent = Intent(Intent.ACTION_PICK, MediaStore.Images.Media.EXTERNAL_CONTENT_URI)
pickImageLauncher.launch(intent)
}
findViewById<Button>(R.id.btnCapture).setOnClickListener {
val intent = Intent(MediaStore.ACTION_IMAGE_CAPTURE)
takePhotoLauncher.launch(intent)
}
findViewById<Button>(R.id.btnProcess).setOnClickListener {
currentBitmap?.let { bmp ->
val processed = preprocessForOCR(bmp)
imageView.setImageBitmap(processed)
runTextRecognition(processed)
} ?: run {
resultText.text = "No image loaded"
}
}
}
private fun loadBitmapFromUri(uri: Uri) {
try {
val bmp = MediaStore.Images.Media.getBitmap(contentResolver, uri)
currentBitmap = bmp
imageView.setImageBitmap(bmp)
} catch (e: IOException) {
e.printStackTrace()
}
}
// ------- Image preprocessing -------
private fun preprocessForOCR(src: Bitmap): Bitmap {
// 1. Convert to grayscale
val gray = toGrayscale(src)
// 2. Upscale 2x (helps with small glyphs)
val scaled = Bitmap.createScaledBitmap(gray, gray.width * 2, gray.height * 2, true)
// 3. Light blur to suppress noise (optional)
val denoised = gaussianBlur(scaled, 1)
// 4. Binarize with Otsu's automatically chosen global threshold
val bin = thresholdOtsu(denoised)
// 5. Optional morphology: a simple hand-written 3x3 dilation
val morph = simpleMorphology(bin)
return morph
}
private fun toGrayscale(src: Bitmap): Bitmap {
val w = src.width
val h = src.height
val bmp = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888)
val canvas = Canvas(bmp)
val paint = Paint()
val cm = ColorMatrix()
cm.setSaturation(0f)
paint.colorFilter = ColorMatrixColorFilter(cm)
canvas.drawBitmap(src, 0f, 0f, paint)
return bmp
}
private fun gaussianBlur(src: Bitmap, radius: Int): Bitmap {
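// Despite the name, this is a 3x3 box (mean) blur, which is cheap to compute;
// radius only switches it on or off. RenderScript's ScriptIntrinsicBlur is
// deprecated; a third-party library is another option.
if (radius <= 0) return src
val w = src.width
val h = src.height
val bmp = src.copy(Bitmap.Config.ARGB_8888, true)
val pixels = IntArray(w*h)
bmp.getPixels(pixels, 0, w, 0, 0, w, h)
// Start from a copy so the 1-pixel border keeps its original values
// instead of staying 0, which Otsu would later turn into a black frame.
val out = pixels.copyOf()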
for (y in 1 until h-1) {
for (x in 1 until w-1) {
var rSum=0; var gSum=0; var bSum=0
for (ky in -1..1) {
for (kx in -1..1) {
val p = pixels[(y+ky)*w + (x+kx)]
rSum += (p shr 16) and 0xFF
gSum += (p shr 8) and 0xFF
bSum += p and 0xFF
}
}
val nr = (rSum/9)
val ng = (gSum/9)
val nb = (bSum/9)
out[y*w+x] = (0xFF shl 24) or (nr shl 16) or (ng shl 8) or nb
}
}
val outBmp = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888)
outBmp.setPixels(out, 0, w, 0, 0, w, h)
return outBmp
}
private fun thresholdOtsu(src: Bitmap): Bitmap {
val w = src.width
val h = src.height
val gray = IntArray(w*h)
src.getPixels(gray, 0, w, 0, 0, w, h)
val hist = IntArray(256)
for (p in gray) {
val v = (p shr 16) and 0xFF // R channel (R = G = B after grayscale conversion)
hist[v]++
}
val total = w*h
// Otsu
// Long accumulators: t * hist[t] can overflow Int on multi-megapixel images
var sum = 0L
for (t in 0..255) sum += t.toLong() * hist[t]
var sumB = 0L
var wB = 0
var wF: Int
var varMax = 0.0
var threshold = 0
for (t in 0..255) {
wB += hist[t]
if (wB == 0) continue
wF = total - wB
if (wF == 0) break
sumB += t.toLong() * hist[t]
val mB = sumB.toDouble() / wB
val mF = (sum - sumB).toDouble() / wF
val between = wB.toDouble() * wF.toDouble() * (mB - mF) * (mB - mF)
if (between > varMax) {
varMax = between
threshold = t
}
}
// apply threshold
val out = IntArray(w*h)
for (i in gray.indices) {
val v = (gray[i] shr 16) and 0xFF
out[i] = if (v > threshold) Color.WHITE else Color.BLACK
}
val bmp = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888)
bmp.setPixels(out, 0, w, 0, 0, w, h)
return bmp
}
private fun simpleMorphology(src: Bitmap): Bitmap {
// Simple 3x3 dilation of white regions (no erosion pass is implemented here)
val w = src.width
val h = src.height
val pixels = IntArray(w*h)
src.getPixels(pixels, 0, w, 0, 0, w, h)
val tmp = pixels.copyOf()
// Dilation: grow white regions; this removes small dark specks but also thins dark strokes
for (y in 1 until h-1) {
for (x in 1 until w-1) {
var anyWhite = false
for (ky in -1..1) {
for (kx in -1..1) {
val v = tmp[(y+ky)*w + (x+kx)]
if (v == Color.WHITE) { anyWhite = true; break }
}
if (anyWhite) break
}
pixels[y*w + x] = if (anyWhite) Color.WHITE else Color.BLACK
}
}
val bmp = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888)
bmp.setPixels(pixels, 0, w, 0, 0, w, h)
return bmp
}
// ------- ML Kit call -------
private fun runTextRecognition(bitmap: Bitmap) {
val image = InputImage.fromBitmap(bitmap, 0)
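// com.google.mlkit:text-recognition 16.0.0 requires an options argument here
val recognizer = TextRecognition.getClient(com.google.mlkit.vision.text.latin.TextRecognizerOptions.DEFAULT_OPTIONS) // on-device Latin-script recognizer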
recognizer.process(image)
.addOnSuccessListener { visionText ->
val raw = visionText.text
val cleaned = filterAlphaNum(raw)
resultText.text = "Raw: $raw\nCleaned: $cleaned"
}
.addOnFailureListener { e ->
resultText.text = "Error: ${e.message}"
}
}
private fun filterAlphaNum(s: String): String {
// Keep only upper/lowercase letters and digits; spaces and line breaks are removed too
val pattern = Pattern.compile("[^A-Za-z0-9]")
return pattern.matcher(s).replaceAll("").trim()
}
}
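Notes:
When btnProcess is tapped, the code runs the preprocessing pipeline and then calls ML Kit for recognition.
toGrayscale uses a ColorMatrix with saturation set to 0, which is an efficient way to convert to grayscale.
thresholdOtsu implements Otsu's method, which picks a single global threshold from the image histogram, and uses it for binarization.
runTextRecognition uses ML Kit's on-device API; once recognition completes, filterAlphaNum applies the whitelist filter.
For brevity, the image-processing functions are not tuned for performance. In a real app, move the expensive work to a background thread (for example with coroutines or an ExecutorService).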
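One way to move the work off the main thread is sketched below using a plain ExecutorService. The class BackgroundRunner and its parameter names are hypothetical and not part of the original code. On Android, resultExecutor would typically be the main-thread executor (for example the one returned by ContextCompat.getMainExecutor), so that views are only touched on the UI thread; that wiring is an assumption and is left out here so the sketch stays plain JVM code.

```kotlin
import java.util.concurrent.Executor
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// Runs a task on a worker thread and hands the result to another executor.
class BackgroundRunner(
    private val resultExecutor: Executor,
    private val worker: ExecutorService = Executors.newSingleThreadExecutor()
) {
    fun <T> run(task: () -> T, onResult: (T) -> Unit) {
        worker.execute {
            val value = task()                           // heavy work, off the caller's thread
            resultExecutor.execute { onResult(value) }   // deliver where the caller wants it
        }
    }

    // Call when the owner (e.g. the Activity) is destroyed.
    fun shutdown() = worker.shutdown()
}
```

In the btnProcess click handler, the task would be the call to preprocessForOCR(bmp), and the onResult callback would set the ImageView bitmap and start runTextRecognition. ML Kit's process call is already asynchronous, so only the preprocessing needs to move.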