Integrating the OpenCV Python tool into a scikit-learn MNIST example to support prediction

Background

https://www.cnblogs.com/lightsong/p/14469252.html

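The blog post above covers the work so far: loading data from hub, building a handwritten digit recognition model on the MNIST data, and measuring the prediction accuracy of the logistic regression model.

That model was only trained. To use it for prediction, a tool is still needed that processes an arbitrary handwritten image and normalizes it into the standard MNIST data format.

According to the website, the standard data format is:

  • shape is (28, 28)
  • grayscale
  • pure black background, with white strokes

This image processing can be implemented by learning the OpenCV API.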
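
The three format requirements listed above can be turned into a quick sanity check. This is only a minimal sketch: the function name looks_like_mnist and the "all four corners are black" background heuristic are assumptions made for illustration, not part of MNIST or OpenCV.

```python
import numpy as np


def looks_like_mnist(img):
    """Heuristic check of the three properties: 28x28, grayscale, dark background.

    Assumption for this sketch: a single-channel uint8 array counts as
    grayscale, and a black background is detected from the four corner pixels.
    """
    if not isinstance(img, np.ndarray) or img.shape != (28, 28):
        return False
    if img.dtype != np.uint8:
        return False
    corners = img[[0, 0, -1, -1], [0, -1, 0, -1]]
    # Background must be black, and there must be at least some stroke.
    return bool(np.all(corners == 0) and img.max() > 0)


# A crude synthetic "1": white vertical stroke on a black background.
digit = np.zeros((28, 28), dtype=np.uint8)
digit[6:22, 13:15] = 255

print(looks_like_mnist(digit))        # True
print(looks_like_mnist(255 - digit))  # False: white background, black stroke
```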

 

https://app.activeloop.ai/datasets/popular

The dataset name is activeloop/mnist

 

 

 

OpenCV

http://www.opencv.org.cn/opencvdoc/2.3.2/html/doc/tutorials/tutorials.html

https://docs.opencv.org/master/d1/dfb/intro.html

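OpenCV is an open-source library that includes several hundred computer vision algorithms.

It includes the following modules:

  • Core functionality - defines the basic data structures and basic functions used by all higher-level modules.
  • Image processing - image filtering, geometric image transformations, and color space conversion.
  • Video analysis - motion estimation, background subtraction, and object tracking.
  • ...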

OpenCV (Open Source Computer Vision Library: http://opencv.org) is an open-source library that includes several hundreds of computer vision algorithms. The document describes the so-called OpenCV 2.x API, which is essentially a C++ API, as opposed to the C-based OpenCV 1.x API (C API is deprecated and not tested with "C" compiler since OpenCV 2.4 releases)

OpenCV has a modular structure, which means that the package includes several shared or static libraries. The following modules are available:

  • Core functionality (core) - a compact module defining basic data structures, including the dense multi-dimensional array Mat and basic functions used by all other modules.
  • Image Processing (imgproc) - an image processing module that includes linear and non-linear image filtering, geometrical image transformations (resize, affine and perspective warping, generic table-based remapping), color space conversion, histograms, and so on.
  • Video Analysis (video) - a video analysis module that includes motion estimation, background subtraction, and object tracking algorithms.
  • Camera Calibration and 3D Reconstruction (calib3d) - basic multiple-view geometry algorithms, single and stereo camera calibration, object pose estimation, stereo correspondence algorithms, and elements of 3D reconstruction.
  • 2D Features Framework (features2d) - salient feature detectors, descriptors, and descriptor matchers.
  • Object Detection (objdetect) - detection of objects and instances of the predefined classes (for example, faces, eyes, mugs, people, cars, and so on).
  • High-level GUI (highgui) - an easy-to-use interface to simple UI capabilities.
  • Video I/O (videoio) - an easy-to-use interface to video capturing and video codecs.
  • ... some other helper modules, such as FLANN and Google test wrappers, Python bindings, and others.

The further chapters of the document describe functionality of each module. But first, make sure to get familiar with the common API concepts used thoroughly in the library.

 

Python wrapper for OpenCV

https://docs.opencv.org/master/d6/d00/tutorial_py_root.html

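The core of this tool is implemented in C++. To widen its range of applications, it exposes APIs for several languages, and Python is one of them.

The tutorial index linked above lists all the functionality supported by the Python API.

It covers all the library functionality listed on the home page.

Interestingly, the library also provides machine learning algorithms, covering three models.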

 

Python API overview

Image Processing in OpenCV

https://docs.opencv.org/master/d2/d96/tutorial_py_table_of_contents_imgproc.html

Our problem is feature preprocessing of an image, so we focus on the image processing chapter.

 

 

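Numeric features of images -- supplementary background

https://zhuanlan.zhihu.com/p/267130193

What we need is a grayscale image: each pixel carries a single feature in the range 0-255 that encodes its brightness, from black (0) through shades of gray to white (255). It cannot represent color.

A color image has three channels: each pixel carries three features, R, G and B.

Images are classified as binary, grayscale, pseudo-color, and true-color images.

Binary image: pixels take only the values 0 and 1. 0 is black and 1 is white.

Grayscale image: pixels take 256 possible values (0 - 255). When such an image is expressed as RGB (red, green, blue), the three channel values are equal.

Pseudo-color image: each pixel value is an index into a table of 256 colors, and the displayed color is looked up in the corresponding palette.

True-color image: the RGB values of each pixel are stored and used directly.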
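
The four image types described above map directly onto numpy array shapes. The tiny 2x2 arrays and the two palette entries below are made up purely for illustration.

```python
import numpy as np

# Binary image: only two values, 0 (black) and 1 (white).
binary = np.array([[0, 1], [1, 0]], dtype=np.uint8)

# Grayscale image: one value per pixel in the range 0-255.
gray = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# The same grayscale image expressed as RGB: three equal channels.
gray_as_rgb = np.stack([gray, gray, gray], axis=-1)

# Pseudo-color image: each pixel is an index into a 256-entry palette.
palette = np.zeros((256, 3), dtype=np.uint8)
palette[1] = (255, 0, 0)   # index 1 -> red (illustrative choice)
palette[2] = (0, 0, 255)   # index 2 -> blue (illustrative choice)
indexed = np.array([[1, 2], [2, 1]], dtype=np.uint8)
pseudo_color = palette[indexed]  # palette lookup per pixel

# True-color image: RGB values are stored directly for every pixel.
true_color = np.array(
    [[[255, 0, 0], [0, 255, 0]], [[0, 0, 255], [255, 255, 255]]],
    dtype=np.uint8,
)

print(binary.shape, gray.shape)                               # (2, 2) (2, 2)
print(gray_as_rgb.shape, pseudo_color.shape, true_color.shape)  # (2, 2, 3) each
```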

 

https://zhuanlan.zhihu.com/p/36592188

The exploration of the MNIST data in this article shows that the pixels are grayscale values, not binary values.

 

Changing Colorspaces

https://docs.opencv.org/master/df/d9d/tutorial_py_colorspaces.html

Changing Color-space

There are more than 150 color-space conversion methods available in OpenCV. But we will look into only two, which are most widely used ones: BGR ↔ Gray and BGR ↔ HSV.

For color conversion, we use the function cv.cvtColor(input_image, flag) where flag determines the type of conversion.

For BGR → Gray conversion, we use the flag cv.COLOR_BGR2GRAY. Similarly for BGR → HSV, we use the flag cv.COLOR_BGR2HSV. To get other flags, just run following commands in your Python terminal:

>>> import cv2 as cv
>>> flags = [i for i in dir(cv) if i.startswith('COLOR_')]
>>> print( flags )

 

Geometric Transformations of Images

https://docs.opencv.org/master/da/d6e/tutorial_py_geometric_transformations.html

When scaling an image, the choice of interpolation matters: use INTER_AREA for shrinking and INTER_CUBIC for enlarging.

Scaling

Scaling is just resizing of the image. OpenCV comes with a function cv.resize() for this purpose. The size of the image can be specified manually, or you can specify the scaling factor. Different interpolation methods are used. Preferable interpolation methods are cv.INTER_AREA for shrinking and cv.INTER_CUBIC (slow) & cv.INTER_LINEAR for zooming. By default, the interpolation method cv.INTER_LINEAR is used for all resizing purposes. You can resize an input image with either of following methods:

import numpy as np
import cv2 as cv
img = cv.imread('messi5.jpg')
res = cv.resize(img,None,fx=2, fy=2, interpolation = cv.INTER_CUBIC)
#OR
height, width = img.shape[:2]
res = cv.resize(img,(2*width, 2*height), interpolation = cv.INTER_CUBIC)

 

Lenna show -- image processing demo

https://zhuanlan.zhihu.com/p/49957946

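Installation

OpenCV installation differs somewhat across platforms and versions. numpy must be installed first; installing Anaconda beforehand is strongly recommended. Then install directly with:

pip install opencv-python

If you are lucky, import cv2 runs in your code without an error and the installation has succeeded.

Most of the time, though, that may not work. In that case, consider downloading an installation file from here:

Then install it from the local file:

pip install opencv_python-3.4.3-cp37-cp37m-win_amd64.whl

The installation file you download must match the version and bitness of your local Python.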
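
To see which wheel file matches the local interpreter, the Python version tag and bitness can be printed with the standard library alone. This is a small sketch; the helper name wheel_tags is hypothetical, and the "cpXY" tag format applies to CPython builds.

```python
import struct
import sys


def wheel_tags():
    """Return the CPython tag (e.g. 'cp37') and the interpreter bitness."""
    py_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    bits = struct.calcsize("P") * 8  # pointer size in bits: 32 or 64
    return py_tag, bits


py_tag, bits = wheel_tags()
print("CPython tag:", py_tag, "| interpreter bitness:", bits)

# Check whether the installation actually worked.
try:
    import cv2
    print("cv2 version:", cv2.__version__)
except ImportError as exc:
    print("cv2 is not importable:", exc)
```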

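If other problems come up during installation, search for the error message online; a solution can usually be found, so they are not listed one by one here.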

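Image processing -- reading and writing

Data read by cv is stored as a numpy object, so numpy's capabilities can be used to crop the image and perform similar operations.

We use Lenna, the classic image processing example, for testing

You can look up the history of this image yourself

import cv2 as cv
# Read the image
img = cv.imread('img/Lenna.png')
# Image information
print('Image shape:', img.shape)
print('Image data:', type(img), img)
# Display the image
cv.imshow('pic title', img)
cv.waitKey(0)
# Add text
cv.putText(img, 'Learn Python with Crossin', (50, 150), cv.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 4)
# Save the image
cv.imwrite('img/Lenna_new.png', img)

Image data in OpenCV-Python is managed with the ndarray type from the numpy library, which makes all kinds of numerical computation and conversion convenient.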
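
Because the image is a plain ndarray, cropping and channel manipulation are ordinary numpy indexing. The array below is a synthetic stand-in (an assumption) for what cv.imread returns for a color image: shape (height, width, 3) in BGR order.

```python
import numpy as np

# Stand-in for a color image loaded by cv.imread: 100 rows, 120 columns, BGR.
img = np.zeros((100, 120, 3), dtype=np.uint8)
img[:, :, 0] = 200  # fill the blue channel

crop = img[20:60, 30:90]  # rows 20-59, columns 30-89
rgb = img[:, :, ::-1]     # reverse the channel axis: BGR -> RGB
inverted = 255 - img      # per-pixel inversion, still uint8

print(crop.shape)              # (40, 60, 3)
print(rgb[0, 0].tolist())      # [0, 0, 200]
print(inverted[0, 0].tolist()) # [55, 255, 255]
```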

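Image processing -- color conversion

Common image processing operations:

import numpy as np
# Continues from the previous snippet: cv and img are already defined
# Grayscale
img_gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
cv.imwrite('img/Lenna_gray.png', img_gray)
# Binarization
_, img_bin = cv.threshold(img_gray, 127, 255, cv.THRESH_BINARY)
cv.imwrite('img/Lenna_bin.png', img_bin)
# Smoothing
img_blur = cv.blur(img, (5, 5))
cv.imwrite('img/Lenna_blur.png', img_blur)
# Contour extraction
# findContours returns 3 values in OpenCV 3.x and 2 values in 4.x;
# the contours are the second-to-last element in both cases
contours = cv.findContours(img_bin, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)[-2]
img_cont = np.zeros(img_bin.shape, np.uint8)
cv.drawContours(img_cont, contours, -1, 255, 3)
cv.imwrite('img/Lenna_cont.png', img_cont)

These are all common methods in digital image processing. OpenCV-Python wraps nearly all of them in ready-made interfaces, so each takes only a line or two of code, which is very convenient in real project development.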

 

Result of the change

Data processing

Data preparation: provide a color input image of the digit 3

https://github.com/fanqingsong/code_snippet/blob/master/machine_learning/hub/images/three.png

 

 

 

After processing, we get an image that conforms to the MNIST standard

https://github.com/fanqingsong/code_snippet/blob/master/machine_learning/hub/images/three_normal.jpg

 

 

Code

https://github.com/fanqingsong/code_snippet/blob/master/machine_learning/hub/mnist_sklearn.py

The last few lines contain the image preprocessing

# Read the image
img = cv.imread('images/three.png')
# Grayscale
img_gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# Binarization
_, img_bin = cv.threshold(img_gray, 127, 255, cv.THRESH_BINARY)
# Resize
img_resized = cv.resize(img_bin, (28,28), interpolation=cv.INTER_AREA)
# Invert: dark strokes on a light background become white strokes on black
img_normal = 255 - img_resized
# Save the image
cv.imwrite('images/three_normal.jpg', img_normal)

# flat to one dimension
one_digit_features = img_normal.reshape((-1))

 

Complete code

"""
Test mnist learning on sklearn with hub data
"""

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils import check_random_state

from hub import Dataset
import cv2 as cv


print(__doc__)

# Author: Arthur Mensch <arthur.mensch@m4x.org>
# License: BSD 3 clause

# Turn down for faster convergence
train_samples = 5000

print("loading mnist data from hub...")
mnist = Dataset("activeloop/mnist")  # loading the MNIST data lazily

X = mnist["image"].compute()
y = mnist["label"].compute()

print("------- X.shape", X.shape)
print("------- y.shape", y.shape)


print("now train mnist model...")

random_state = check_random_state(0)
permutation = random_state.permutation(X.shape[0])

print("---------- permutation")
print(permutation)

X = X[permutation]
y = y[permutation]

print("------- X.shape", X.shape)
print("------- y.shape", y.shape)

X = X.reshape((X.shape[0], -1))

print("------- X.shape", X.shape)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=train_samples, test_size=10000)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Turn up tolerance for faster convergence
clf = LogisticRegression(
    C=50. / train_samples, penalty='l1', solver='saga', tol=0.1
)
clf.fit(X_train, y_train)

sparsity = np.mean(clf.coef_ == 0) * 100
score = clf.score(X_test, y_test)

# print('Best C % .4f' % clf.C_)
print("Sparsity with L1 penalty: %.2f%%" % sparsity)
print("Test score with L1 penalty: %.4f" % score)

# now enter predicting part
# Read the image
img = cv.imread('images/three.png')
# Grayscale
img_gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# Binarization
_, img_bin = cv.threshold(img_gray, 127, 255, cv.THRESH_BINARY)
# Resize
img_resized = cv.resize(img_bin, (28,28), interpolation=cv.INTER_AREA)
# Invert: dark strokes on a light background become white strokes on black
img_normal = 255 - img_resized
# Save the image
cv.imwrite('images/three_normal.jpg', img_normal)

# flat to one dimension
one_digit_features = img_normal.reshape((-1))

result = clf.predict([one_digit_features])
print("------ for three digit picture, the predicted result is ", result)
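
One point worth noting about the prediction step: the classifier is fitted on StandardScaler output, while the new image is passed to predict as raw 0-255 pixel values. A possible refinement, sketched here under assumptions, is to pass the new sample through the same fitted scaler first. Random data stands in for MNIST below, so this shows only the scaling step, not a real prediction.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)

# Stand-in for the flattened training images (assumption: random pixels).
X_train = rng.randint(0, 256, size=(200, 784)).astype(np.float64)
scaler = StandardScaler().fit(X_train)

# Stand-in for one preprocessed digit, flattened to 784 uint8 values.
one_digit_features = rng.randint(0, 256, size=784).astype(np.uint8)

# Reuse the training mean and standard deviation for the new sample.
sample = one_digit_features.reshape(1, -1).astype(np.float64)
sample_scaled = scaler.transform(sample)

print(sample_scaled.shape)  # (1, 784)
# With the classifier from the script above, the next step would be:
# result = clf.predict(sample_scaled)
```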

 

Output

The last line of the output shows that the prediction is indeed correct: 3

root@xxx:~/win10/mine/code-snippet/machine_learning/hub# python mnist_sklearn.py

Test mnist learning on sklearn with hub data

loading mnist data from hub...
------- X.shape (70000, 28, 28, 1)
------- y.shape (70000,)
now train mnist model...
---------- permutation
[10840 56267 14849 ... 42613 43567 68268]
------- X.shape (70000, 28, 28, 1)
------- y.shape (70000,)
------- X.shape (70000, 784)
Sparsity with L1 penalty: 68.89%
Test score with L1 penalty: 0.8389
------ for three digit picture, the predicted result is  [3]

 

posted @ 2021-03-04 16:42  lightsong