A beginner-friendly CNN project for handwritten digit recognition: Digit Recognizer with CNN for beginner

Preparation

Introduction to the MNIST dataset

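MNIST ("Modified National Institute of Standards and Technology") is the de facto "hello world" dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike. Our goal is to correctly identify digits from a dataset of tens of thousands of handwritten images.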

The data files train.csv and test.csv contain grayscale images of hand-drawn digits, from zero through nine.

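Each image is 28 pixels high and 28 pixels wide, for a total of 784 pixels. Each pixel has a single pixel value that indicates how light or dark that pixel is, with higher numbers meaning darker. The pixel value is an integer between 0 and 255, inclusive.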

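The training set (train.csv) has 785 columns. The first column, called "label", is the digit that the user drew. The remaining columns contain the pixel values of the associated image.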

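Each pixel column in the training set has a name like pixelx, where x is an integer between 0 and 783, inclusive. To locate this pixel on the image, decompose x as x = i * 28 + j, where i and j are integers between 0 and 27, inclusive. Then pixelx is located on row i and column j of a 28 x 28 matrix (zero-indexed).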

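For example, pixel31 is the pixel in the fourth column from the left and the second row from the top (31 = 1 * 28 + 3, so i = 1 and j = 3), as shown in the ASCII diagram below.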

Visually, if we omit the "pixel" prefix, the pixels make up the image like this:

000 001 002 003 ... 026 027
028 029 030 031 ... 054 055
056 057 058 059 ... 082 083
 |   |   |   |  ...  |   |
728 729 730 731 ... 754 755
756 757 758 759 ... 782 783 
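The decomposition x = i * 28 + j is just integer division with remainder. A small Python sketch of the mapping in both directions (the helper names pixel_position and pixel_index are hypothetical, not part of the original notebook):

```python
from typing import Tuple

WIDTH = 28  # images are 28 x 28


def pixel_position(x: int, width: int = WIDTH) -> Tuple[int, int]:
    """Map the x in a 'pixelx' column name to its zero-indexed (row, col)."""
    if not 0 <= x < width * width:
        raise ValueError(f"pixel index must be in [0, {width * width - 1}]")
    # x = i * width + j, so the row i is the quotient and the column j the remainder
    return divmod(x, width)


def pixel_index(row: int, col: int, width: int = WIDTH) -> int:
    """Inverse mapping: zero-indexed (row, col) back to x."""
    return row * width + col


print(pixel_position(31))   # (1, 3): second row, fourth column
print(pixel_position(783))  # (27, 27): bottom-right corner
print(pixel_index(1, 3))    # 31
```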

The test set (test.csv) is the same as the training set, except that it does not contain the "label" column.

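Your submission file should use the following format: for each of the 28000 images in the test set, output one line containing the ImageId and the digit you predict. For example, if you predict that the first image is a 3, the second a 7 and the third an 8, your submission file would look like this: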

ImageId,Label
1,3
2,7
3,8 
(27997 more lines)

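The evaluation metric for this competition is classification accuracy, i.e. the proportion of test images that are correctly classified. For example, a classification accuracy of 0.97 means you have correctly classified all but 3% of the images.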
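Classification accuracy is simply the number of correct predictions divided by the number of images. A minimal sketch in NumPy (the function name and the toy labels are illustrative only):

```python
import numpy as np


def classification_accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that exactly match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(y_true == y_pred))


# Toy labels (made up for illustration): 3 of 4 predictions are correct
print(classification_accuracy([3, 7, 8, 1], [3, 7, 8, 2]))  # 0.75
```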

Dataset download: https://wwp.lanzoub.com/iIUFY08t575a

Import packages

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

Read the dataset

train = pd.read_csv('../input/digit-recognizer/train.csv')
test = pd.read_csv('../input/digit-recognizer/test.csv')

Inspect the data

train.head()
label pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
0 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 4 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 785 columns

train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 42000 entries, 0 to 41999
Columns: 785 entries, label to pixel783
dtypes: int64(785)
memory usage: 251.5 MB

train.isnull().sum()

label 0
pixel0 0
pixel1 0
pixel2 0
pixel3 0
..
pixel779 0
pixel780 0
pixel781 0
pixel782 0
pixel783 0
Length: 785, dtype: int64

sum(train.isnull().sum())

0

Preprocess the training and test sets

# y_train holds the digit labels
y_train = train['label'].copy()
# X_train holds the pixel intensity values
X_train = train.drop('label',axis=1)
y_train.value_counts()

1 4684
7 4401
3 4351
9 4188
2 4177
6 4137
0 4132
4 4072
8 4063
5 3795
Name: label, dtype: int64

y_train = pd.get_dummies(y_train,prefix='Num')
y_train.head()
Num_0 Num_1 Num_2 Num_3 Num_4 Num_5 Num_6 Num_7 Num_8 Num_9
0 0 1 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0
2 0 1 0 0 0 0 0 0 0 0
3 0 0 0 0 1 0 0 0 0 0
4 1 0 0 0 0 0 0 0 0 0
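The categorical_crossentropy loss used later expects each target as a 10-element one-hot vector, which is what pd.get_dummies produces here. The same encoding can be sketched with NumPy alone by indexing rows of an identity matrix (the toy labels below are made up). Note that since pandas 2.0, get_dummies returns bool columns by default; passing dtype='float32' keeps the targets numeric. Keeping integer labels and using loss='sparse_categorical_crossentropy' is a common alternative that skips one-hot encoding entirely. These are suggestions, not what the original notebook does.

```python
import numpy as np

# Hypothetical toy labels standing in for train['label']
labels = np.array([1, 0, 1, 4, 0])

# Row k of the 10x10 identity matrix is the one-hot vector for digit k
one_hot = np.eye(10, dtype=np.float32)[labels]

print(one_hot.shape)           # (5, 10)
print(one_hot[3])              # 1.0 at index 4, 0.0 elsewhere
print(one_hot.argmax(axis=1))  # [1 0 1 4 0], recovers the labels
```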
# 28 × 28 = 784 pixels in total; each value is an intensity in [0, 255]
X_train.describe()
pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
count 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 ... 42000.000000 42000.000000 42000.000000 42000.00000 42000.000000 42000.000000 42000.0 42000.0 42000.0 42000.0
mean 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.219286 0.117095 0.059024 0.02019 0.017238 0.002857 0.0 0.0 0.0 0.0
std 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 6.312890 4.633819 3.274488 1.75987 1.894498 0.414264 0.0 0.0 0.0 0.0
min 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.0 0.0 0.0 0.0
25% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.0 0.0 0.0 0.0
50% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.0 0.0 0.0 0.0
75% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.0 0.0 0.0 0.0
max 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 254.000000 254.000000 253.000000 253.00000 254.000000 62.000000 0.0 0.0 0.0 0.0

8 rows × 784 columns

# Scale pixel values from [0, 255] to [0, 1]
X_train = X_train/255
X_train.head()
pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 784 columns

X_train.describe()
pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
count 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 42000.0 ... 42000.000000 42000.000000 42000.000000 42000.000000 42000.000000 42000.000000 42000.0 42000.0 42000.0 42000.0
mean 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000860 0.000459 0.000231 0.000079 0.000068 0.000011 0.0 0.0 0.0 0.0
std 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.024756 0.018172 0.012841 0.006901 0.007429 0.001625 0.0 0.0 0.0 0.0
min 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0
25% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0
50% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0
75% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0
max 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.996078 0.996078 0.992157 0.992157 0.996078 0.243137 0.0 0.0 0.0 0.0

8 rows × 784 columns
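Dividing by 255 maps the integer range [0, 255] onto [0, 1]; for example, the maximum of 254 in the earlier describe() output becomes 254 / 255 ≈ 0.996078, matching the scaled table. Because true division produces float64, the scaled matrix takes as much memory as the int64 original; casting it to float32 (the default float type of Keras layers) would halve that. A quick back-of-the-envelope check, where the helper name matrix_mib is hypothetical and the cast is an optional suggestion, not part of the original notebook:

```python
import numpy as np


def matrix_mib(n_rows: int, n_cols: int, dtype) -> float:
    """Memory in MiB of a dense n_rows x n_cols array of the given dtype."""
    return n_rows * n_cols * np.dtype(dtype).itemsize / 2**20


# Scaling one raw intensity: 254 is the largest value in several columns above
print(round(254 / 255, 6))  # 0.996078

# X_train / 255 yields float64; casting to float32 would halve the footprint
print(round(matrix_mib(42000, 784, np.float64), 1))  # 251.2
print(round(matrix_mib(42000, 784, np.float32), 1))  # 125.6
```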

X_train = X_train.values.reshape(-1,28,28,1)
X_train

array([[[[0.],
[0.],
[0.],
...,
[0.],
[0.],
[0.]],
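reshape(-1, 28, 28, 1) relies on NumPy's default row-major (C) order, so the value of pixelx ends up at [n, i, j, 0] with x = i * 28 + j, consistent with the layout described at the top. The -1 lets NumPy infer the number of images, and the trailing 1 is the single grayscale channel expected by input_shape=(28,28,1). A sketch with made-up data in which every pixel stores its own index:

```python
import numpy as np

# Hypothetical flat rows standing in for X_train.values: 2 images x 784 pixels,
# filled with the column index so each value equals its x in 'pixelx'
flat = np.tile(np.arange(784), (2, 1))

# -1 lets NumPy infer the batch size; the trailing 1 is the grayscale channel
images = flat.reshape(-1, 28, 28, 1)

print(images.shape)          # (2, 28, 28, 1)
print(images[0, 1, 3, 0])    # 31, i.e. pixel31 lands at row 1, column 3
print(images[0, 27, 27, 0])  # 783
```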

test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28000 entries, 0 to 27999
Columns: 784 entries, pixel0 to pixel783
dtypes: int64(784)
memory usage: 167.5 MB

test.isnull().sum()

pixel0 0
pixel1 0
pixel2 0
pixel3 0
pixel4 0
..
pixel779 0
pixel780 0
pixel781 0
pixel782 0
pixel783 0
Length: 784, dtype: int64

sum(test.isnull().sum())

0

test = test/255
test.head()
pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 784 columns

test.describe()
pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
count 28000.0 28000.0 28000.0 28000.0 28000.0 28000.0 28000.0 28000.0 28000.0 28000.0 ... 28000.000000 28000.000000 28000.000000 28000.000000 28000.000000 28000.0 28000.0 28000.0 28000.0 28000.0
mean 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000646 0.000287 0.000110 0.000044 0.000026 0.0 0.0 0.0 0.0 0.0
std 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.021464 0.014184 0.007112 0.004726 0.003167 0.0 0.0 0.0 0.0 0.0
min 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0
25% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0
50% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0
75% 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0
max 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.992157 0.996078 0.756863 0.733333 0.466667 0.0 0.0 0.0 0.0 0.0

8 rows × 784 columns

test = test.values.reshape(-1,28,28,1)
test

array([[[[0.],
[0.],
[0.],
...,
[0.],
[0.],
[0.]],

Train the CNN model

import tensorflow as tf
tf.__version__

'2.6.4'

cnn = tf.keras.models.Sequential()

2022-08-01 05:41:16.816392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 15403 MB memory: -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0

#Convolution
cnn.add(tf.keras.layers.Conv2D(filters=256,kernel_size=(5,5),activation='relu',input_shape=(28,28,1)))
#Max Pooling
cnn.add(tf.keras.layers.MaxPool2D(pool_size=(3,3),strides=3))
cnn.add(tf.keras.layers.BatchNormalization())
cnn.add(tf.keras.layers.Conv2D(filters=128,kernel_size=(4,4),activation='relu'))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=(2,2),strides=2))
#Flattening
cnn.add(tf.keras.layers.Flatten())
#Full connection 
cnn.add(tf.keras.layers.Dense(units=256,activation='relu'))
#Output Layer
cnn.add(tf.keras.layers.Dense(units=10,activation='softmax'))
#Compile cnn
cnn.compile(optimizer='adam',loss='categorical_crossentropy')
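The shapes flowing through this stack can be traced by hand: with Keras' default padding='valid', a window of size k and stride s turns an input of size n into floor((n - k) / s) + 1. The pure-Python sketch below (conv_out is a hypothetical helper) works out the feature-map sizes and parameter counts for the layers above; calling cnn.summary() on the real model is the authoritative way to confirm them.

```python
def conv_out(size: int, kernel: int, stride: int = 1) -> int:
    """Output size of a 'valid' (no padding) convolution or pooling window."""
    return (size - kernel) // stride + 1


side, channels = 28, 1
params = []

# Conv2D(256, 5x5): each filter has 5*5*channels weights plus one bias
params.append((5 * 5 * channels + 1) * 256)
side, channels = conv_out(side, 5), 256    # 24x24x256
side = conv_out(side, 3, stride=3)         # MaxPool2D(3, stride 3) -> 8x8x256
params.append(4 * channels)                # BatchNorm: gamma, beta, moving mean, moving variance
# Conv2D(128, 4x4)
params.append((4 * 4 * channels + 1) * 128)
side, channels = conv_out(side, 4), 128    # 5x5x128
side = conv_out(side, 2, stride=2)         # MaxPool2D(2, stride 2) -> 2x2x128
flat = side * side * channels              # Flatten -> 512 features
params.append((flat + 1) * 256)            # Dense(256)
params.append((256 + 1) * 10)              # Dense(10, softmax)

print(side, flat)   # 2 512
print(params)       # [6656, 1024, 524416, 131328, 2570]
print(sum(params))  # 665994
```

Most of the parameters sit in the second convolution, because each of its 128 filters sees all 256 channels produced by the first one.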
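# Epoch:
# One epoch is one complete pass of the whole dataset forward through the network and back again, i.e. every training sample goes through one forward pass and one backward pass.
# Put more simply, an epoch is one round of training on all of the training samples.
# However, the samples in one epoch (that is, the entire training set) may be too many for the computer to process at once, so they are split into smaller chunks, i.e. multiple batches, for training.

# Batch:
# A subset of the training samples; the whole training set is divided into several batches.

# Batch_Size:
# The number of samples in each batch.

# Iteration:
# Training on one batch is one iteration (similar to the idea of an iterator in programming languages).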

cnn.fit(X_train,y_train,batch_size=32,epochs=50)

2022-08-01 05:41:18.154328: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)

Epoch 1/50

2022-08-01 05:41:19.541340: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8005

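1313/1313 [==============================] - 13s 5ms/step - loss: 0.1159
Epoch 2/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0496
Epoch 3/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0367
Epoch 4/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0289
Epoch 5/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0256
Epoch 6/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0220
Epoch 7/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0192
Epoch 8/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0167
Epoch 9/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0146
Epoch 10/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0121
Epoch 11/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0133
Epoch 12/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0142
Epoch 13/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0119
Epoch 14/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0125
Epoch 15/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0103
Epoch 16/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0103
Epoch 17/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0130
Epoch 18/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0118
Epoch 19/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0093
Epoch 20/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0075
Epoch 21/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0075
Epoch 22/50
1313/1313 [==============================] - 6s 5ms/step - loss: 0.0129
Epoch 23/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0105
Epoch 24/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0087
Epoch 25/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0097
Epoch 26/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0117
Epoch 27/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0051
Epoch 28/50
1313/1313 [==============================] - 6s 5ms/step - loss: 0.0086
Epoch 29/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0100
Epoch 30/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0087
Epoch 31/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0096
Epoch 32/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0065
Epoch 33/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0082
Epoch 34/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0110
Epoch 35/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0063
Epoch 36/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0107
Epoch 37/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0048
Epoch 38/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0076
Epoch 39/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0154
Epoch 40/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0095
Epoch 41/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0052
Epoch 42/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0057
Epoch 43/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0080
Epoch 44/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0085
Epoch 45/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0108
Epoch 46/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0062
Epoch 47/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0118
Epoch 48/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0078
Epoch 49/50
1313/1313 [==============================] - 5s 4ms/step - loss: 0.0083
Epoch 50/50
1313/1313 [==============================] - 6s 4ms/step - loss: 0.0044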

<keras.callbacks.History at 0x7f35f40ac710>
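The 1313/1313 on each line of the log is the number of iterations per epoch: 42,000 training samples split into batches of 32 gives 1312 full batches plus one final, smaller batch holding the remaining samples. A quick check of that arithmetic:

```python
import math

n_samples = 42000  # rows in train.csv
batch_size = 32    # as passed to cnn.fit
epochs = 50

steps_per_epoch = math.ceil(n_samples / batch_size)  # the last batch holds the remainder
last_batch = n_samples - (steps_per_epoch - 1) * batch_size
print(steps_per_epoch, last_batch)  # 1313 16
print(steps_per_epoch * epochs)     # 65650 iterations in total
```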

pred = cnn.predict(test)
pred = np.argmax(pred,axis=1)
pred

array([2, 0, 9, ..., 3, 9, 2])
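The softmax output layer returns 10 probabilities per test image, so cnn.predict yields an array of shape (28000, 10); np.argmax(..., axis=1) then picks the most probable digit in each row. A toy illustration with invented probabilities:

```python
import numpy as np

# Made-up softmax outputs for 3 images over the 10 digit classes (each row sums to 1)
probs = np.full((3, 10), 0.01)
probs[0, 2] = 0.91
probs[1, 0] = 0.91
probs[2, 9] = 0.91

# axis=1: for each row (image), take the column (digit) with the highest probability
labels = np.argmax(probs, axis=1)
print(labels)  # [2 0 9]
```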

pred = pd.DataFrame(pred,columns=['Label'])
test_id = list(range(1,len(test)+1,1))
test_id = pd.DataFrame(test_id,columns=['ImageId'])
submission = pd.concat([test_id,pred],axis=1)
submission.describe()
ImageId Label
count 28000.000000 28000.000000
mean 14000.500000 4.453036
std 8083.048105 2.896665
min 1.000000 0.000000
25% 7000.750000 2.000000
50% 14000.500000 4.000000
75% 21000.250000 7.000000
max 28000.000000 9.000000
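The notebook stops at building the submission DataFrame. To produce the file in the ImageId,Label format described at the top, one option is DataFrame.to_csv with index=False, so pandas does not add its own index column. A minimal sketch, assuming the output filename submission.csv and using a few placeholder predictions in place of the real pred:

```python
import numpy as np
import pandas as pd

# Hypothetical predictions standing in for the notebook's `pred` array
pred = np.array([2, 0, 9])

submission = pd.DataFrame({
    "ImageId": np.arange(1, len(pred) + 1),  # ImageId is 1-based
    "Label": pred,
})

# index=False keeps pandas from writing its own row-number column
submission.to_csv("submission.csv", index=False)

with open("submission.csv") as f:
    print(f.read())
```

This prints the header line ImageId,Label followed by one line per image, the same shape as the example submission file.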

The final accuracy of this model on the competition test set is 0.98857.

Original author: 孤飞 (cnblogs)
Original post: https://www.cnblogs.com/ranxi169/p/16540166.html

View or download the code as a Jupyter notebook: https://www.kaggle.com/code/ranxi169/digit-recognizer-with-cnn-for-beginner/notebook

posted @ 2022-08-01 14:26  孤飞