A TensorFlow Implementation of a Multilayer Perceptron and a Simple CNN

The code below uses TensorFlow to implement a multilayer perceptron and a simple convolutional neural network, and applies both to the MNIST dataset.

All of the code and the dataset file can be downloaded from the author's GitHub. The Jupyter Notebook provided there contains the code together with detailed comments (what each function does and what its parameters mean).

import tensorflow as tf
from tensorflow import keras
print(tf.__version__)  # 2.0.0

The TensorFlow version used here is 2.0.0.

First, load the dataset:

from tensorflow.keras.datasets import mnist
(train_data, train_label), (test_data, test_label) = mnist.load_data('./mnist.npz')

Note that downloading the dataset may fail with an HTTP connection timeout; a VPN may help, or you can download the mnist.npz file yourself and place it in the C:\Users\Administrator\.keras\datasets folder.
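If the automatic download keeps failing, you can also load the downloaded file directly with NumPy. A minimal sketch, assuming the archive uses the standard x_train/y_train/x_test/y_test keys that keras.datasets.mnist writes:

import numpy as np

# Load the arrays straight from the .npz archive
# (keys assumed to be the standard ones used by keras.datasets.mnist)
with np.load('./mnist.npz') as f:
    train_data, train_label = f['x_train'], f['y_train']
    test_data, test_label = f['x_test'], f['y_test']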

Implementation of the multilayer perceptron:

# Define the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

Model structure:

print(model.summary())
"""
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 256)               200960    
_________________________________________________________________
dense_1 (Dense)              (None, 10)                2570      
=================================================================
Total params: 203,530
Trainable params: 203,530
Non-trainable params: 0
_________________________________________________________________
None
"""

Normalize the data, set the model's hyperparameters, and train:

# Normalize the input data to [0, 1]
train_data = train_data / 255.0
test_data = test_data / 255.0

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.5),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(train_data, train_label, epochs=5,
          batch_size=256,
          validation_data=(test_data, test_label),
          validation_freq=1)

Training results:

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 16s 259us/sample - loss: 0.3641 - accuracy: 0.8926 - val_loss: 0.2121 - val_accuracy: 0.9351
Epoch 2/5
60000/60000 [==============================] - 4s 63us/sample - loss: 0.1652 - accuracy: 0.9523 - val_loss: 0.1375 - val_accuracy: 0.9580
Epoch 3/5
60000/60000 [==============================] - 4s 63us/sample - loss: 0.1199 - accuracy: 0.9658 - val_loss: 0.1091 - val_accuracy: 0.9674
Epoch 4/5
60000/60000 [==============================] - 5s 85us/sample - loss: 0.0952 - accuracy: 0.9726 - val_loss: 0.1082 - val_accuracy: 0.9658
Epoch 5/5
60000/60000 [==============================] - 4s 70us/sample - loss: 0.0788 - accuracy: 0.9775 - val_loss: 0.0947 - val_accuracy: 0.9702
<tensorflow.python.keras.callbacks.History at 0x23036b99320>
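To get a single final number on the test set, the trained model can also be evaluated explicitly. A minimal sketch:

# Evaluate the trained MLP on the held-out test set
test_loss, test_acc = model.evaluate(test_data, test_label, verbose=0)
print(test_acc)  # should be close to the last val_accuracy above (~0.97)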

Beyond this, the author also experimented with adding fully connected layers to the MLP above and changing their sizes, and observed how these changes affect the training results. Since the goal of this post is to provide an example implementation of a multilayer perceptron, those experiments are not expanded here; the code and results can be found on the author's GitHub, and a purely illustrative variant is sketched below.
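For illustration only (the author's actual variants are on GitHub), a deeper version might look like this; the layer sizes here are made up for the example:

# A hypothetical deeper MLP -- layer sizes chosen arbitrarily for illustration,
# not the configurations the author actually tested
model_deep = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])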

Implementation of the simple CNN:

model5 = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters=6, 
                           kernel_size=5, 
                           activation='relu', 
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

Model structure:

Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_4 (Conv2D)            (None, 24, 24, 6)         156       
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 12, 12, 6)         0         
_________________________________________________________________
flatten_4 (Flatten)          (None, 864)               0         
_________________________________________________________________
dense_16 (Dense)             (None, 256)               221440    
_________________________________________________________________
dense_17 (Dense)             (None, 10)                2570      
=================================================================
Total params: 224,166
Trainable params: 224,166
Non-trainable params: 0
_________________________________________________________________
None
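These counts can again be checked by hand: a Conv2D layer has (kernel_height × kernel_width × input_channels × filters + filters) parameters, and the 12×12×6 pooled feature map flattens to 864 values:

# Hand-check of the CNN parameter counts
print(5 * 5 * 1 * 6 + 6)  # 156    -- Conv2D kernels + biases
print(12 * 12 * 6)        # 864    -- flattened feature map size
print(864 * 256 + 256)    # 221440 -- first Dense layer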

Before training, the data needs to be reshaped to add a channel dimension:

# Add a trailing channel dimension: (N, 28, 28) -> (N, 28, 28, 1)
train_data = tf.reshape(train_data, (-1, 28, 28, 1))
test_data = tf.reshape(test_data, (-1, 28, 28, 1))
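Note that tf.reshape returns tensors rather than NumPy arrays. An equivalent alternative, sketched here, is to reshape the NumPy arrays directly in place of the tf.reshape step:

# Equivalent reshape staying in NumPy (the -1 infers the batch dimension);
# this replaces the tf.reshape calls above
train_data = train_data.reshape(-1, 28, 28, 1)
test_data = test_data.reshape(-1, 28, 28, 1)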

Set the hyperparameters and train the model:

model5.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
               loss='sparse_categorical_crossentropy',
               metrics=['accuracy'])

model5.fit(train_data, train_label, epochs=5, validation_split=0.1)

Training results:

Train on 54000 samples, validate on 6000 samples
Epoch 1/5
54000/54000 [==============================] - 34s 622us/sample - loss: 0.2047 - accuracy: 0.9400 - val_loss: 0.0763 - val_accuracy: 0.9797
Epoch 2/5
54000/54000 [==============================] - 32s 594us/sample - loss: 0.0688 - accuracy: 0.9792 - val_loss: 0.0605 - val_accuracy: 0.9833
Epoch 3/5
54000/54000 [==============================] - 32s 600us/sample - loss: 0.0479 - accuracy: 0.9846 - val_loss: 0.0476 - val_accuracy: 0.9870
Epoch 4/5
54000/54000 [==============================] - 32s 593us/sample - loss: 0.0338 - accuracy: 0.9892 - val_loss: 0.0566 - val_accuracy: 0.9855
Epoch 5/5
54000/54000 [==============================] - 35s 649us/sample - loss: 0.0258 - accuracy: 0.9916 - val_loss: 0.0522 - val_accuracy: 0.9858
<tensorflow.python.keras.callbacks.History at 0x230380d9518>

Before training model5, the author trained a model4 with the same architecture but with SGD as the optimizer and a learning rate of 0.9. After training, the model's accuracy was stuck at 0.1, as if it were a randomly initialized, untrained model. The author therefore switched the optimizer to Adam with a learning rate of 0.001 (this is model5 above), which reached an accuracy of 0.98 after training. Afterwards, the author kept SGD as the optimizer and only lowered the learning rate; the resulting accuracy, while not as good as the Adam version, still reached 0.96. This shows how important the choice of hyperparameters is.
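A sketch of that SGD comparison is below; the exact lower learning rate is not stated in the post, so 0.01 is a placeholder:

# Same architecture as model5, compiled with SGD at a lower learning rate.
# 0.01 is a guess -- the post does not say which value the author used.
model4 = tf.keras.models.clone_model(model5)  # fresh, untrained copy
model4.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
               loss='sparse_categorical_crossentropy',
               metrics=['accuracy'])
model4.fit(train_data, train_label, epochs=5, validation_split=0.1)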

 