Day 9: The MNIST dataset
The MNIST dataset
Full code
import numpy as np
import tensorflow.compat.v1 as tf
import matplotlib.pyplot as plt
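# Note: tensorflow.examples.tutorials ships with TensorFlow 1.x; it is not included in 2.x packages.
from tensorflow.examples.tutorials.mnist import input_data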
tf.compat.v1.disable_eager_execution()
tf.disable_v2_behavior()
print ("packs loaded")
print ("Download and Extract MNIST dataset")
mnist=input_data.read_data_sets("C:/Users/chenqi/Desktop/data/mnist",one_hot=True)
print (" type of 'mnist' is %s" % (type(mnist)))
print (" number of train data is %d" % (mnist.train.num_examples))
print (" number of test data is %d" % (mnist.test.num_examples))
# What does the data of MNIST look like?
print ("What does the data of MNIST look like?")
trainimg = mnist.train.images
trainlabel = mnist.train.labels
testimg = mnist.test.images
testlabel = mnist.test.labels
print (" type of 'trainimg' is %s" % (type(trainimg)))
print (" type of 'trainlabel' is %s" % (type(trainlabel)))
print (" type of 'testimg' is %s" % (type(testimg)))
print (" type of 'testlabel' is %s" % (type(testlabel)))
print (" shape of 'trainimg' is %s" % (trainimg.shape,))
print (" shape of 'trainlabel' is %s" % (trainlabel.shape,))
print (" shape of 'testimg' is %s" % (testimg.shape,))
print (" shape of 'testlabel' is %s" % (testlabel.shape,))
# What does the training data look like?
print ("What does the training data look like?")
nsample = 5
randidx = np.random.randint(trainimg.shape[0], size=nsample)
for i in randidx:
    curr_img = np.reshape(trainimg[i, :], (28, 28)) # 28 by 28 matrix
    curr_label = np.argmax(trainlabel[i, :] ) # Label
    plt.matshow(curr_img, cmap=plt.get_cmap('gray'))
    plt.title("" + str(i) + "th Training Data "
              + "Label is " + str(curr_label))
    print ("" + str(i) + "th Training Data "
           + "Label is " + str(curr_label))
    plt.show()
# Batch Learning?
print ("Batch Learning? ")
batch_size = 100
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
print ("type of 'batch_xs' is %s" % (type(batch_xs)))
print ("type of 'batch_ys' is %s" % (type(batch_ys)))
print ("shape of 'batch_xs' is %s" % (batch_xs.shape,))
print ("shape of 'batch_ys' is %s" % (batch_ys.shape,))
Detailed analysis
-
Loading the dataset
import numpy as np
import tensorflow.compat.v1 as tf
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data
tf.compat.v1.disable_eager_execution()
tf.disable_v2_behavior()
print ("packs loaded")
print ("Download and Extract MNIST dataset")
mnist=input_data.read_data_sets("C:/Users/chenqi/Desktop/data/mnist",one_hot=True)
print (" type of 'mnist' is %s" % (type(mnist)))
print (" number of train data is %d" % (mnist.train.num_examples))
print (" number of test data is %d" % (mnist.test.num_examples))
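The MNIST dataset is stored in the data folder, and the labels use a 0/1 (one-hot) encoding.
The dataset is divided into a training set and a test set.
The training set has 55000 samples and the test set has 10000 samples.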
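The 55000 figure can be related to the original MNIST training file, which holds 60000 images: as far as I know, read_data_sets holds out 5000 of them as a validation set by default (mnist.validation), and it flattens and rescales the pixels. A minimal sketch of that idea on synthetic arrays, assuming the hold-out is simply the first 5000 rows; flatten_and_scale is a hypothetical helper, not a TensorFlow function.

```python
import numpy as np

def flatten_and_scale(images):
    """Hypothetical helper: (N, 28, 28) uint8 -> (N, 784) float32 in [0, 1]."""
    return images.reshape(images.shape[0], -1).astype(np.float32) / 255.0

# Index-only view of the split, so no real data is needed.
raw_train_size = 60000   # size of the original MNIST training file
validation_size = 5000   # assumed default hold-out of read_data_sets
indices = np.arange(raw_train_size)
validation_idx = indices[:validation_size]
train_idx = indices[validation_size:]
print(len(train_idx), len(validation_idx))  # 55000 5000

# Synthetic stand-in for a few raw images (not real MNIST pixels).
fake_images = np.full((3, 28, 28), 255, dtype=np.uint8)
flat = flatten_and_scale(fake_images)
print(flat.shape, flat.dtype, flat.max())  # (3, 784) float32 1.0
```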
-
Shape of the dataset splits
# What does the data of MNIST look like?
print ("What does the data of MNIST look like?")
trainimg = mnist.train.images
trainlabel = mnist.train.labels
testimg = mnist.test.images
testlabel = mnist.test.labels
print (" type of 'trainimg' is %s" % (type(trainimg)))
print (" type of 'trainlabel' is %s" % (type(trainlabel)))
print (" type of 'testimg' is %s" % (type(testimg)))
print (" type of 'testlabel' is %s" % (type(testlabel)))
print (" shape of 'trainimg' is %s" % (trainimg.shape,))
print (" shape of 'trainlabel' is %s" % (trainlabel.shape,))
print (" shape of 'testimg' is %s" % (testimg.shape,))
print (" shape of 'testlabel' is %s" % (testlabel.shape,))
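Each sample consists of image data and label data.
Each image consists of 784 pixels, i.e. a 28x28 grid.
Each label consists of 10 numbers; there are 10 classes in total, representing the digits 0 to 9.
Label format: [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.] represents the digit 7.
As mentioned above, the encoding is 0/1 (one-hot), so the entry at index 7 (the eighth position, since index 0 stands for the digit 0) is 1 and all the others are 0.
Example: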
print (trainlabel[254])
curr_img = np.reshape(trainimg[254, :], (28, 28)) # 28 by 28 matrix
curr_label = np.argmax(trainlabel[254, :] ) # Label
plt.matshow(curr_img, cmap=plt.get_cmap('gray'))
plt.title("" + str(254) + "th Training Data "
          + "Label is " + str(curr_label))
print ("" + str(254) + "th Training Data "
       + "Label is " + str(curr_label))
plt.show()
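np.reshape uses NumPy's default row-major order here, so pixel (row, col) of the 28x28 image is element row * 28 + col of the 784-vector. A small check of that layout on a synthetic vector (not a real MNIST image):

```python
import numpy as np

# Synthetic 784-vector whose values equal their own index (not MNIST pixels).
flat = np.arange(784, dtype=np.float32)
img = np.reshape(flat, (28, 28))

row, col = 3, 5
# Row-major layout: element (row, col) comes from position row * 28 + col.
print(img[row, col], flat[row * 28 + col])  # 89.0 89.0

# Flattening the image again recovers the original vector.
assert np.array_equal(img.reshape(-1), flat)
```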
-
Displaying sample images
# What does the training data look like?
print ("What does the training data look like?")
nsample = 5
randidx = np.random.randint(trainimg.shape[0], size=nsample)
for i in randidx:
    curr_img = np.reshape(trainimg[i, :], (28, 28)) # 28 by 28 matrix
    curr_label = np.argmax(trainlabel[i, :] ) # Label
    plt.matshow(curr_img, cmap=plt.get_cmap('gray'))
    plt.title("" + str(i) + "th Training Data "
              + "Label is " + str(curr_label))
    print ("" + str(i) + "th Training Data "
           + "Label is " + str(curr_label))
    plt.show()
Five of the 55000 training samples are drawn at random and displayed.
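One detail worth knowing: np.random.randint draws with replacement, so the same index can occasionally come up twice. If five distinct samples are wanted, one option is Generator.choice with replace=False. This is only a sketch; the seed and the variable names are illustrative.

```python
import numpy as np

num_train = 55000   # number of training samples reported above
nsample = 5

# Fixed seed only to make the sketch repeatable.
rng = np.random.default_rng(0)

# replace=False guarantees that no index is drawn twice.
distinct_idx = rng.choice(num_train, size=nsample, replace=False)
print(distinct_idx)
```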
-
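4. MNIST provides the next_batch() method (on mnist.train) for reading the dataset in batches; for example, the code above reads a batch of 100 images and the corresponding labels and returns them separately. The method keeps reading forward through the data; once the end is reached, it automatically reshuffles the data and starts reading again. With the default shuffle=True, the data is also shuffled before the first pass.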
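To make the read-forward-then-reshuffle behaviour concrete, here is a rough NumPy-only imitation. SimpleBatcher is a hypothetical class written for this note, not the TensorFlow implementation, and it simplifies the epoch boundary by dropping the leftover samples instead of merging them into the next batch.

```python
import numpy as np

class SimpleBatcher:
    """Hypothetical stand-in for the batching behaviour described above."""

    def __init__(self, images, labels, seed=0):
        self.images, self.labels = images, labels
        self.rng = np.random.default_rng(seed)
        self.pos = 0
        self.epochs_completed = 0
        self._shuffle()  # shuffle once before the first pass

    def _shuffle(self):
        # Apply the same permutation to images and labels so pairs stay aligned.
        perm = self.rng.permutation(len(self.images))
        self.images, self.labels = self.images[perm], self.labels[perm]

    def next_batch(self, batch_size):
        # Simplification: when too few samples remain, drop them, count a
        # finished epoch, reshuffle, and start again from the beginning.
        if self.pos + batch_size > len(self.images):
            self.epochs_completed += 1
            self._shuffle()
            self.pos = 0
        start, self.pos = self.pos, self.pos + batch_size
        return self.images[start:self.pos], self.labels[start:self.pos]

# Synthetic data: 250 fake "images" with matching one-hot labels.
fake_images = np.zeros((250, 784), dtype=np.float32)
fake_labels = np.eye(10, dtype=np.float32)[np.arange(250) % 10]
batcher = SimpleBatcher(fake_images, fake_labels)

for _ in range(3):
    batch_xs, batch_ys = batcher.next_batch(100)
print(batch_xs.shape, batch_ys.shape, batcher.epochs_completed)  # (100, 784) (100, 10) 1
```

The third call does not fit into the remaining 50 samples, so it triggers the reshuffle and the epoch counter goes to 1.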
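5. When opening the MNIST dataset, the second argument, one_hot=True, loads the labels in one-hot encoding. A one-hot code is a sparse vector in which exactly one element is 1 and all others are 0; it is commonly used to represent one of a finite set of possibilities. For example, the one-hot code of the digit 6 is an array whose 7th component (index 6) is 1 and whose other components are 0. np.argmax() returns the index of the array's maximum value, which is the actual digit the one-hot code represents. One-hot encoding maps each value of a discrete feature to a point in Euclidean space, which helps when computing distances between features in machine learning.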
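A short sketch of the encoding and decoding described in point 5, plus the distance property: any two different one-hot codes are the same Euclidean distance apart (the square root of 2), so no digit is treated as "closer" to another. one_hot is a hypothetical helper written for this note.

```python
import numpy as np

def one_hot(digit, num_classes=10):
    """Hypothetical helper: one-hot vector with a 1 at index `digit`."""
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[digit] = 1.0
    return vec

six = one_hot(6)
print(six)             # [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
print(np.argmax(six))  # 6

# Distances between different classes are all equal (about 1.414).
print(np.linalg.norm(one_hot(6) - one_hot(7)))
print(np.linalg.norm(one_hot(0) - one_hot(9)))
```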