Day 9: The MNIST dataset

The MNIST dataset

Full code

import numpy as np
import tensorflow.compat.v1 as tf
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data
tf.disable_eager_execution()  # tf is already tensorflow.compat.v1
tf.disable_v2_behavior()
print ("packs loaded")
print ("Download and Extract MNIST dataset")
mnist=input_data.read_data_sets("C:/Users/chenqi/Desktop/data/mnist",one_hot=True)
print (" type of 'mnist' is %s" % (type(mnist)))
print (" number of train data is %d" % (mnist.train.num_examples))
print (" number of test data is %d" % (mnist.test.num_examples))
# What does the data of MNIST look like? 
print ("What does the data of MNIST look like?")
trainimg   = mnist.train.images
trainlabel = mnist.train.labels
testimg    = mnist.test.images
testlabel  = mnist.test.labels
print (" type of 'trainimg' is %s"    % (type(trainimg)))
print (" type of 'trainlabel' is %s"  % (type(trainlabel)))
print (" type of 'testimg' is %s"     % (type(testimg)))
print (" type of 'testlabel' is %s"   % (type(testlabel)))
print (" shape of 'trainimg' is %s"   % (trainimg.shape,))
print (" shape of 'trainlabel' is %s" % (trainlabel.shape,))
print (" shape of 'testimg' is %s"    % (testimg.shape,))
print (" shape of 'testlabel' is %s"  % (testlabel.shape,))
# What does the training data look like?
print ("What does the training data look like?")
nsample = 5
randidx = np.random.randint(trainimg.shape[0], size=nsample)

for i in randidx:
    curr_img   = np.reshape(trainimg[i, :], (28, 28)) # 28 by 28 matrix 
    curr_label = np.argmax(trainlabel[i, :] ) # Label
    plt.matshow(curr_img, cmap=plt.get_cmap('gray'))
    plt.title("" + str(i) + "th Training Data " 
              + "Label is " + str(curr_label))
    print ("" + str(i) + "th Training Data " 
           + "Label is " + str(curr_label))
    plt.show()
# Batch Learning? 
print ("Batch Learning? ")
batch_size = 100
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
print ("type of 'batch_xs' is %s" % (type(batch_xs)))
print ("type of 'batch_ys' is %s" % (type(batch_ys)))
print ("shape of 'batch_xs' is %s" % (batch_xs.shape,))
print ("shape of 'batch_ys' is %s" % (batch_ys.shape,))

Detailed analysis

  1. Loading the dataset

    import numpy as np
    import tensorflow.compat.v1 as tf
    import matplotlib.pyplot as plt
    from tensorflow.examples.tutorials.mnist import input_data
    tf.disable_eager_execution()  # tf is already tensorflow.compat.v1
    tf.disable_v2_behavior()
    print ("packs loaded")
    print ("Download and Extract MNIST dataset")
    mnist=input_data.read_data_sets("C:/Users/chenqi/Desktop/data/mnist",one_hot=True)
    print (" type of 'mnist' is %s" % (type(mnist)))
    print (" number of train data is %d" % (mnist.train.num_examples))
    print (" number of test data is %d" % (mnist.test.num_examples))
    


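    The MNIST dataset is stored under the data folder, and the labels are encoded with 0s and 1s (one-hot encoding).

    The dataset is split into a training set and a test set.

    The training set has 55,000 samples and the test set has 10,000 samples. (By default, read_data_sets also holds out 5,000 of the original 60,000 training images as mnist.validation, which is why 55,000 remain.)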
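As far as I know, the `tensorflow.examples.tutorials` package is not shipped with TensorFlow 2.x pip installs, so the import above may fail there. The following is a minimal sketch, assuming the raw MNIST arrays were obtained some other way (for example from `tf.keras.datasets.mnist.load_data()`, which returns uint8 images of shape (N, 28, 28) and integer labels), of converting them into the flattened, one-hot layout used in this post. The helper name `to_flat_one_hot` is hypothetical, and random stand-in data is used so nothing has to be downloaded.

```python
import numpy as np

def to_flat_one_hot(images_uint8, labels_int, num_classes=10):
    """Convert (N, 28, 28) uint8 images and integer labels to the
    (N, 784) float32 / (N, 10) one-hot layout used in this post.
    Hypothetical helper, not part of TensorFlow."""
    n = images_uint8.shape[0]
    # Flatten each image to one row and scale pixels to [0, 1].
    flat = images_uint8.reshape(n, -1).astype(np.float32) / 255.0
    # Row k of the identity matrix is the one-hot vector for class k.
    one_hot = np.eye(num_classes)[labels_int]
    return flat, one_hot

# Stand-in data so the sketch runs without the real dataset.
fake_images = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)
fake_labels = np.array([7, 0, 3, 9])
flat, one_hot = to_flat_one_hot(fake_images, fake_labels)
print(flat.shape, one_hot.shape)
```

With real data, `flat` and `one_hot` would play the roles of `mnist.train.images` and `mnist.train.labels`.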

  2. Shapes and layout of the dataset

    # What does the data of MNIST look like? 
    print ("What does the data of MNIST look like?")
    trainimg   = mnist.train.images
    trainlabel = mnist.train.labels
    testimg    = mnist.test.images
    testlabel  = mnist.test.labels
    print ()  # blank line (a bare `print` does nothing in Python 3)
    print (" type of 'trainimg' is %s"    % (type(trainimg)))
    print (" type of 'trainlabel' is %s"  % (type(trainlabel)))
    print (" type of 'testimg' is %s"     % (type(testimg)))
    print (" type of 'testlabel' is %s"   % (type(testlabel)))
    print (" shape of 'trainimg' is %s"   % (trainimg.shape,))
    print (" shape of 'trainlabel' is %s" % (trainlabel.shape,))
    print (" shape of 'testimg' is %s"    % (testimg.shape,))
    print (" shape of 'testlabel' is %s"  % (testlabel.shape,))
    


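    Each sample consists of image data and label data.

    Each image is made up of 784 pixels, i.e. a 28x28 image flattened into a single row.

    Each label is a vector of 10 numbers; there are 10 classes in total, representing the digits 0 through 9.

    Label format: [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.] represents the digit 7.

    As noted earlier, the labels use 0/1 encoding, so the entry at index 7 (the eighth entry, since counting starts at 0) is 1 and all the others are 0.

    Example: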

    print (trainlabel[254])
    curr_img   = np.reshape(trainimg[254, :], (28, 28)) # 28 by 28 matrix 
    curr_label = np.argmax(trainlabel[254, :] ) # Label
    plt.matshow(curr_img, cmap=plt.get_cmap('gray'))
    plt.title("" + str(254) + "th Training Data " 
                  + "Label is " + str(curr_label))
    print ("" + str(254) + "th Training Data " 
               + "Label is " + str(curr_label))
    plt.show()
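The `np.reshape(..., (28, 28))` call in the example relies on the 784 values being stored row by row. A small sketch of that mapping, using a stand-in vector instead of a real MNIST row:

```python
import numpy as np

# Stand-in for one row of trainimg: 784 values in row-major order.
flat = np.arange(784, dtype=np.float32)

# Unflatten: pixel k lands at row k // 28, column k % 28.
img = flat.reshape(28, 28)
print(img.shape)

# The second image row starts at flat index 28.
second_row_start = img[1, 0]
print(second_row_start == flat[28])

# Flattening again recovers the original vector.
roundtrip = img.reshape(-1)
print(np.array_equal(roundtrip, flat))
```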
    


  3. Displaying sample images

      # What does the training data look like?
      print ("What does the training data look like?")
      nsample = 5
      randidx = np.random.randint(trainimg.shape[0], size=nsample)
      
      for i in randidx:
          curr_img   = np.reshape(trainimg[i, :], (28, 28)) # 28 by 28 matrix 
          curr_label = np.argmax(trainlabel[i, :] ) # Label
          plt.matshow(curr_img, cmap=plt.get_cmap('gray'))
          plt.title("" + str(i) + "th Training Data " 
                    + "Label is " + str(curr_label))
          print ("" + str(i) + "th Training Data " 
                 + "Label is " + str(curr_label))
          plt.show()
      


      Five samples are drawn at random from the 55,000 training samples and displayed.
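Note that `np.random.randint` draws each index independently, so the same sample can in principle appear twice. A minimal sketch of drawing five distinct indices instead, assuming NumPy's `Generator` API is available (NumPy 1.17 or later); the seed value is an arbitrary choice made only so the run is repeatable:

```python
import numpy as np

num_train = 55000   # size of mnist.train, as reported above
nsample = 5

# Seeded generator so the same samples are drawn on every run.
rng = np.random.default_rng(0)

# replace=False guarantees five distinct indices.
distinct_idx = rng.choice(num_train, size=nsample, replace=False)
print(distinct_idx)
```

The resulting `distinct_idx` could be used in place of `randidx` in the loop above.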

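  4. MNIST provides the next_batch() method for reading the dataset in batches. For example, the full code above reads a batch of 100 images and their 100 corresponding labels and returns them as two separate arrays. The method keeps reading forward in order; once it reaches the end of the data it automatically shuffles the dataset and starts reading again.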
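The read-in-order, reshuffle-at-the-end behaviour described here can be sketched in plain NumPy. This is a simplified illustration, not the TensorFlow implementation: the class `BatchReader` is hypothetical, and for brevity it drops any samples left over at the end of an epoch instead of carrying them into the next batch.

```python
import numpy as np

class BatchReader:
    """Hypothetical stand-in for the batching behaviour described above;
    not the TensorFlow implementation."""

    def __init__(self, images, labels, seed=0):
        self.images = images
        self.labels = labels
        self.pos = 0
        self.epochs_completed = 0
        self.rng = np.random.default_rng(seed)

    def next_batch(self, batch_size):
        # Not enough samples left: finish the epoch, shuffle, restart.
        if self.pos + batch_size > len(self.images):
            perm = self.rng.permutation(len(self.images))
            # The same permutation keeps images and labels aligned.
            self.images = self.images[perm]
            self.labels = self.labels[perm]
            self.pos = 0
            self.epochs_completed += 1
        start, self.pos = self.pos, self.pos + batch_size
        return self.images[start:self.pos], self.labels[start:self.pos]

# Toy data: 10 "images" of 784 pixels with one-hot labels.
images = np.zeros((10, 784), dtype=np.float32)
labels = np.eye(10)
reader = BatchReader(images, labels)
for _ in range(4):
    xs, ys = reader.next_batch(4)
    print(xs.shape, ys.shape, reader.epochs_completed)
```

With 10 samples and a batch size of 4, the third call no longer fits in the current epoch, so the data is shuffled and reading restarts from the beginning.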

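  5. When opening the MNIST dataset, the second argument, one_hot=True, means the labels are loaded with one-hot encoding. A one-hot code is a sparse vector in which exactly one element is 1 and all other elements are 0; it is commonly used to represent one of a finite set of possible categories. For example, the one-hot code of the digit 6 is an array whose 7th component (index 6) is 1 and whose other components are 0. np.argmax() returns the index of the largest value in the array, which is the actual digit the one-hot code represents. One-hot encoding maps each value of a discrete feature to a point in Euclidean space, which helps when computing distances between features in machine learning.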
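A short NumPy sketch of the encoding, the `np.argmax()` decoding, and the distance property mentioned above. Building the codes with `np.eye` is just one convenient way to do it and is an assumption of this sketch, not necessarily how the MNIST loader does it.

```python
import numpy as np

digits = np.array([6, 7, 0])

# Row k of the 10x10 identity matrix is the one-hot code for digit k.
one_hot = np.eye(10)[digits]
print(one_hot[0])   # [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]

# argmax along each row recovers the original digits.
decoded = np.argmax(one_hot, axis=1)
print(decoded)      # [6 7 0]

# Any two different one-hot codes are the same Euclidean distance
# apart, so no digit is treated as "closer" to another.
d_6_7 = np.linalg.norm(one_hot[0] - one_hot[1])
d_6_0 = np.linalg.norm(one_hot[0] - one_hot[2])
print(d_6_7, d_6_0)
```

By contrast, treating the labels as plain integers would put 6 at distance 1 from 7 but at distance 6 from 0, an ordering that carries no meaning for class labels.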

posted @ 2021-01-14 23:09  晨起