A plain-language introduction to softmax training on MNIST, with a Python implementation

Logistic regression handles binary classification; softmax regression extends it to multiple classes. Its derivation closely parallels that of logistic regression, so if you already know logistic regression, softmax is easy to follow. The details are as follows.
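As a concrete illustration of the idea (this snippet is mine, not part of the original post): the softmax function turns a vector of raw per-class scores into probabilities that sum to 1.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1.
    Subtracting the max score first avoids overflow in exp()."""
    exps = np.exp(scores - np.max(scores))
    return exps / exps.sum()

# Three classes with raw scores 1.0, 2.0, 3.0
probs = softmax(np.array([1.0, 2.0, 3.0]))
print(probs)        # approximately [0.09, 0.24, 0.67]
print(probs.sum())  # 1.0
```

The largest score always gets the largest probability, which is why prediction later reduces to an argmax over scores.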

The softmax hypothesis h(x) is:

$$h_\theta(x^{(i)}) = \begin{bmatrix} p(y^{(i)}=1 \mid x^{(i)};\theta) \\ \vdots \\ p(y^{(i)}=k \mid x^{(i)};\theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{k} e^{\theta_j^{T} x^{(i)}}} \begin{bmatrix} e^{\theta_1^{T} x^{(i)}} \\ \vdots \\ e^{\theta_k^{T} x^{(i)}} \end{bmatrix}$$

The softmax cost function is:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y^{(i)}=j\}\, \log \frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}}$$

In practice, to keep the implementation simple and well behaved, we modify this cost function by adding a weight-decay term. Weight decay resolves the numerical problems caused by the parameter redundancy of softmax regression:

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y^{(i)}=j\}\, \log \frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}}\right] + \frac{\lambda}{2}\sum_{i=1}^{k}\sum_{j=0}^{n}\theta_{ij}^{2}$$

With the weight-decay term added ($\lambda > 0$), the cost function becomes strictly convex, which guarantees a unique solution. Its Hessian is then invertible, and because the function is convex, gradient descent, L-BFGS, and similar algorithms are guaranteed to converge to the global optimum. The gradient is now:

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[x^{(i)}\left(1\{y^{(i)}=j\} - p(y^{(i)}=j \mid x^{(i)};\theta)\right)\right] + \lambda\,\theta_j$$

We then iterate with gradient descent until convergence:

$$\theta_j := \theta_j - \alpha\, \nabla_{\theta_j} J(\theta), \qquad j = 1, \ldots, k$$
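As a cross-check of the gradient and update rule above, here is a vectorized NumPy sketch of one descent step. This is not the author's code; `Theta`, `X`, `y`, `lam`, and `alpha` are names I introduce for illustration.

```python
import numpy as np

def softmax_grad_step(Theta, X, y, lam, alpha):
    """One gradient-descent step on the weight-decayed softmax cost.
    Theta: (k, n) weights, X: (m, n) samples, y: (m,) integer labels."""
    m = X.shape[0]
    scores = X @ Theta.T                          # (m, k) raw scores theta_j^T x
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability for exp()
    exps = np.exp(scores)
    probs = exps / exps.sum(axis=1, keepdims=True)  # p(y=j | x; theta)
    onehot = np.eye(Theta.shape[0])[y]              # indicator 1{y == j}
    grad = -(onehot - probs).T @ X / m + lam * Theta  # matches the formula above
    return Theta - alpha * grad
```

Each line maps term by term onto the gradient formula: the indicator minus the probability, averaged over samples, plus the weight-decay term.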

The full source code follows:

# Train softmax regression on MNIST (detailed version)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import struct
# Read the data
trainimage_path="E:\\caffe\\study\\work\\train\\train-images-idx3-ubyte\\train-images.idx3-ubyte"
trainlabel_path="E:\\caffe\\study\\work\\train\\train-labels-idx1-ubyte\\train-labels.idx1-ubyte"
def getimage(filepath):  # parse the binary IDX image file into per-image pixel features
    readfile = open(filepath, 'rb')  # open in binary mode
    file = readfile.read()
    readfile.close()
    index = 0
    # header: magic number, image count, rows, cols (one big-endian int32 each)
    nummagic, numimgs, numrows, numcols = struct.unpack_from(">iiii", file, index)
    index += struct.calcsize(">iiii")
    images = []
    for i in range(numimgs):
        imgval = struct.unpack_from(">784B", file, index)  # 28*28 = 784 pixels
        index += struct.calcsize(">784B")
        imgval = list(imgval)
        for j in range(len(imgval)):  # binarize: any pixel above 1 becomes 1
            if imgval[j] > 1:
                imgval[j] = 1
        images.append(imgval)
    return np.array(images)
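# Quick sanity check of the header parsing above, on a synthetic buffer
# (this check is my addition, not part of the original post): the IDX image
# header is four big-endian int32s (magic, image count, rows, cols), 16 bytes
# in total, followed by the raw pixel bytes. 2051 is the MNIST image magic.
_hdr = struct.pack(">iiii", 2051, 2, 28, 28) + bytes(2 * 28 * 28)
assert struct.unpack_from(">iiii", _hdr, 0) == (2051, 2, 28, 28)
assert struct.calcsize(">iiii") == 16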
def getlabel(filepath):  # parse the binary IDX label file into an array of digits
    readfile = open(filepath, 'rb')
    file = readfile.read()
    readfile.close()
    index = 0
    # header: magic number and item count (one big-endian int32 each)
    magic, numitems = struct.unpack_from(">ii", file, index)
    index += struct.calcsize(">ii")
    labels = []
    for x in range(numitems):
        im = struct.unpack_from(">1B", file, index)  # one unsigned byte per label
        index += struct.calcsize("1B")
        labels.append(im[0])
    return np.array(labels)
trainimage = getimage(trainimage_path)
trainimage = [list(i) for i in trainimage]
trainimage = [i + [1] for i in trainimage]  # append a constant 1 as the bias feature (785 dims)
trainlabel = getlabel(trainlabel_path)
trainlabel = list(trainlabel)
# Softmax training
# initialize one 785-dim theta vector per class
thetalist = [list(np.ones(785)) for i in range(10)]
lmda = 1  # the weight-decay lambda must be > 0
def ethetax(thetalist, x):  # e^{theta_j^T x_i} for every sample i and class j
    c = []
    for i in range(len(x)):
        a = []
        for j in range(len(thetalist)):
            b = sum([m * n for m, n in zip(thetalist[j], x[i])])
            a.append(b)
        amax = max(a)  # subtract the max score before exp() to avoid overflow
        a = [np.exp(v - amax) for v in a]  # exponentiate (missing in the original)
        c.append(a)
    return c
allex=ethetax(thetalist,trainimage)
def f(trainlabel, trainimage, allex, thetalist, lmda):  # gradient of the cost w.r.t. each theta_j
    c = []
    for i in range(10):
        a = np.zeros(785)
        for j in range(len(trainimage)):
            # indicator 1{y=i} minus the softmax probability p(y=i|x)
            if trainlabel[j] == i:
                a = a + np.array(trainimage[j]) * (1 - allex[j][i] / sum(allex[j]))
            else:
                a = a + np.array(trainimage[j]) * (-1) * (allex[j][i] / sum(allex[j]))
        a = a * (-1) / len(trainlabel) + lmda * np.array(thetalist[i])  # add the weight-decay term
        c.append(list(a))
    return c
ff=f(trainlabel,trainimage,allex,thetalist,lmda)
def gd(trainlabel, trainimage, thetalist, lmda, alpha, times):  # gradient-descent iterations
    for i in range(times):
        allex = ethetax(thetalist, trainimage)  # recompute e^{theta^T x} with the current thetas
        ff = f(trainlabel, trainimage, allex, thetalist, lmda)
        # theta_j := theta_j - alpha * gradient_j
        thetalist = [list(np.array(thetalist[k]) - alpha * np.array(ff[k]))
                     for k in range(len(ff))]
    return thetalist
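With a trained `thetalist`, prediction picks the class with the largest score. A sketch of this (the `predict` helper is my addition, not part of the original post):

```python
import numpy as np

def predict(thetalist, images):
    """Return the predicted digit for each 785-dim (bias-augmented) image.
    exp() is monotonic, so an argmax over raw scores theta_j^T x equals
    an argmax over the softmax probabilities."""
    Theta = np.array(thetalist)  # (10, 785)
    X = np.array(images)         # (m, 785)
    return np.argmax(X @ Theta.T, axis=1)

# e.g. accuracy on the training set:
# preds = predict(thetalist, trainimage)
# print(np.mean(preds == np.array(trainlabel)))
```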
posted @ 2017-08-23 15:36  澹宁