BP Neural Network Notes (2)

This is the second set of notes. The previous one covered the forward propagation pass of a neural network; this one records the cost function and gradient descent.

Cost Function

For logistic regression, the cost function \(J(\theta)\) is

\[J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^m y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log (1-h_\theta(x^{(i)})) \right] +\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2 \]

Here \(m\) is the total number of training examples, \(x^{(i)}\) is the \(i\)-th example (a vector or a scalar), and \(y^{(i)}\) is its label, a scalar. \(n\) is the dimension of each \(x^{(i)}\) excluding the added bias entry \(x^{(i)}_0\) (note: the bias parameter \(\theta_0\) is not regularized, which is why the regularization sum starts at \(j=1\)).
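
As a quick check of this formula, here is a minimal NumPy sketch of the regularized cost. The names sigmoid, logistic_cost, X, y and lamb are placeholders of my own, not something from these notes, and X is assumed to already carry the leading column of ones.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lamb):
    # X: (m, n+1) with a leading column of ones; theta: (n+1,); y: (m,) of 0/1 labels
    m = len(y)
    h = sigmoid(X @ theta)                              # h_theta(x^(i)) for every example
    cross_entropy = -(y * np.log(h) + (1 - y) * np.log(1 - h)).sum() / m
    reg = lamb / (2.0 * m) * (theta[1:] ** 2).sum()     # theta_0 is not regularized
    return cross_entropy + reg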

Since a neural network is similar to logistic regression (it is essentially logistic regression lifted to higher dimensions), its cost function does not change much overall. Denote the cost function for the weights \(\Theta_1\) between layer1 and layer2 by \(J(\Theta_1)\); then

\[J(\Theta_1) = -\frac{1}{m} \left[ \sum_{i=1}^m \sum_{k=1}^K y^{(i)}_k\log (h_\Theta(x^{(i)}))_k+(1-y^{(i)}_k)\log (1-(h_\Theta(x^{(i)}))_k) \right] +\frac{\lambda}{2m}\sum_{j=1}^n\sum_{k=1}^K\left(\Theta_{jk}\right)^2 \]

Here \(K\) is the number of nodes in layer2 excluding the bias node. Comparing with the logistic regression cost \(J(\theta)\), we see that \(J(\Theta_1)\) merely adds the sum \(\sum_{k=1}^K\), which adds up the loss over every node of layer2.
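
A hedged sketch of the same cost for \(K\) output nodes: here H is assumed to be the \(m \times K\) matrix of network outputs \((h_\Theta(x^{(i)}))_k\), Y the matching \(m \times K\) matrix of labels \(y^{(i)}_k\), and Theta1 a weight matrix whose first row belongs to the bias unit (all of these names are mine).

import numpy as np

def nn_cost(H, Y, Theta1, lamb):
    # H, Y: (m, K); Theta1: weight matrix with the bias row first
    m = Y.shape[0]
    cross_entropy = -(Y * np.log(H) + (1 - Y) * np.log(1 - H)).sum() / m
    reg = lamb / (2.0 * m) * (Theta1[1:, :] ** 2).sum()  # the bias row is left out of the penalty
    return cross_entropy + reg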

Backpropagation

To find parameters with which the neural network fits the data correctly, we want \(\frac{\partial J(\Theta_1)}{\partial \Theta_1 } = 0\), so we run gradient descent on the parameters. To compute \(\frac{\partial J(\Theta_1)}{\partial \Theta_1 }\), we use the backpropagation algorithm.

For a single input example \(x^{(i)}\), define \(\delta_i\) as the error vector of layer \(i\) and \(a^{(i)}\) as the activation vector of layer \(i\), with \(a^{(1)} = x\). Suppose we are dealing with an \(n\)-layer network.

We first run forward propagation to obtain every layer's activation vector \(a^{(i)}\):

\[a^{(2)} = h_{\Theta_1}(x) = g(\Theta_1^Tx)\\ \vdots\\ a^{(i+1)} = h_{\Theta_i}(a^{(i)}) = g(\Theta_{i}^Ta^{(i)}) \]
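
A minimal sketch of this recursion (forward and Thetas are hypothetical names of mine): the weights are assumed to be stored as a list Thetas laid out like theta1 and theta2 in the code at the end, the input x is a matrix with one example per column and the bias row of ones already prepended, and every hidden layer gets a bias row prepended after its activation is computed.

import numpy as np

def forward(x, Thetas):
    # returns the list [a^(1), a^(2), ..., a^(n)]
    activations = [x]
    a = x
    for i, Theta in enumerate(Thetas):
        a = 1.0 / (1.0 + np.exp(-(Theta.T @ a)))        # a^(i+1) = g(Theta_i^T a^(i))
        if i < len(Thetas) - 1:                         # hidden layers get a bias unit
            a = np.vstack([np.ones((1, a.shape[1])), a])
        activations.append(a)
    return activations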

Now we compute the error vector \(\delta_i\) of every layer (there is no \(\delta_1\): the first layer is the input data, which we regard as error-free):

\[\delta_n = a^{(n)} - y^{(i)}\\ \delta_{n-1} = \Theta_{n-1}\delta_n \cdot g^{'}(a^{(n-1)})\\ \vdots\\ \delta_2 =\Theta_{2}\delta_3\cdot g^{'}(a^{(2)}) \]
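
Continuing the same sketch, the \(\delta\) recursion could look like the helper below (backward is again a name of mine). For the sigmoid, \(g'\) evaluated at an activation \(a\) is the element-wise product \(a(1-a)\), which is also what the training loop further down uses; the bias row of a hidden-layer \(\delta\) is stripped before it is propagated further back, as explained in the next paragraph.

import numpy as np

def backward(activations, Thetas, y):
    # activations: [a^(1), ..., a^(n)] from forward(); y: labels, one column per example
    deltas = [activations[-1] - y]                      # delta_n = a^(n) - y
    for i in range(len(Thetas) - 1, 0, -1):
        a = activations[i]                              # activation on the output side of Thetas[i]
        d = deltas[0]
        if d.shape[0] == Thetas[i].shape[1] + 1:
            d = d[1:, :]                                # drop the bias-unit error of a hidden layer
        deltas.insert(0, np.multiply(Thetas[i] @ d, np.multiply(a, 1 - a)))
    return deltas                                       # [delta_2, ..., delta_n]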

After each layer's \(\delta_i\) is available, we compute \(\frac{\partial J(\Theta)}{\partial \Theta }\) for every layer with the formulas below, where \(\lambda\) is the regularization coefficient. Note that the bias unit added to a hidden layer must not influence the layers before it during backpropagation, so when computing \(a^{(i)}\delta_{i+1}^T\) we first remove the error component belonging to the bias unit whenever \(\delta_{i+1}\) is a hidden-layer error (the output layer has no bias unit, so \(\delta_n\) is used as is), i.e. we drop the first row of \(\delta_{i+1}\); otherwise the dimensions will not match.

\[\Delta_i = a^{(i)}\delta_{i+1}^T\\ D_i = \frac{1}{m}\Delta_i+\lambda\Theta_i\\ \frac{\partial J(\Theta_i)}{\partial \Theta_i } = D_i \]
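
Putting the two sketches together, the per-layer gradients could be accumulated as below (gradients is another hypothetical helper; as in the formulas above and the code at the end, the whole \(\Theta_i\) is regularized, bias row included).

def gradients(activations, deltas, Thetas, m, lamb):
    # activations, deltas as returned by forward() / backward(); m: number of examples
    grads = []
    for i, Theta in enumerate(Thetas):
        a = activations[i]                              # activation on the input side of Theta
        d = deltas[i]                                   # error on the output side of Theta
        if d.shape[0] == Theta.shape[1] + 1:
            d = d[1:, :]                                # remove the bias-unit row of delta_{i+1}
        Delta = a @ d.T                                 # Delta_i = a^(i) delta_{i+1}^T
        grads.append(Delta / m + lamb * Theta)          # D_i = Delta_i / m + lambda * Theta_i
    return grads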

With \(\frac{\partial J(\Theta_i)}{\partial \Theta_i }\) computed, we can solve for the parameters with gradient descent. Note that the initial values of \(\Theta_i\) must not all be 0, nor all be the same number; the detailed reason is not covered here, and we simply initialize each \(\Theta_i\) from a normal or uniform distribution, for instance as in the sketch below.
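
For example, an initialization that mirrors the shapes used in the code below could look like this; the uniform range 0.12 is an arbitrary small constant of my choosing, and the code at the end draws from np.random.randn instead.

import numpy as np

eps = 0.12
theta1 = np.random.uniform(-eps, eps, size=(3, 3))  # layer1 (bias + 2 features) -> layer2 (3 units)
theta2 = np.random.uniform(-eps, eps, size=(4, 1))  # layer2 (bias + 3 units) -> layer3 (1 output)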

The gradient descent update for each layer is then

\[\Theta_i = \Theta_i - \alpha D_i \]

We repeat this process until \(\Theta_i\) stops changing, at which point our neural network model is trained.

Finally, here is my (admittedly rough) code:

import numpy as np
import matplotlib.pyplot as plt
import random as rd

# sigmoid activation function
def g(x):
    return 1.0/(1+np.exp(-x))

# append num random integer features drawn from [lo, hi] to lst
def data(lst, lo, hi, num):
    for i in range(num):
        lst.append(rd.randint(lo, hi))

# append num copies of the label val to lst
def res(lst, val, num):
    for i in range(num):
        lst.append(val)

# generate num examples with x1 in [l1, r1], x2 in [l2, r2] and label val
def Data(list1, list2, y, l1, r1, l2, r2, val, num):
    data(list1, l1, r1, num)
    data(list2, l2, r2, num)
    res(y, val, num)


# build an XOR-like toy dataset: four square clusters, opposite corners share a label
x1 = []
x2 = []
y = []
Data(x1, x2, y, 0, 10, 0, 10, 1, 30)
Data(x1, x2, y, 0, 10, 20, 30, 0, 30)
Data(x1, x2, y, 20, 30, 0, 10, 0, 30)
Data(x1, x2, y, 20, 30, 20, 30, 1, 30)

#print(y)
'''
for i in range(len(y)):
    if y[i] == 1:
        plt.scatter(x1[i], x2[i], marker = "*")
    else:
        plt.scatter(x1[i], x2[i], marker = "o")
#plt.show()
'''
one = np.mat(np.ones(len(y)))
x1 = np.mat(x1)
x2 = np.mat(x2)
y = np.mat(y)
x = np.r_[one, x1, x2]                      # design matrix, shape (3, m): bias row + two feature rows
m = x.shape[1]                              # number of examples
theta1 = np.mat(np.random.randn(3, 3))      # layer1 (bias + 2 features) -> layer2 (3 hidden units)
theta2 = np.mat(np.random.randn(4, 1))      # layer2 (bias + 3 hidden units) -> layer3 (1 output)

Delta1 = 0
Delta2 = 0

lamb = 0.01      # regularization coefficient lambda
alpha = 0.1      # learning rate
times = 10000    # number of gradient descent iterations
cnt = 0
loss = []        # tracks |sum of output errors| per iteration (a rough progress measure, not J itself)
while cnt < times:

    # forward propagation
    layer2 = g(theta1.T*x)
    layer2 = np.r_[one, layer2]             # prepend the bias row to the hidden layer
    layer3 = g(theta2.T*layer2)

    # backpropagate the errors
    delta3 = layer3 - y
    loss.append(abs(delta3.sum()))
    delta2 = np.multiply(theta2*delta3, np.multiply(layer2, 1-layer2))

    # accumulate gradients; drop the bias-unit row of delta2 before using it
    Delta1 = x*(delta2[1:delta2.shape[0],:]).T
    Delta2 = layer2*delta3.T

    D2 = Delta2/m + lamb*theta2
    D1 = Delta1/m + lamb*theta1

    # gradient descent update
    theta2 = theta2 - alpha*D2
    theta1 = theta1 - alpha*D1

    cnt = cnt + 1
    
# plot how the tracked error evolves over the iterations
plt.plot(np.arange(0, len(loss)), loss)
plt.show()

# forward pass with the trained weights; note it reuses the global bias row `one`,
# so it expects the full 3 x m input matrix x built above
def f(x, theta1, theta2):
    layer2 = g(theta1.T*x)
    layer2 = np.r_[one, layer2]
    layer3 = g(theta2.T*layer2)
    return layer3

print(f(x, theta1, theta2))