# Getting Started with Deep Learning, from Logistic Regression

• Notation
• Logistic regression (LR): definition, implementation, efficient implementation
• Shallow neural network (2 layers): implementation, optimization
• Deep neural network: implementation, optimization, applications

## Notation

• $$(x, y)$$: a single input sample; $$x \in R^{n_x}$$, $$y \in \{0, 1\}$$
• $$\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}$$: the training set, containing $$m$$ training samples
• $$[a, b, c, \dots, z]^T$$: a vector; by default, vectors are column vectors
• $$m = m_{train}$$: the number of training examples; $$m_{test}$$: the number of test examples
• $$X \in R^{n_x \times m}$$: the training matrix, in which the samples are stacked as **columns**; in other words, each column of $$X$$ is one sample, not each row. X.shape = $$(n_x, m)$$
• $$Y \in R^{1 \times m}$$: the training labels, likewise stacked as columns; $$Y.shape = (1, m)$$

## Logistic Regression

### Principle

$\hat{y} = sigmoid(w^Tx+b)$

Sigmoid is a nonlinear S-shaped function whose output lies in (0, 1), which lets the output be interpreted as a probability. The sigmoid function is computed as follows.

$Sigmoid(z) = \frac{1}{1+e^{-z}}$
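As a quick sanity check, the sigmoid can be implemented in a few lines of NumPy (a minimal sketch; the function name is our own):

```python
import numpy as np

def sigmoid(z):
    # 1 / (1 + e^{-z}), applied elementwise
    return 1.0 / (1.0 + np.exp(-z))

# sigmoid(0) = 0.5; large positive inputs saturate toward 1,
# large negative inputs toward 0
probs = sigmoid(np.array([-10.0, 0.0, 10.0]))
```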

### Loss function

$\hat{y}^{(i)} = \sigma(w^Tx^{(i)} + b), where \ \sigma(z)=\frac{1}{1+e^{-z}}$

$L(\hat y, y) = -(ylog\hat y + (1-y)log(1-\hat y))$

$J(w, b) = \frac1{m}\sum_{i=1}^m L(\hat y^{(i)}, y^{(i)}) = -\frac1{m}\sum_{i=1}^m[y^{(i)}\log\hat y^{(i)} + (1-y^{(i)})\log(1-\hat y^{(i)})]$

The LR loss function can be derived by maximum likelihood estimation. The parameters are then updated by gradient descent:

$w = w - \alpha \frac{\partial J(w, b)}{\partial w} \\b = b - \alpha \frac{\partial J(w, b)}{\partial b}$

### Computation graph

As an example, consider the computation graph for $$J = 3v$$, where $$v = a + u$$ and $$u = bc$$, evaluated at $$a = 5, b = 3, c = 2$$. Backpropagating through the graph:

$\frac{\partial J}{\partial a} = \frac{\partial J}{\partial v} * \frac{\partial v}{\partial a} = 3 * 1 = 3$

$\frac{\partial J}{\partial b} = \frac{\partial J}{\partial v} * \frac{\partial v}{\partial u} * \frac{\partial u}{\partial b}= 3 * 1 * c= 6$

$\frac{\partial J}{\partial c} = \frac{\partial J}{\partial v} * \frac{\partial v}{\partial u} * \frac{\partial u}{\partial c}= 3 * 1 * b= 9$
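A finite-difference check, using the example values $$a = 5, b = 3, c = 2$$ for $$J = 3(a + bc)$$, reproduces these three derivatives:

```python
# Numerical check of the chain-rule values above for the graph
# J = 3v, v = a + u, u = b*c.
def J(a, b, c):
    u = b * c
    v = a + u
    return 3 * v

eps = 1e-6
a, b, c = 5.0, 3.0, 2.0
dJ_da = (J(a + eps, b, c) - J(a, b, c)) / eps  # chain rule says 3
dJ_db = (J(a, b + eps, c) - J(a, b, c)) / eps  # chain rule says 6
dJ_dc = (J(a, b, c + eps) - J(a, b, c)) / eps  # chain rule says 9
```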

### Computing the LR gradients

#### A single sample

$\hat y = \sigma(w^Tx+b), where \ \sigma(z)=\frac1{1+e^{-z}} \\L(\hat y, y) = -[ylog\hat y + (1-y)log(1-\hat y)]$

$w_1 = w_1 - \alpha * \frac{\partial L}{\partial w_1} \\w_2 = w_2 - \alpha * \frac{\partial L}{\partial w_2}\\b = b - \alpha * \frac{\partial L}{\partial b}$

$\frac{\partial L}{\partial a} = \frac{\partial -[y\log a + (1-y)\log(1-a)]}{\partial a} = -[\frac{y}{a} - \frac{1-y}{1-a}]$

$\frac{\partial a}{\partial z} = \frac{\partial \frac{1}{1+e^{-z}}}{\partial z} = \frac{1}{1+e^{-z}} * \frac{e^{-z}}{1+e^{-z}} = a * (1 - a) \\\frac{\partial z}{\partial w_1} = \frac{\partial (w_1x_1+w_2x_2+b)}{\partial w_1} = x_1 \\\frac{\partial z}{\partial w_2} = \frac{\partial (w_1x_1+w_2x_2+b)}{\partial w_2} = x_2 \\\frac{\partial z}{\partial b} = \frac{\partial (w_1x_1+w_2x_2+b)}{\partial b} = 1$

$\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial a} * \frac{\partial a}{\partial z} * \frac{\partial z}{\partial w_1} = -[\frac{y}{a} - \frac{1-y}{1-a}] * a * (1 - a) * x_1 \\= (a-y)*x_1$

$\frac{\partial L}{\partial w_2} = \frac{\partial L}{\partial a} * \frac{\partial a}{\partial z} * \frac{\partial z}{\partial w_2} = -[\frac{y}{a} - \frac{1-y}{1-a}] * a * (1 - a) * x_2 \\= (a-y)*x_2$

$\frac{\partial L}{\partial b} = \frac{\partial L}{\partial a} * \frac{\partial a}{\partial z} * \frac{\partial z}{\partial b} = -[\frac{y}{a} - \frac{1-y}{1-a}] * a * (1 - a) * 1 \\= a-y$
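The closed forms $$(a-y)x_1$$ and $$(a-y)$$ can be verified against a finite-difference approximation of the loss; the parameter and feature values below are made up for illustration:

```python
import math

# Check the closed-form single-sample gradients against a
# finite-difference approximation of the loss.
def loss(w1, w2, b, x1, x2, y):
    z = w1 * x1 + w2 * x2 + b
    a = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

w1, w2, b = 0.3, -0.5, 0.1
x1, x2, y = 1.5, -2.0, 1.0

z = w1 * x1 + w2 * x2 + b
a = 1.0 / (1.0 + math.exp(-z))
dw1 = (a - y) * x1   # closed form
db = a - y

eps = 1e-6
num_dw1 = (loss(w1 + eps, w2, b, x1, x2, y) - loss(w1, w2, b, x1, x2, y)) / eps
num_db = (loss(w1, w2, b + eps, x1, x2, y) - loss(w1, w2, b, x1, x2, y)) / eps
```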

#### m samples

$z^{(i)} = w^Tx^{(i)} + b$

$\hat y^{(i)} = a^{(i)} = \sigma(z^{(i)})$

$J(w, b) = \frac1{m}\sum_{i=1}^m L(\hat y^{(i)}, y^{(i)}) = -\frac1{m}\sum_{i=1}^m[y^{(i)}\log\hat y^{(i)} + (1-y^{(i)})\log(1-\hat y^{(i)})]$

$$\frac{\partial J}{\partial w_1} = \frac1{m}*\sum_{i=1}^m(a^{(i)}-y^{(i)})*x_1^{(i)}$$

$$\frac{\partial J}{\partial w_2} = \frac1{m}*\sum_{i=1}^m(a^{(i)}-y^{(i)})*x_2^{(i)}$$

$$\frac{\partial J}{\partial b} = \frac1{m}*\sum_{i=1}^m(a^{(i)}-y^{(i)})$$

```
J = 0; dw1 = 0; dw2 = 0; db = 0
for i = 1 to m:
    # forward pass: accumulate the loss
    z(i) = w1*x1(i) + w2*x2(i) + b
    a(i) = sigmoid(z(i))
    J += -[y(i)*log(a(i)) + (1-y(i))*log(1-a(i))]
    # backward pass: accumulate the derivatives
    dz(i) = a(i) - y(i)
    dw1 += dz(i)*x1(i)   # x1(i): the first feature of the i-th sample
    dw2 += dz(i)*x2(i)
    db += dz(i)
# after looping over all m samples, take the average
J /= m
dw1 /= m
dw2 /= m
db /= m

# one gradient-descent update
w1 -= alpha * dw1
w2 -= alpha * dw2
b -= alpha * db
```


#### Optimization via vectorization

Computing z: $$z = w^Tx+b$$

The for-loop version:

```python
z = 0
for i in range(n_x):
    z += w[i] * x[i]
z += b
```


The vectorized version:

```python
import numpy as np
z = np.dot(w.T, x) + b
```


The for-loop version over the full training set:

```python
z = np.zeros((1, m))
for i in range(m):
    for j in range(n_x):
        z[0, i] += w[j] * X[j][i]
```


The vectorized version:

```python
z = np.dot(w.T, X)
```
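A quick check (with made-up shapes) that the double loop and `np.dot` agree:

```python
import numpy as np

# The double loop and np.dot compute the same z on random data;
# shapes follow the conventions above: X is (n_x, m), w is (n_x, 1).
rng = np.random.default_rng(0)
n_x, m = 4, 5
w = rng.standard_normal((n_x, 1))
X = rng.standard_normal((n_x, m))

z_loop = np.zeros((1, m))
for i in range(m):
    for j in range(n_x):
        z_loop[0, i] += w[j, 0] * X[j, i]

z_vec = np.dot(w.T, X)
assert np.allclose(z_loop, z_vec)
```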


Putting forward and backward passes together in fully vectorized form:

```python
# w: [n_x, 1]; X: [n_x, m]; b: float
Z = np.dot(w.T, X) + b   # [1, m]
A = sigmoid(Z)
# backward pass: gradients dw, db
dZ = A - Y
dw = 1./m * np.dot(X, dZ.T)
db = 1./m * np.sum(dZ)
# parameter update
w -= learning_rate * dw
b -= learning_rate * db
```


Whenever possible, avoid explicit for-loops.
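Putting everything together, a minimal end-to-end training sketch on synthetic data (the toy dataset, hyperparameters, and all names are our own, not from the original post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic, linearly separable-ish data: label is 1 when x0 + x1 > 0
rng = np.random.default_rng(1)
n_x, m = 2, 200
X = rng.standard_normal((n_x, m))
Y = (X[0:1, :] + X[1:2, :] > 0).astype(float)  # shape (1, m)

w = np.zeros((n_x, 1))
b = 0.0
learning_rate = 0.1

for _ in range(200):
    Z = np.dot(w.T, X) + b     # (1, m)
    A = sigmoid(Z)
    dZ = A - Y
    dw = np.dot(X, dZ.T) / m   # (n_x, 1)
    db = np.sum(dZ) / m
    w -= learning_rate * dw
    b -= learning_rate * db

# training accuracy after gradient descent
acc = np.mean((A > 0.5) == (Y == 1.0))
```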

## Shallow Neural Network (2 Layers)

### Forward propagation

$$z_1^{[1]} = w_1^{[1]T}*x + b_1^{[1]},\ a_1^{[1]} = \sigma(z_1^{[1]});$$

$$z_2^{[1]} = w_2^{[1]T}*x + b_2^{[1]},\ a_2^{[1]} = \sigma(z_2^{[1]});$$

$$z_3^{[1]} = w_3^{[1]T}*x + b_3^{[1]},\ a_3^{[1]} = \sigma(z_3^{[1]});$$

$$z_4^{[1]} = w_4^{[1]T}*x + b_4^{[1]},\ a_4^{[1]} = \sigma(z_4^{[1]});$$

$$z_1^{[2]} = w_1^{[2]T}*a^{[1]} + b_1^{[2]},\ \hat y=a^{[2]} = \sigma(z_1^{[2]})$$

Stacking the hidden units' weight vectors as the rows of $$W^{[1]}$$ gives the vectorized form:

$$z^{[1]} = W^{[1]}x + b^{[1]}$$

$$a^{[1]} = \sigma(z^{[1]})$$

$$z^{[2]} = W^{[2]}a^{[1]} + b^{[2]}$$

$$a^{[2]}=\sigma(z^{[2]})$$

Looping over the m samples:

```
for i = 1 to m:
    z[1](i) = W[1] x(i) + b[1]
    a[1](i) = sigma(z[1](i))
    z[2](i) = W[2] a[1](i) + b[2]
    a[2](i) = sigma(z[2](i))
```


Vectorized across the whole batch:

```
Z[1] = W[1] X + b[1]
A[1] = sigma(Z[1])
Z[2] = W[2] A[1] + b[2]
A[2] = sigma(Z[2])
```
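A sanity check on the shapes of the vectorized forward pass (the layer sizes n_x = 3, n_h = 4 hidden units, and 1 output are arbitrary choices for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_x, n_h, m = 3, 4, 10
rng = np.random.default_rng(0)
X = rng.standard_normal((n_x, m))
W1 = rng.standard_normal((n_h, n_x)) * 0.01  # small random init
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((1, n_h)) * 0.01
b2 = np.zeros((1, 1))

Z1 = np.dot(W1, X) + b1    # (n_h, m)
A1 = sigmoid(Z1)
Z2 = np.dot(W2, A1) + b2   # (1, m)
A2 = sigmoid(Z2)
```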


### Backpropagation

$$dz^{[2]} = a^{[2]} - y$$

$$dW^{[2]} = dz^{[2]} * \frac{\partial z^{[2]}}{\partial W^{[2]}} = dz^{[2]} * a^{[1]T}$$

$$db^{[2]} = dz^{[2]} * \frac{\partial z^{[2]}}{\partial b^{[2]}} = dz^{[2]}$$

$$dz^{[1]} = dz^{[2]} * \frac{\partial z^{[2]}}{\partial a^{[1]}} * \frac{\partial a^{[1]}}{\partial z^{[1]}} = W^{[2]T}*dz^{[2]} * g^{[1]^{'}}(z^{[1]})$$

$$dW^{[1]} = dz^{[1]} * \frac{\partial z^{[1]}}{\partial W^{[1]}} = dz^{[1]} * x^T$$

$$db^{[1]} = dz^{[1]} * \frac{\partial z^{[1]}}{\partial b^{[1]}} = dz^{[1]} * 1 = dz^{[1]}$$
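The per-sample equations above extend to a batch of m columns by averaging. A minimal vectorized sketch, assuming sigmoid activations in both layers so that $$g^{[1]'}(z) = a(1-a)$$ (all names and sizes are our own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_x, n_h, m = 3, 4, 10
X = rng.standard_normal((n_x, m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)
W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((1, n_h)) * 0.01
b2 = np.zeros((1, 1))

# forward pass
A1 = sigmoid(np.dot(W1, X) + b1)
A2 = sigmoid(np.dot(W2, A1) + b2)

# backward pass, averaged over the m columns
dZ2 = A2 - Y                                  # (1, m)
dW2 = np.dot(dZ2, A1.T) / m                   # (1, n_h)
db2 = np.sum(dZ2, axis=1, keepdims=True) / m
dZ1 = np.dot(W2.T, dZ2) * A1 * (1 - A1)       # (n_h, m)
dW1 = np.dot(dZ1, X.T) / m                    # (n_h, n_x)
db1 = np.sum(dZ1, axis=1, keepdims=True) / m
```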

## Deep Neural Network

Input: $$da^{[l]}$$

Output: $$da^{[l-1]}, dW^{[l]}, db^{[l]}$$

$$z^{[l]} = W^{[l]}*a^{[l-1]} + b^{[l]}$$

$$a^{[l]} = g(z^{[l]})$$

$$dz^{[l]} = da^{[l]} * g^{[l]'}(z^{[l]})$$

$$da^{[l-1]} = W^{[l]T} * dz^{[l]}$$

$$dW^{[l]} = dz^{[l]} * a^{[l-1]T}$$

$$db^{[l]} = dz^{[l]}$$
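These equations can be wrapped into one generic per-layer backward function. The sketch below assumes a sigmoid activation for $$g$$ and averages over the m columns of a batch; function and variable names are our own:

```python
import numpy as np

def layer_backward(dA, Z, A_prev, W):
    """One backward step for layer l: takes dA = da[l], returns
    da[l-1], dW[l], db[l], following the equations above."""
    m = A_prev.shape[1]
    A = 1.0 / (1.0 + np.exp(-Z))       # recompute a = g(z)
    dZ = dA * A * (1 - A)              # dz[l] = da[l] * g'(z[l])
    dW = np.dot(dZ, A_prev.T) / m      # dW[l] = dz[l] a[l-1]^T
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)          # da[l-1] = W[l]^T dz[l]
    return dA_prev, dW, db

# shape check on random data
rng = np.random.default_rng(0)
n_prev, n_l, m = 3, 4, 5
A_prev = rng.standard_normal((n_prev, m))
W = rng.standard_normal((n_l, n_prev))
Z = np.dot(W, A_prev)
dA = rng.standard_normal((n_l, m))
dA_prev, dW, db = layer_backward(dA, Z, A_prev, W)
```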

## Summary

posted @ 2020-05-08 22:58 April15