Andrew Ng, Lesson 1
Week 1 Introduction
Supervised Learning
Requires both inputs (x) and outputs (y)
- image data → CNN
- one-dimensional sequence data → RNN
Structured Data (database tables) vs. Unstructured Data (audio, images, text)
forward propagation step
backward propagation step
Week 2 Basics of Neural Network
Binary Classification
feature vector of dimension \(n_x\): \(64\times64\times3 \to 12288\times1\)
That is, the input has 12288 features.
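The flattening step above can be sketched in NumPy (a random array stands in for a real image; the variable names are illustrative):

```python
import numpy as np

# A hypothetical 64x64 RGB image; random values stand in for pixel data
img = np.random.rand(64, 64, 3)

# Flatten into a single feature column vector x of shape (n_x, 1) = (12288, 1)
x = img.reshape(-1, 1)
print(x.shape)  # (12288, 1)
```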
Notation
- (x, y): a single training example
- training set : \(\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), ... , (x^{(m)}, y^{(m)})\}\)
- \(m_{train}, m_{test}\): the number of training and test examples
- \(X \in \mathbb{R}^{n_x\times m}\): the matrix formed by all input examples \(x^{(1)}, x^{(2)}, \ldots, x^{(m)}\) stacked as columns, with \(n_x\) rows (the feature dimension) and \(m\) columns (the number of examples); X.shape = (n_x, m)
- \(Y \in \mathbb{R}^{1 \times m}\): the matrix formed by all outputs \(y^{(1)}, y^{(2)}, \ldots, y^{(m)}\); Y.shape = (1, m)
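The notation above maps directly onto NumPy arrays. A minimal sketch (with a small made-up \(n_x\) and \(m\), and hypothetical variable names) of building X and Y:

```python
import numpy as np

n_x, m = 4, 3  # small hypothetical feature dimension and example count

# Individual examples x^(i), each a column vector of shape (n_x, 1)
examples = [np.random.rand(n_x, 1) for _ in range(m)]
labels = [1, 0, 1]

# X stacks the examples as columns; Y is a row vector of the labels
X = np.hstack(examples)              # X.shape = (n_x, m)
Y = np.array(labels).reshape(1, m)   # Y.shape = (1, m)
print(X.shape, Y.shape)  # (4, 3) (1, 3)
```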
Logistic Regression
A regression algorithm for binary classification problems whose output is 0/1:
Given \(x \in \mathbb{R}^{n_x}\) (here \(n_x = 12288\)), we want \(\hat{y} = P(y=1 \mid x)\)
Parameters of Logistic Regression: \(w \in \mathbb{R}^{n_x}\), \(b \in \mathbb{R}\)
Output:
- Linear regression would give \(\hat{y} = w^Tx + b\)
- To constrain the output to the range 0~1, logistic regression uses \(\hat{y} = \sigma(w^Tx + b)\), where
\[\text{sigmoid}(z) = \sigma(z) = \frac{1}{1+e^{-z}}\]
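The sigmoid and the vectorized forward pass can be sketched as follows (NumPy assumed; `predict` is a hypothetical helper name, operating on X with examples stacked as columns as defined above):

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z}), maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, X):
    # Vectorized logistic-regression output y_hat for all m columns of X at once
    return sigmoid(w.T @ X + b)

print(sigmoid(0.0))  # 0.5
```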
Logistic Regression cost function
The loss (error) function measures how well the algorithm does on a single example:
- \(L(\hat{y}, y) = \frac12(\hat{y} - y)^2\) is not used for logistic regression, because it would make the optimization problem non-convex.
\[L(\hat{y}, y) = -\left(y\log\hat{y} + (1-y)\log(1-\hat{y})\right)\]
Logistic regression uses this loss function. Since y is either 1 or 0:
- When y = 1, \(L = -\log\hat{y}\); since \(\hat{y}\) lies between 0 and 1, the larger \(\hat{y}\) is, the smaller \(L\) is.
- When y = 0, \(L = -\log(1 - \hat{y})\); the smaller \(\hat{y}\) is, the smaller \(L\) is.
The cost function measures how well the algorithm does on the entire training set:
\[J(w, b) = \frac1m\sum^{m}_{i=1}L(\hat{y}^{(i)}, y^{(i)}) = -\frac1m\sum^m_{i=1}\left[y^{(i)}\log\hat{y}^{(i)} + (1-y^{(i)})\log(1-\hat{y}^{(i)})\right]\]
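The cost function above can be sketched as a NumPy one-liner over the row vectors \(\hat{y}\) and Y (the example values here are made up):

```python
import numpy as np

def cost(y_hat, Y):
    # J = -(1/m) * sum over examples of [y log(y_hat) + (1-y) log(1-y_hat)]
    m = Y.shape[1]
    return -np.sum(Y * np.log(y_hat) + (1 - Y) * np.log(1 - y_hat)) / m

# Example: confident, mostly-correct predictions give a small cost
Y = np.array([[1, 0]])
y_hat = np.array([[0.9, 0.1]])
print(round(cost(y_hat, Y), 3))  # 0.105
```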
Gradient Descent
Repeat: {
\(w := w - \alpha\frac{\partial{J(w, b)}}{\partial{w}}\)
\(b := b - \alpha\frac{\partial{J(w, b)}}{\partial{b}}\)
}
\(\alpha\) is the learning rate. Repeat these updates until \(w\) and \(b\) converge.
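A minimal sketch of the update loop for logistic regression, assuming NumPy and the X, Y layout defined earlier; the gradient formulas \(dw = \frac1m X(\hat{y}-Y)^T\) and \(db = \frac1m\sum(\hat{y}-Y)\) come from differentiating J, and the tiny 1-D dataset is made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, Y, alpha=0.1, iters=2000):
    # Batch gradient descent for logistic regression: X is (n_x, m), Y is (1, m)
    n_x, m = X.shape
    w = np.zeros((n_x, 1))
    b = 0.0
    for _ in range(iters):
        y_hat = sigmoid(w.T @ X + b)   # forward pass, shape (1, m)
        dz = y_hat - Y                 # per-example error
        dw = X @ dz.T / m              # dJ/dw, shape (n_x, 1)
        db = np.sum(dz) / m            # dJ/db, scalar
        w -= alpha * dw                # w := w - alpha * dJ/dw
        b -= alpha * db                # b := b - alpha * dJ/db
    return w, b

# Tiny 1-D example: negative x -> y = 0, positive x -> y = 1
X = np.array([[-2.0, -1.0, 1.0, 2.0]])
Y = np.array([[0.0, 0.0, 1.0, 1.0]])
w, b = gradient_descent(X, Y)
print((sigmoid(w.T @ X + b) > 0.5).astype(int))  # [[0 0 1 1]]
```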
Derivatives
Omitted.
