CS229: Introduction to deep learning
deep learning
- computational power
- data
- algorithms
logistic regression
e.g. 1: find a cat in an image
- input: suppose the image is 64*64 with 3 color channels, so the input vector x has shape (64*64*3, 1)
- function: \(\hat{y} = \sigma(wx+b) = \sigma(\theta^{T}x)\)
- steps:
- initialize parameters(weights and bias)
- find optimal w and b
- loss function: \(L = -[y\log\hat{y}+(1-y)\log(1-\hat{y})]\) (from maximum likelihood estimation)
- gradient descent
- use the model to predict (see the sketch after this example)
neuron = linear part + activation; here, the linear part is wx+b and the activation is the sigmoid function
model = architecture + parameters
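A minimal NumPy sketch of these steps, assuming a toy batch of flattened 64*64*3 images with random values; the function name `train_logistic_regression` and the hyperparameters are illustrative, not from the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.1, iters=1000):
    """X: (n_features, m), y: (1, m) with 0/1 labels."""
    n, m = X.shape
    w = np.zeros((1, n))                      # initialize parameters (weights and bias)
    b = 0.0
    for _ in range(iters):
        y_hat = sigmoid(w @ X + b)            # forward: sigma(wx + b)
        dz = y_hat - y                        # dL/dz for sigmoid + cross-entropy loss
        dw = (dz @ X.T) / m
        db = np.sum(dz) / m
        w -= lr * dw                          # gradient descent step
        b -= lr * db
    return w, b

# usage on random data just to show the shapes
X = np.random.rand(64 * 64 * 3, 10)           # 10 flattened images as columns
y = np.random.randint(0, 2, size=(1, 10))
w, b = train_logistic_regression(X, y)
pred = sigmoid(w @ X + b) > 0.5               # use the model to predict
```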
e.g. 2: find cat/lion/iguana in an image
apply more neurons at the same time; they are in the same layer and don't communicate with each other. A neuron like \(a^{[1]}_{2}\) is the second neuron in the first layer.
the dataset must be labeled with more information.
how you label the data affects the inner structure of the network.
this is robust because the neurons in the same layer are independent of each other (see the sketch below).
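A hedged sketch of this multi-neuron setup, assuming three independent sigmoid output neurons (cat, lion, iguana) each scored with its own binary cross-entropy; all sizes and names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_features, n_classes, m = 64 * 64 * 3, 3, 10
W = np.random.randn(n_classes, n_features) * 0.01   # one row of weights per neuron
b = np.zeros((n_classes, 1))

X = np.random.rand(n_features, m)                    # batch of flattened images (one per column)
Y = np.random.randint(0, 2, (n_classes, m))          # multi-hot labels: several classes can be 1

A = sigmoid(W @ X + b)                               # row k = neuron k's independent prediction
loss = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))   # binary cross-entropy per neuron
```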
e.g. 3: add a constraint: exactly one animal per image
- modify the neurons using softmax:
- \(z^{[1]}_{2}\) indicates the linear part of the second neuron in the first layer
- the activation is \(\dfrac{e^{z^{[1]}_{2}}}{\Sigma^{n}_{i=1}e^{z^{[1]}_{i}}}\), where n is the number of neurons in this layer
- the outputs of this layer sum to 1, and we select the class with the largest probability (see the sketch below)
- softmax multi-class network
- cross entropy loss: \(L = -\Sigma^{n}_{k=1}y_{k}\log\hat{y}_{k}\)
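A small sketch of the softmax activation and cross-entropy loss described above; the numbers are made up just to show that each column of probabilities sums to 1.

```python
import numpy as np

def softmax(Z):
    """Column-wise softmax: Z has shape (n_classes, m)."""
    Z = Z - Z.max(axis=0, keepdims=True)       # subtract the max for numerical stability
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=0, keepdims=True)

Z = np.array([[2.0, 0.5],
              [1.0, 3.0],
              [0.1, 0.2]])                     # linear parts z[1]_k for 3 classes, 2 examples
Y_hat = softmax(Z)
print(Y_hat.sum(axis=0))                       # [1. 1.]  -> the outputs of the layer sum to 1
print(Y_hat.argmax(axis=0))                    # select the largest probability per example

Y = np.array([[1, 0],
              [0, 1],
              [0, 0]])                         # one-hot labels (unique animal per image)
loss = -np.sum(Y * np.log(Y_hat), axis=0).mean()   # cross-entropy loss
```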
Neural network
- more neurons and more layers (architecture)
- for classification, the output layer must have as many neurons as there are classes
- input layer and hidden layers
- hidden layers can capture complicated structure in the raw data
- fully connected (different from how a human would design the connections)
Propagation equation (forward)
- each layer's input comes from the previous layer's output; after the linear part and the activation, the output is fed to the next layer
- be careful with the matrix sizes
input batch of m examples
- input: \(X = (x^{(1)},x^{(2)},\cdots,x^{(m)})\), each column is one input example
- \(m\) is the number of examples in total
- \(n_{0}\) is the number of original features
- \(n_{i}\) is the number of neurons in layer \(i\)
- \(X\): \((n_{0},m)\)
- layer: \(Z^{[i]} = w^{[i]}X + b^{[i]}\)
- \(Z^{[i]}\) : \((n_{i},m)\)
- \(w^{[i]}\) : \((n_{i},n_{i-1})\)
- \(b^{[i]}\): \((n_{i},m)\) (broadcast from \((n_{i},1)\); each row repeats the same value)
- in the first layer the input is the original feature matrix \(X\); in later layers it is replaced by the previous layer's activation \(a^{[i-1]}\) (see the sketch below)
- the architecture should be chosen based on the complexity of the problem
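A minimal forward-pass sketch over a batch, checking the shapes listed above; the layer widths and the sigmoid activation are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m, n0, n1, n2 = 10, 64 * 64 * 3, 4, 3          # batch size and layer widths (illustrative)
X = np.random.rand(n0, m)                      # X: (n0, m), one example per column

W1, b1 = np.random.randn(n1, n0) * 0.01, np.zeros((n1, 1))   # W[1]: (n1, n0), b[1]: (n1, 1)
W2, b2 = np.random.randn(n2, n1) * 0.01, np.zeros((n2, 1))   # W[2]: (n2, n1), b[2]: (n2, 1)

Z1 = W1 @ X + b1                               # (n1, m); b1 broadcasts across the m columns
A1 = sigmoid(Z1)                               # activation of layer 1, fed to layer 2
Z2 = W2 @ A1 + b2                              # (n2, m)
A2 = sigmoid(Z2)                               # network output
assert Z1.shape == (n1, m) and Z2.shape == (n2, m)
```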
Optimizing parameters
- define loss/cost functions
- \(J(\hat{y},y) = \dfrac{1}{m}\Sigma^{m}_{i=1} L^{(i)}\)
- with \(L^{(i)} = -[y^{(i)}\log\hat{y}^{(i)}+(1-y^{(i)})\log(1-\hat{y}^{(i)})]\)
- Backward propagation
- \(w^{[i]} = w^{[i]}-\alpha \dfrac{\partial J}{\partial w^{[i]}}\)
- \(b^{[i]} = b^{[i]}-\alpha \dfrac{\partial J}{\partial b^{[i]}}\)
- begin from the weights of the last layer, e.g. \(w^{[3]}\) (closest to the output)
- use the chain rule to propagate gradients backwards (see the sketch below)
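A hedged sketch of one backpropagation + gradient-descent step for a small 2-layer network with sigmoid activations and the cross-entropy cost above; the layer sizes, learning rate, and random data are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m, n0, n1 = 10, 5, 4
alpha = 0.1                                    # learning rate
X = np.random.rand(n0, m)
Y = np.random.randint(0, 2, (1, m))

W1, b1 = np.random.randn(n1, n0) * 0.01, np.zeros((n1, 1))
W2, b2 = np.random.randn(1, n1) * 0.01, np.zeros((1, 1))

# forward pass
Z1 = W1 @ X + b1
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)
J = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))   # cost = average loss

# backward pass: start from the layer closest to the output and apply the chain rule
dZ2 = A2 - Y                                   # dJ/dZ2 for sigmoid + cross-entropy
dW2 = (dZ2 @ A1.T) / m
db2 = dZ2.mean(axis=1, keepdims=True)
dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)             # chain rule through layer 2 and sigmoid'
dW1 = (dZ1 @ X.T) / m
db1 = dZ1.mean(axis=1, keepdims=True)

# gradient descent update: w <- w - alpha * dJ/dw
W2 -= alpha * dW2; b2 -= alpha * db2
W1 -= alpha * dW1; b1 -= alpha * db1
```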
