6 Training Neural Networks (Part 1): Activation Functions and Data Preprocessing


Activation Functions

Sigmoid

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

  • Squashes numbers to range [0,1]
  • Historically popular

Three problems (the first two are illustrated in the sketch after this list):

  1. Saturated neurons kill the gradient
  2. Sigmoid outputs are not zero-centered
  3. exp() is somewhat computationally expensive
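A minimal numpy sketch (my own illustration, not from the original notes) of the first two problems: for large |x| the local gradient σ(x)(1 − σ(x)) is nearly zero, and the outputs are always positive.

```python
import numpy as np

def sigmoid(x):
    # squashes inputs to the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # local gradient: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(x))       # all outputs positive: not zero-centered
print(sigmoid_grad(x))  # ~0 at x = -10 and x = 10: saturated neurons kill the gradient
```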

tanh(x)

  • Squashes numbers to range [-1, 1]
  • zero centered 😃
  • still kills gradients when saturated 😦

ReLU

$$f(x) = \max(0, x)$$

  • Does not saturate 😃
  • very computationally efficient 😃
  • Converges much faster than sigmoid/tanh in practice 😃
  • Actually more biologically plausible than sigmoid 😃

Problems:

  • Not zero-centered output

Leaky ReLU

$$f(x) = \max(0.01x, x)$$
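A small numpy sketch (my own, not part of the original notes) of ReLU and Leaky ReLU as elementwise operations; the 0.01 slope is the conventional default for the leaky variant.

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x): cheap, does not saturate for x > 0
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # f(x) = max(alpha * x, x): small negative slope keeps gradients alive for x < 0
    return np.maximum(alpha * x, x)

x = np.array([-2.0, -0.5, 0.0, 3.0])
print(relu(x))        # negatives are clipped to 0
print(leaky_relu(x))  # small negative values survive instead of being zeroed
```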

Exponential Linear Units (ELU)

$$f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha(\exp(x) - 1) & \text{if } x \leq 0 \end{cases}$$
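A hedged numpy sketch of the ELU formula above; alpha = 1.0 is an assumed default, not something the notes specify.

```python
import numpy as np

def elu(x, alpha=1.0):
    # x for x > 0, alpha * (exp(x) - 1) for x <= 0
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-3.0, -1.0, 0.0, 2.0])
print(elu(x))  # the negative side saturates toward -alpha instead of growing linearly
```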

Maxout Neuron

$$\max(w_1^T x + b_1, w_2^T x + b_2)$$

  • doubles the number of parameters 😦
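A toy numpy sketch of a single maxout unit; the weight and bias values here are made up purely to show the shape of the computation.

```python
import numpy as np

def maxout(x, w1, b1, w2, b2):
    # elementwise max over two affine functions of the same input
    return np.maximum(w1.T @ x + b1, w2.T @ x + b2)

x = np.array([1.0, -2.0])
w1, b1 = np.array([[0.5], [0.3]]), np.array([0.1])   # first linear piece
w2, b2 = np.array([[-0.2], [0.4]]), np.array([0.0])  # second linear piece (this is where the parameters double)
print(maxout(x, w1, b1, w2, b2))
```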

Data Preprocessing

Preprocess the data (a short numpy sketch of these steps follows the list):

  • zero-centered data
  • normalized data
  • PCA
  • Whitening
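A minimal sketch of these four steps on a data matrix X of shape (N, D); the PCA/whitening part uses the standard covariance eigendecomposition and is only meant to illustrate the idea, not prescribe an exact recipe.

```python
import numpy as np

X = np.random.randn(100, 3) * 5.0 + 2.0  # fake data: N = 100 samples, D = 3 features

# zero-centered data: subtract the per-feature mean
X_centered = X - X.mean(axis=0)

# normalized data: divide by the per-feature standard deviation
X_normalized = X_centered / X_centered.std(axis=0)

# PCA: rotate the centered data into the eigenbasis of the covariance matrix
cov = X_centered.T @ X_centered / X_centered.shape[0]
eigvals, eigvecs = np.linalg.eigh(cov)
X_pca = X_centered @ eigvecs

# Whitening: additionally scale each rotated dimension to unit variance
X_white = X_pca / np.sqrt(eigvals + 1e-5)
```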

Weight Initialization

  • First idea: small random numbers
    (Gaussian with zero mean and 1e-2 standard deviation)
    `W = 0.01 * np.random.randn(D, H)`
    Works okay for small networks, but causes problems with deeper networks.
  • Xavier initialization
    `W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)`
    (A quick comparison of the two schemes is sketched after this list.)
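A comparison sketch of the two schemes, assuming a 10-layer tanh network with 500 units per layer (my own setup, in the spirit of the lecture's activation-statistics experiment).

```python
import numpy as np

def layer_activation_stats(init_fn, n_layers=10, fan_in=500, n_samples=1000):
    # forward a batch through a tanh network and record the std of each layer's activations
    x = np.random.randn(n_samples, fan_in)
    stds = []
    for _ in range(n_layers):
        W = init_fn(fan_in, fan_in)
        x = np.tanh(x @ W)
        stds.append(x.std())
    return stds

small_random = lambda fi, fo: 0.01 * np.random.randn(fi, fo)   # first idea: small random numbers
xavier = lambda fi, fo: np.random.randn(fi, fo) / np.sqrt(fi)  # Xavier initialization

print(layer_activation_stats(small_random))  # stds shrink toward 0: activations (and gradients) vanish
print(layer_activation_stats(xavier))        # stds stay roughly constant across layers
```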