6 Training Neural Networks (Part 1): Activation Functions and Data Preprocessing


Activation Functions

Sigmoid

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

  • Squashes numbers to range [0,1]
  • Historically popular

Three problems (the first two are illustrated in the sketch after this list):

  1. Saturated neurons kill the gradient
  2. Sigmoid outputs are not zero-centered
  3. exp() is somewhat computationally expensive
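A minimal numpy sketch (my own illustration, not from the original notes) of the first two problems: for large |x| the local gradient σ(x)(1 − σ(x)) is nearly zero, and the outputs are always positive.

```python
import numpy as np

def sigmoid(x):
    # squashes inputs to the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # local gradient: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(x))       # all outputs positive: not zero-centered
print(sigmoid_grad(x))  # ~0 at x = -10 and x = 10: saturated neurons kill the gradient
```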

tanh(x)

  • Squashes numbers to range [-1, 1]
  • zero centered 😃
  • still kills gradients when saturated 😦

ReLU

$$f(x) = \max(0, x)$$

  • Does not saturate 😃
  • very computationally efficient 😃
  • Converges much faster than sigmoid/tanh in practice 😃
  • Actually more biologically plausible than sigmoid 😃

Problems:

  • Not zero-centered output

Leaky ReLU

$$f(x) = \max(0.01x, x)$$
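A small numpy sketch (my own, not part of the original notes) of ReLU and Leaky ReLU as elementwise operations; the 0.01 slope is the conventional default for the leaky variant.

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x): cheap, does not saturate for x > 0
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # f(x) = max(alpha * x, x): small negative slope keeps gradients alive for x < 0
    return np.maximum(alpha * x, x)

x = np.array([-2.0, -0.5, 0.0, 3.0])
print(relu(x))        # negatives are clipped to 0
print(leaky_relu(x))  # small negative values survive instead of being zeroed
```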

Exponential Linear Units (ELU)

$$f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha(\exp(x) - 1) & \text{if } x \leq 0 \end{cases}$$
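A hedged numpy sketch of the ELU formula above; alpha = 1.0 is an assumed default, not something the notes specify.

```python
import numpy as np

def elu(x, alpha=1.0):
    # x for x > 0, alpha * (exp(x) - 1) for x <= 0
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-3.0, -1.0, 0.0, 2.0])
print(elu(x))  # the negative side saturates toward -alpha instead of growing linearly
```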

Maxout Neuron

$$\max(w_1^T x + b_1, w_2^T x + b_2)$$

  • doubles the number of parameters 😦
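A toy numpy sketch of a single maxout unit; the weight and bias values here are made up purely to show the shape of the computation.

```python
import numpy as np

def maxout(x, w1, b1, w2, b2):
    # elementwise max over two affine functions of the same input
    return np.maximum(w1.T @ x + b1, w2.T @ x + b2)

x = np.array([1.0, -2.0])
w1, b1 = np.array([[0.5], [0.3]]), np.array([0.1])   # first linear piece
w2, b2 = np.array([[-0.2], [0.4]]), np.array([0.0])  # second linear piece (this is where the parameters double)
print(maxout(x, w1, b1, w2, b2))
```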

Data Preprocessing

Preprocess the data (a short numpy sketch of these steps follows the list):

  • zero-centered data
  • normalized data
  • PCA
  • Whitening
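A minimal sketch of these four steps on a data matrix X of shape (N, D); the PCA/whitening part uses the standard covariance eigendecomposition and is only meant to illustrate the idea, not prescribe an exact recipe.

```python
import numpy as np

X = np.random.randn(100, 3) * 5.0 + 2.0  # fake data: N = 100 samples, D = 3 features

# zero-centered data: subtract the per-feature mean
X_centered = X - X.mean(axis=0)

# normalized data: divide by the per-feature standard deviation
X_normalized = X_centered / X_centered.std(axis=0)

# PCA: rotate the centered data into the eigenbasis of the covariance matrix
cov = X_centered.T @ X_centered / X_centered.shape[0]
eigvals, eigvecs = np.linalg.eigh(cov)
X_pca = X_centered @ eigvecs

# Whitening: additionally scale each rotated dimension to unit variance
X_white = X_pca / np.sqrt(eigvals + 1e-5)
```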

Weight Initialization

  • First idea: small random numbers
    (Gaussian with zero mean and 1e-2 standard deviation)
    `W = 0.01 * np.random.randn(D, H)`
    Works okay for small networks, but causes problems with deeper networks.
  • Xavier initialization
    `W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)`
    (A quick comparison of the two schemes is sketched after this list.)
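A comparison sketch of the two schemes, assuming a 10-layer tanh network with 500 units per layer (my own setup, in the spirit of the lecture's activation-statistics experiment).

```python
import numpy as np

def layer_activation_stats(init_fn, n_layers=10, fan_in=500, n_samples=1000):
    # forward a batch through a tanh network and record the std of each layer's activations
    x = np.random.randn(n_samples, fan_in)
    stds = []
    for _ in range(n_layers):
        W = init_fn(fan_in, fan_in)
        x = np.tanh(x @ W)
        stds.append(x.std())
    return stds

small_random = lambda fi, fo: 0.01 * np.random.randn(fi, fo)   # first idea: small random numbers
xavier = lambda fi, fo: np.random.randn(fi, fo) / np.sqrt(fi)  # Xavier initialization

print(layer_activation_stats(small_random))  # stds shrink toward 0: activations (and gradients) vanish
print(layer_activation_stats(xavier))        # stds stay roughly constant across layers
```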