Programming Practice - Week 4 (1)

  We have now reached Week 4 of the first course. This week builds on last week's shallow neural network and extends it to deep neural networks. Throughout the neural-network methodology, we always follow the same process:

  1. Define the structure of the neural network

  2. Initialize the parameters

  3. Loop over the following steps: (1) forward propagation, (2) compute the cost function, (3) backward propagation to obtain the gradients, (4) update the parameters (gradient descent)

  Whether it is basic logistic regression, last week's shallow neural network, or this week's deep neural network, the process of designing the network is the same. Building on last week's shallow network, the forward propagation of a deep network for layer l is:

  Z^[l] = W^[l] A^[l-1] + b^[l],  A^[l] = g^[l](Z^[l]),  with A^[0] = X

  Similarly, the backward propagation for layer l is:

  dZ^[l] = dA^[l] * g^[l]'(Z^[l]),  dW^[l] = (1/m) dZ^[l] A^[l-1].T,  db^[l] = (1/m) * sum of dZ^[l] over the examples,  dA^[l-1] = W^[l].T dZ^[l]

  With these two sets of formulas in hand, we can start implementing the deep neural network.

  First, as usual, import the required packages:

import numpy as np
import h5py
import matplotlib.pyplot as plt
from testCases_v2 import *
from dnn_utils_v2 import sigmoid, sigmoid_backward, relu, relu_backward

%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%load_ext autoreload
%autoreload 2

np.random.seed(1)

  A single flow chart can describe the complete forward and backward pass of an L-layer network. Note that every forward step has a corresponding backward step, which is why we cache data at each step: the values stored in each cache are reused during backward propagation.

  1. Initialization

  We write two initialization functions, one for a two-layer network and one for an L-layer network.

  (1) Two-layer neural network

  In the two-layer network, the model structure is: Linear -> Relu -> Linear -> Sigmoid.

  First, initialize the parameters W and b.

  

def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer
    
    Returns:
    parameters -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """
    
    np.random.seed(1)
    
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1),dtype = float)
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1),dtype = float)
    
    assert(W1.shape == (n_h, n_x))
    assert(b1.shape == (n_h, 1))
    assert(W2.shape == (n_y, n_h))
    assert(b2.shape == (n_y, 1))
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters    

parameters = initialize_parameters(3,2,1)

  (2) L-layer neural network

  In a multi-layer network we need to pay closer attention to the dimensions of W and b during initialization. Taking an input of shape (64*64*3, 209) as an example, where m = 209 is the number of examples, the shapes for layer l are: W^[l]: (n^[l], n^[l-1]), b^[l]: (n^[l], 1), and consequently Z^[l] and A^[l] both have shape (n^[l], 209).

  The code for initializing the L-layer network is therefore:

def initialize_parameters_deep(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the dimensions of each layer in our network
    
    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)
    """
    
    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)            # number of layers in the network

    for l in range(1, L):
       
        # note layer_dims[l-1] here, not layer_dims[l]: W^[l] maps layer l-1 to layer l
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1), dtype=float)
        
        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))

        
    return parameters

parameters = initialize_parameters_deep([5,4,3])
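  As a quick sanity check of these shape rules, here is a minimal sketch (not part of the assignment) that prints the shapes returned by the call above:

# Print the shape of every W and b; for layer_dims = [5, 4, 3] we expect
# W1: (4, 5), b1: (4, 1), W2: (3, 4), b2: (3, 1)
L = len(parameters) // 2
for l in range(1, L + 1):
    print("W" + str(l), parameters["W" + str(l)].shape,
          "b" + str(l), parameters["b" + str(l)].shape)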

   With initialization done, we move on to forward propagation.

  In an L-layer network, the full forward pass is: ( L-1 ) * [ Linear -> Relu ] -> Linear -> Sigmoid.

  After vectorization, the Linear part of forward propagation can be written as Z^[l] = W^[l] A^[l-1] + b^[l], where A^[0] = X.

  The code implementing this equation is:

  

def linear_forward(A, W, b):
    """
    Implement the linear part of a layer's forward propagation.

    Arguments:
    A -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)

    Returns:
    Z -- the input of the activation function, also called pre-activation parameter 
    cache -- a python dictionary containing "A", "W" and "b" ; stored for computing the backward pass efficiently
    """
 
    Z = np.dot(W, A) + b            # np.dot is matrix multiplication; np.multiply is element-wise multiplication, equivalent to *
        
    assert(Z.shape == (W.shape[0], A.shape[1]))
    cache = (A, W, b)
    
    return Z, cache

A, W, b = linear_forward_test_case()

Z, linear_cache = linear_forward(A, W, b)

 

  In this exercise, the Linear -> Activation forward step needs two activation functions: Relu and Sigmoid. To simplify the code, we implement sigmoid and relu ahead of time and call them directly as sigmoid(Z) and relu(Z).

  The two activation functions are implemented as follows:

def sigmoid(Z):
    """
    Implements the sigmoid activation in numpy
    
    Arguments:
    Z -- numpy array of any shape
    
    Returns:
    A -- output of sigmoid(z), same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """
    
    A = 1/(1+np.exp(-Z))
    cache = Z
    
    return A, cache

def relu(Z):
    """
    Implement the RELU function.

    Arguments:
    Z -- Output of the linear layer, of any shape

    Returns:
    A -- Post-activation parameter, of the same shape as Z
    cache -- returns Z as well, stored for computing the backward pass efficiently
    """
    
    A = np.maximum(0,Z)
    
    assert(A.shape == Z.shape)
    
    cache = Z 
    return A, cache

 

  With the two activation functions in place, it is easy to write the Linear -> Relu or Linear -> Sigmoid code:

def linear_activation_forward(A_prev, W, b, activation):
    """
    Implement the forward propagation for the LINEAR->ACTIVATION layer

    Arguments:
    A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    A -- the output of the activation function, also called the post-activation value 
    cache -- a python dictionary containing "linear_cache" and "activation_cache";
             stored for computing the backward pass efficiently
    """
    
    if activation == "sigmoid":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
        ### END CODE HERE ###
    
    elif activation == "relu":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)
        ### END CODE HERE ###
    
    assert (A.shape == (W.shape[0], A_prev.shape[1]))     # keep in mind the dimensions of A, W, b from the shape rules above
    cache = (linear_cache, activation_cache)

    return A, cache

A_prev, W, b = linear_activation_forward_test_case()

A, linear_activation_cache = linear_activation_forward(A_prev, W, b, activation = "sigmoid")

A, linear_activation_cache = linear_activation_forward(A_prev, W, b, activation = "relu")

 

   Forward propagation for the full L-layer network, i.e. ( L-1 ) * [ Linear -> Relu ] -> Linear -> Sigmoid, is implemented as follows:

  

def L_model_forward(X, parameters):
    """
    Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation
    
    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()
    
    Returns:
    AL -- last post-activation value
    caches -- list of caches containing:
                every cache of linear_relu_forward() (there are L-1 of them, indexed from 0 to L-2)
                the cache of linear_sigmoid_forward() (there is one, indexed L-1)
                (caches, like lists, are indexed from 0, so the indices run from 0 to L-1)
    """
    caches = []
    A = X
    L = len(parameters) // 2                  # number of layers in the neural network; // is integer division

    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A
        ### START CODE HERE ### (≈ 2 lines of code)
        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation = "relu")
        caches.append(cache)
        ### END CODE HERE ###

    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    ### START CODE HERE ### (≈ 2 lines of code)
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], activation = "sigmoid")
    caches.append(cache)
    ### END CODE HERE ###

    assert(AL.shape == (1, X.shape[1]))

    return AL, caches

X, parameters = L_model_forward_test_case()
AL, caches = L_model_forward(X, parameters)

 

  After the forward pass, we compute the cost function, which drives the subsequent backward propagation. (Note that the cost is computed from AL and Y, not from a prediction; we have not made any predictions yet.)
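  The cost code is not reproduced in this post, so the following is a minimal sketch of the cross-entropy cost used in this exercise; the function name compute_cost follows the course convention, but treat the exact implementation as an assumption:

def compute_cost(AL, Y):
    """
    Cross-entropy cost: -(1/m) * sum(Y*log(AL) + (1-Y)*log(1-AL))

    Arguments:
    AL -- probability vector from L_model_forward, shape (1, number of examples)
    Y  -- true "label" vector, same shape as AL

    Returns:
    cost -- cross-entropy cost (a scalar)
    """
    m = Y.shape[1]
    cost = -1/m * np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))
    cost = np.squeeze(cost)   # make sure cost is a scalar, e.g. turns [[17]] into 17
    assert(cost.shape == ())
    return cost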

  To recap the full forward and backward computation of the L-layer network: each forward step caches (A_prev, W, b, Z), and backward propagation consumes these caches layer by layer in reverse order.

  Like the forward pass, backward propagation can be broken into three steps:

  (1) Linear backward

  (2) Linear -> Activation backward

  (3) [ Linear -> Relu ] * ( L-1 ) -> Linear -> Sigmoid backward for the whole model

(1) Linear backward

  From the forward pass we know the Linear equation is Z^[l] = W^[l] A^[l-1] + b^[l], so the backward pass must compute:

  dW^[l] = (1/m) dZ^[l] A^[l-1].T,  db^[l] = (1/m) * sum of dZ^[l] over the examples,  dA^[l-1] = W^[l].T dZ^[l]


  

def linear_backward(dZ, cache):
    """
    Implement the linear portion of backward propagation for a single layer (layer l)

    Arguments:
    dZ -- Gradient of the cost with respect to the linear output (of current layer l)
    cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    A_prev, W, b = cache
    m = A_prev.shape[1]

    dW = 1/m * np.dot(dZ, A_prev.T)
    db = 1/m * np.sum(dZ, axis=1, keepdims=True) 
    dA_prev = np.dot(W.T, dZ)
    
    assert (dA_prev.shape == A_prev.shape)
    assert (dW.shape == W.shape)
    assert (db.shape == b.shape)
    
    return dA_prev, dW, db

dZ, linear_cache = linear_backward_test_case()

dA_prev, dW, db = linear_backward(dZ, linear_cache)

 

(2) Linear -> Activation backward

  The activation functions in this exercise are Relu and Sigmoid, so we write their backward functions ahead of time for use below:

def relu_backward(dA, cache):
    """
    Implement the backward propagation for a single RELU unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    
    Z = cache
    dZ = np.array(dA, copy=True) # just converting dz to a correct object.
    
    # When z <= 0, you should set dz to 0 as well. 
    dZ[Z <= 0] = 0
    
    assert (dZ.shape == Z.shape)
    
    return dZ

def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single SIGMOID unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    
    Z = cache
    
    s = 1/(1+np.exp(-Z))
    dZ = dA * s * (1-s)
    
    assert (dZ.shape == Z.shape)
    
    return dZ                               

   The Linear -> Activation backward code is then:

def linear_activation_backward(dA, cache, activation):
    """
    Implement the backward propagation for the LINEAR->ACTIVATION layer.
    
    Arguments:
    dA -- post-activation gradient for current layer l 
    cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"
    
    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    linear_cache, activation_cache = cache
    
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
        
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
    
    return dA_prev, dW, db

AL, linear_activation_cache = linear_activation_backward_test_case()

dA_prev, dW, db = linear_activation_backward(AL, linear_activation_cache,  "sigmoid")

dA_prev, dW, db = linear_activation_backward(AL, linear_activation_cache,  "relu")

 

 (3) Backward propagation for the whole model

  During forward propagation we cached (A_prev, W, b, Z) at every layer, because the backward pass needs these values to compute the gradients. Forward propagation starts from A0 = X, while backward propagation starts from AL, the output of the final Sigmoid layer. The initial gradient dAL is therefore:

  dAL = - ( Y / AL - (1 - Y) / (1 - AL) )

The backward pass for the whole L-layer network is then:

def L_model_backward(AL, Y, caches):
    """
    Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group
    
    Arguments:
    AL -- probability vector, output of the forward propagation (L_model_forward())
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat)
    caches -- list of caches containing:
                every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1) i.e l = 0...L-2)
                the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1])
    
    Returns:
    grads -- A dictionary with the gradients
             grads["dA" + str(l)] = ... 
             grads["dW" + str(l)] = ...
             grads["db" + str(l)] = ... 
    """
    grads = {}
    L = len(caches) # the number of layers
    m = AL.shape[1]
    Y = Y.reshape(AL.shape) # after this line, Y is the same shape as AL
    
    # Initializing the backpropagation
    dAL =  - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
      
    # Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "AL, Y, caches". Outputs: "grads["dAL"], grads["dWL"], grads["dbL"]
    current_cache = caches[L-1] # note the index: the sigmoid cache is the last one
    grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache,  "sigmoid")
      
    for l in reversed(range(L-1)):
        # lth layer: (RELU -> LINEAR) gradients.
        # Inputs: "grads["dA" + str(l + 2)], caches". Outputs: "grads["dA" + str(l + 1)] , grads["dW" + str(l + 1)] , grads["db" + str(l + 1)] 
        current_cache = caches[l]  # note the index
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l + 2)], current_cache,  "relu")
        grads["dA" + str(l + 1)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp
   
    return grads

AL, Y_assess, caches = L_model_backward_test_case()
grads = L_model_backward(AL, Y_assess, caches)

 

  The parameter-update step is omitted here.
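  For completeness, here is a minimal sketch of that omitted gradient-descent update, assuming the grads dictionary produced by L_model_backward and a learning_rate argument chosen by the caller:

def update_parameters(parameters, grads, learning_rate):
    """
    One step of gradient descent:
    W^[l] := W^[l] - learning_rate * dW^[l],  b^[l] := b^[l] - learning_rate * db^[l]
    """
    L = len(parameters) // 2   # number of layers
    for l in range(1, L + 1):
        parameters["W" + str(l)] -= learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] -= learning_rate * grads["db" + str(l)]
    return parameters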

  Summary: building an L-layer neural network helps us better understand forward and backward propagation, what data the network needs to store, what computations it performs, and how to implement all of this in code.
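  To tie the pieces together, here is a minimal sketch of the four-step recipe from the beginning of the post (define the structure, initialize, loop, update), using the functions above; the layer sizes, learning rate, and iteration count are illustrative assumptions, not values from the assignment:

def L_layer_model(X, Y, layer_dims, learning_rate=0.0075, num_iterations=2500):
    """
    Train an L-layer network: initialize, then loop over
    forward propagation -> cost -> backward propagation -> parameter update.
    """
    parameters = initialize_parameters_deep(layer_dims)
    for i in range(num_iterations):
        AL, caches = L_model_forward(X, parameters)                       # forward pass
        cost = compute_cost(AL, Y)                                        # cross-entropy cost
        grads = L_model_backward(AL, Y, caches)                           # backward pass
        parameters = update_parameters(parameters, grads, learning_rate)  # gradient descent step
        if i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
    return parameters

# Example call with illustrative layer sizes (input 12288 = 64*64*3, output 1):
# parameters = L_layer_model(train_x, train_y, [12288, 20, 7, 5, 1])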

 

 

  

  

 

posted @ 2017-09-17 14:17  BlueBluelueluelue