Coursera Deep Learning 1

The Deep Learning specialization on Coursera has five courses in total; the first is covered below.

1 neural networks and deep learning

1.1 introduction to deep learning

ReLU function: rectified linear unit

1.1.1 supervised learning with neural networks

some applications and the network architectures used for them

The last row, autonomous driving, uses a custom hybrid neural network architecture.

structured data: data organized in databases or tables

1.2 logistic regression as a neural network

1.2.1 logistic regression

Before introducing neural networks, we first build a model: logistic regression.

output: yhat = sigma(w^T x + b) = sigma(z), where sigma(z) = 1/(1 + e^(-z))
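As a quick illustration, here is a minimal numpy sketch of this output computation (the feature count n_x and the random w, b, x are assumptions for illustration, not values from the course):

import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))

n_x = 3                         # assumed number of input features
w = np.random.randn(n_x, 1)     # weights, shape (n_x, 1)
b = 0.0                         # bias
x = np.random.randn(n_x, 1)     # one input example, shape (n_x, 1)

z = np.dot(w.T, x) + b          # z = w^T x + b, shape (1, 1)
yhat = sigmoid(z)               # predicted probability that y = 1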

1.2.2 the algorithm of gradient descent on m examples
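The non-vectorized version from the lecture accumulates the gradients by looping over the m training examples. A rough, self-contained sketch (the sizes, the dummy labels Y, and learning_rate are all illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_x, m = 3, 100                          # assumed sizes
X = np.random.randn(n_x, m)              # m examples stacked as columns
Y = (np.random.rand(1, m) > 0.5) * 1.0   # dummy 0/1 labels
w = np.zeros((n_x, 1))
b = 0.0
learning_rate = 0.01                     # assumed step size

# one iteration of gradient descent, looping over all m examples
dw = np.zeros((n_x, 1))
db = 0.0
J = 0.0
for i in range(m):
    z_i = np.dot(w[:, 0], X[:, i]) + b   # z = w^T x(i) + b (a scalar)
    a_i = sigmoid(z_i)                   # prediction for example i
    J += -(Y[0, i] * np.log(a_i) + (1 - Y[0, i]) * np.log(1 - a_i))
    dz_i = a_i - Y[0, i]                 # dL/dz for example i
    dw += X[:, i:i+1] * dz_i             # accumulate dL/dw
    db += dz_i                           # accumulate dL/db
J /= m; dw /= m; db /= m                 # average over the m examples
w = w - learning_rate * dw               # parameter update
b = b - learning_rate * db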

1.3 python and vectorization

the difference between non-vectorized and vectorized code

1.3.1 Jupyter code practice: a vectorization demo

  • creating a numpy array
import numpy as np
a = np.array([1,2,3,4])
print(a)
[1 2 3 4]
  • vectorized version
import numpy as np
import time

a = np.random.rand(1000000)
b = np.random.rand(1000000)
tic = time.time()
c = np.dot(a, b)               # vectorized dot product
toc = time.time()
print(a, b)
print("Vectorization version:" + str(1000*(toc-tic)) + "ms")
[0.59467283 0.99173279 0.00123952 ... 0.29278564 0.97683776 0.8740815 ] 
[0.3456192  0.29648059 0.94208957 ... 0.33763542 0.62611472 0.54614243]
Vectorization version:0.9908676147460938ms
  • timing comparison: vectorization vs. for loop
import numpy as np
import time

a = np.random.rand(1000000)
b = np.random.rand(1000000)
tic = time.time()
c = np.dot(a, b)               # vectorized dot product
toc = time.time()
print(c)
print("Vectorization version:" + str(1000*(toc-tic)) + "ms")

c = 0
tic = time.time()
for i in range(1000000):
    c += a[i] * b[i]
toc = time.time()
print(c)
print("For loop:"+str(1000*(toc-tic))+"ms")

249992.96116014756
Vectorization version:0.9968280792236328ms
249992.96116014617
For loop:444.3216323852539ms

1.3.2 vectorizing logistic regression

After vectorizing z and a, we get Z = np.dot(w.T, X) + b and A = [a(1), a(2), a(3), ..., a(m)] = sigma(Z).
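A minimal, self-contained sketch of this vectorized forward pass (the sizes and random values are illustrative assumptions; X stacks the m examples as columns):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_x, m = 3, 100                 # assumed sizes
X = np.random.randn(n_x, m)     # all m examples at once, shape (n_x, m)
w = np.random.randn(n_x, 1)
b = 0.0

Z = np.dot(w.T, X) + b          # shape (1, m); b is broadcast across the m columns
A = sigmoid(Z)                  # all m predictions in one call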

1.3.3 vectorizing logistic regression's gradient output
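The gradients can likewise be computed with no explicit loop. A rough sketch continuing the 1.3.2 snippet above (Y of shape (1, m) and learning_rate are assumed):

dZ = A - Y                       # shape (1, m)
dw = np.dot(X, dZ.T) / m         # shape (n_x, 1)
db = np.sum(dZ) / m              # a scalar
w = w - learning_rate * dw       # gradient descent update
b = b - learning_rate * db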

the broadcasting technique

Broadcasting helps make these computations efficient; it applies to ordinary element-wise addition, subtraction, multiplication, and division of matrices and vectors.
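A tiny illustrative example (values chosen arbitrarily): adding a (3, 1) column vector to a (3, 4) matrix broadcasts the vector across every column.

import numpy as np

A = np.arange(12).reshape(3, 4)     # shape (3, 4)
v = np.array([[10], [20], [30]])    # shape (3, 1)
print(A + v)                        # v is stretched to (3, 4) and added element-wise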

The difference when using np.dot: with a rank-1 array it returns a single number, while with a column vector it returns a matrix.

import numpy as np
a = np.random.randn(5)          # rank-1 array, shape (5,)
print(a)
print(a.shape)
print(a.T)                      # transposing a rank-1 array has no effect
print(np.dot(a.T, a))           # inner product: a single number
a = np.random.randn(5, 1)       # column vector, shape (5, 1)
print(a)
print(a.T)                      # row vector, shape (1, 5)
print(np.dot(a, a.T))           # outer product: a (5, 5) matrix
[-1.25150746  0.90245975 -1.5437502  -2.03129219 -0.91442366]
(5,)
[-1.25150746  0.90245975 -1.5437502  -2.03129219 -0.91442366]
9.726187792641722
[[ 0.13143985]
 [-1.06166298]
 [-0.30208537]
 [ 0.3434191 ]
 [ 0.70496854]]
[[ 0.13143985 -1.06166298 -0.30208537  0.3434191   0.70496854]]
[[ 0.01727643 -0.13954482 -0.03970606  0.04513896  0.09266096]
 [-0.13954482  1.12712829  0.32071286 -0.36459535 -0.74843901]
 [-0.03970606  0.32071286  0.09125557 -0.10374189 -0.21296069]
 [ 0.04513896 -0.36459535 -0.10374189  0.11793668  0.24209967]
 [ 0.09266096 -0.74843901 -0.21296069  0.24209967  0.49698065]]

Two points in the derivation of the logistic regression cost function
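Restating the lecture's derivation briefly: interpreting yhat as P(y=1|x), the probability of one example, its log, and the cost over the training set are

P(y \mid x) = \hat{y}^{\,y}\,(1-\hat{y})^{\,1-y}

\log P(y \mid x) = y \log \hat{y} + (1-y)\log(1-\hat{y}) = -\mathcal{L}(\hat{y}, y)

J(w, b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right)

so maximizing the likelihood of the (assumed i.i.d.) training set is equivalent to minimizing the cost J.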

Some explanation of broadcasting versus ordinary vectorized operations

import numpy as np
a = np.random.randn(3, 3)
b = np.random.randn(3, 1)
print(a)
print(b)
c = a * b           # element-wise product; b is broadcast to (3, 3)
d = np.dot(a, b)    # matrix product; result has shape (3, 1)
print(c)
print(d)

[[ 0.7075494  -0.10429553 -0.17201322]
 [-0.53974707  0.51247682  0.42653902]
 [ 0.66015945  0.35415285  0.2497812 ]]
[[-0.01238072]
 [ 0.07473015]
 [ 0.64125908]]
[[-0.00875997  0.00129125  0.00212965]
 [-0.04033538  0.03829747  0.03187532]
 [ 0.42333324  0.22710373  0.16017446]]
[[-0.12685903]
 [ 0.31850195]
 [ 0.17846711]]

In numpy the "*" operator indicates element-wise multiplication. It is different from "np.dot()". If you would try "c = np.dot(a,b)" you would get c.shape = (4, 2).

Also, the broadcasting cannot happen because of the shape of b. b should have been something like (4, 1) or (1, 3) to broadcast properly.

element-wise multiplication: multiplying corresponding entries

For the output of np.zeros(), see the following:
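A small example of what np.zeros() returns (the shapes are chosen arbitrarily):

import numpy as np

print(np.zeros(3))             # 1-D array: [0. 0. 0.]
print(np.zeros((2, 3)))        # 2-D array of zeros with shape (2, 3)
print(np.zeros((2, 3)).shape)  # (2, 3)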

2.1 shallow neural network

2.1.1 compute a neural network's output

The figure below shows the output of the first neuron.

Feeding the output of the first neuron in as the input of the second neuron gives the single-neuron output shown in the figure below.
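A minimal numpy sketch of these two steps for a single example (the sizes of 3 input features and 4 hidden units, the small random initialization, and the use of tanh in the hidden layer are assumptions for illustration):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_x, n_h = 3, 4                           # assumed layer sizes
x = np.random.randn(n_x, 1)               # one input example

W1 = np.random.randn(n_h, n_x) * 0.01     # hidden-layer weights
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(1, n_h) * 0.01       # output-layer weights
b2 = np.zeros((1, 1))

z1 = np.dot(W1, x) + b1                   # each row is one hidden neuron's z
a1 = np.tanh(z1)                          # hidden-layer activations
z2 = np.dot(W2, a1) + b2                  # the output neuron's z
a2 = sigmoid(z2)                          # the network's output yhat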

2.1.2 activation functions

Different activation functions:

sigmoid, tanh, ReLU (rectified linear unit), leaky ReLU

The sigmoid function is generally not recommended, except in the output layer (for binary classification).
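Minimal numpy sketches of the four activations (the 0.01 slope for leaky ReLU follows the value used in the lecture):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))           # squashes z into (0, 1)

def tanh(z):
    return np.tanh(z)                     # squashes z into (-1, 1), zero-centered

def relu(z):
    return np.maximum(0, z)               # 0 for negative z, identity for positive z

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)  # small slope alpha instead of 0 for negative z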

2.1.3 non-linear activation functions

Because stacking linear hidden layers still gives a linear function, such layers add nothing and could be removed; using non-linear activations (with the sigmoid at the output) lets the network compute much more complex functions.

A linear activation g(z) = z is only used in the output layer, for example when the output should itself be a linear function w^T x + b (regression); in that case the hidden layers must still use non-linear activations such as ReLU or tanh. Another case is data compression, where a linear activation may be used; otherwise linear activations are rarely used.

2.1.4 derivatives of activation functions

sigmoid activation function

Tanh activation function

ReLU activation function
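The derivatives these slides cover, restated in one place (with a = g(z); at z = 0 the ReLU and leaky ReLU derivatives are undefined and are conventionally set to 0 or 1 in code):

\sigma'(z) = \sigma(z)\,(1-\sigma(z)) = a(1-a)

\tanh'(z) = 1 - \tanh^2(z) = 1 - a^2

\mathrm{ReLU}'(z) = \begin{cases} 0 & z < 0 \\ 1 & z > 0 \end{cases}
\qquad
\mathrm{LeakyReLU}'(z) = \begin{cases} 0.01 & z < 0 \\ 1 & z > 0 \end{cases}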

2.1.5 gradient descent for neural networks

Formulas for computing the derivatives, covering both forward propagation and backward propagation.
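A self-contained sketch of one gradient-descent iteration for the one-hidden-layer network, following the lecture's formulas (all sizes, the dummy labels, and learning_rate are illustrative assumptions; the hidden layer uses tanh):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_x, n_h, m = 3, 4, 100                        # assumed sizes
X = np.random.randn(n_x, m)                    # inputs
Y = (np.random.rand(1, m) > 0.5) * 1.0         # dummy 0/1 labels
W1 = np.random.randn(n_h, n_x) * 0.01; b1 = np.zeros((n_h, 1))
W2 = np.random.randn(1, n_h) * 0.01;   b2 = np.zeros((1, 1))
learning_rate = 0.5                            # assumed step size

# forward propagation
Z1 = np.dot(W1, X) + b1
A1 = np.tanh(Z1)
Z2 = np.dot(W2, A1) + b2
A2 = sigmoid(Z2)

# backward propagation
dZ2 = A2 - Y
dW2 = np.dot(dZ2, A1.T) / m
db2 = np.sum(dZ2, axis=1, keepdims=True) / m
dZ1 = np.dot(W2.T, dZ2) * (1 - A1 ** 2)        # tanh'(Z1) = 1 - A1^2
dW1 = np.dot(dZ1, X.T) / m
db1 = np.sum(dZ1, axis=1, keepdims=True) / m

# gradient descent update
W1 = W1 - learning_rate * dW1; b1 = b1 - learning_rate * db1
W2 = W2 - learning_rate * dW2; b2 = b2 - learning_rate * db2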

2.1.6 back propagation intuition

keepdims=True ensures that the output B has shape (4, 1) rather than (4,):

A = np.random.randn(4,3)
B = np.sum(A, axis = 1, keepdims = True)

A quiz question about initializing the neurons:

6. Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?

A. Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent each neuron in the layer will be computing the same thing as other neurons.
B. Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent they will learn to compute different things because we have "broken symmetry".
C. Each neuron in the first hidden layer will compute the same thing, but neurons in different layers will compute different things, thus we have accomplished "symmetry breaking" as described in lecture.
D. The first hidden layer's neurons will perform different computations from each other even in the first iteration; their parameters will thus keep evolving in their own way.

Explanation: if two hidden neurons start with identical parameters, they have the same influence on the output unit and receive the same gradients during backpropagation, so even after many iterations the two hidden units remain symmetric (answer A). No matter how many hidden units you add, they all end up computing the same thing, so having multiple hidden neurons becomes pointless.

assert is an assertion statement: if the condition is true, the program simply continues; if the condition is false, it raises an AssertionError and stops.

if (the assumption holds)
{
     the program runs normally;
}
else
{
      raise an error && stop the program! (to avoid the run causing bigger errors)
}
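In Python itself this looks like the snippet below; checking array shapes with assert, as the programming assignments do, is a typical use (the shapes here just mirror the keepdims example above):

import numpy as np

A = np.random.randn(4, 3)
B = np.sum(A, axis=1, keepdims=True)

# If the condition is false, an AssertionError is raised and the program stops.
assert B.shape == (4, 1), "B should be a column vector of shape (4, 1)"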

For why neural network weights should be initialized randomly and with small values, see this article: here
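A minimal sketch of that kind of initialization, as used in the course assignments (the layer sizes are assumptions; the 0.01 factor keeps the weights small so tanh/sigmoid do not start out saturated, while the biases can safely start at zero):

import numpy as np

n_x, n_h, n_y = 3, 4, 1                   # assumed layer sizes
W1 = np.random.randn(n_h, n_x) * 0.01     # small random values break the symmetry
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(n_y, n_h) * 0.01
b2 = np.zeros((n_y, 1))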

About np.power(a, b):

>>> x1 = list(range(6))
>>> x1
[0, 1, 2, 3, 4, 5]
>>> np.power(x1, 3)
array([  0,   1,   8,  27,  64, 125])

The difference between np.dot and np.multiply:

np.multiply() (and the * operator) performs element-wise multiplication: corresponding entries are multiplied and the result has the same shape as the inputs. np.dot() performs matrix multiplication; for two 1-D arrays it computes the inner product and returns a single number (a scalar).
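A small example of the difference (values chosen arbitrarily):

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(np.multiply(a, b))    # element-wise: [[ 5 12] [21 32]]
print(a * b)                # same as np.multiply
print(np.dot(a, b))         # matrix product: [[19 22] [43 50]]
print(np.dot(np.array([1, 2, 3]), np.array([4, 5, 6])))   # inner product of 1-D arrays: 32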

3.1 Deep L-layer neural network

The purpose of giving a neural network multiple layers, with multiple neurons per layer:

  • For linearly non-separable data, a multi-layer neural network fits better than softmax (a single linear classifier);
  • High-dimensional data is hard to visualize, so the number of hidden layers and the number of neurons per layer can only be found by repeated tuning.

3.1.1 why deep representations

Circuit-theory intuition from the lecture: with many layers, some functions can be computed by a network of depth O(log n) with few units per layer, whereas with few layers the network may need exponentially many, O(2^n), hidden units to compute the same function.

3.1.2 building blocks of deep neural networks

forward and backward propagation

backward propagation for layer L

From the input x through to the output of the loss function.
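A minimal sketch of the two building blocks for a generic layer l, following the lecture's formulas (the function names, and passing the activation g and its derivative g_prime as arguments, are illustrative choices rather than the assignment's exact API; caching the values needed later mirrors the assignment's convention):

import numpy as np

def layer_forward(A_prev, W, b, g):
    # Z[l] = W[l] A[l-1] + b[l],  A[l] = g(Z[l])
    Z = np.dot(W, A_prev) + b
    A = g(Z)
    cache = (A_prev, W, Z)        # kept for the backward pass
    return A, cache

def layer_backward(dA, cache, g_prime):
    A_prev, W, Z = cache
    m = A_prev.shape[1]
    dZ = dA * g_prime(Z)                         # element-wise product
    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)                    # gradient passed on to layer l-1
    return dA_prev, dW, db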

3.1.3 hyperparameters

3.1.4 assignment 1

np.random.seed(1) is used to keep all the random function calls consistent

A detailed explanation of broadcasting:

3.1.5 assignment 2

Architecture of your model (building the network model)

The process by which the computer-vision model recognizes an image is as follows:

 

The model can be summarized as: INPUT -> LINEAR -> RELU -> LINEAR -> SIGMOID -> OUTPUT

  • The input is a (64,64,3) image which is flattened to a vector of size (12288, 1).
  • The corresponding vector [x0, x1, ..., x12287]^T is then multiplied by the weight matrix W^[1] of size (n^[1], 12288).
  • You then add a bias term and take its ReLU to get the following vector: [a^[1]_0, a^[1]_1, ..., a^[1]_{n^[1]-1}]^T.
  • You then repeat the same process.
  • You multiply the resulting vector by W^[2] and add your intercept (bias).
  • Finally, you take the sigmoid of the result. If it is greater than 0.5, you classify it to be a cat.

The result after these operations:

 

[LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID

  • The input is a (64,64,3) image which is flattened to a vector of size (12288, 1).
  • The corresponding vector [x0, x1, ..., x12287]^T is then multiplied by the weight matrix W^[1] and then you add the intercept b^[1]. The result is called the linear unit.
  • Next, you take the ReLU of the linear unit. This process could be repeated several times for each (W^[l], b^[l]) depending on the model architecture.
  • Finally, you take the sigmoid of the final linear unit. If it is greater than 0.5, you classify it to be a cat.

 
