The Latest Connections Between Neural Network Models and Biological Neural Networks

Source

★We initialized the weights in a stupid way.

In 2010, Xavier Glorot from Bengio's group proposed a more suitable initialization range, which keeps hidden-layer sigmoid-family units in their best activation range.
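That range can be sketched in a few lines of NumPy (a minimal sketch; the layer sizes below are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Xavier/Glorot (2010) initialization: sample uniformly from
# [-sqrt(6/(fan_in+fan_out)), +sqrt(6/(fan_in+fan_out))] so that activation
# (and gradient) variance stays roughly constant from layer to layer.
def xavier_uniform(fan_in, fan_out):
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_uniform(784, 256)  # e.g. the first layer of an MNIST net
print(W.shape)
```

The key point is that the scale shrinks as the layer gets wider, so wide sigmoid layers are not pushed into saturation at the start of training.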

★We used the wrong type of non-linearity.

RBF units were quickly abandoned and sigmoids dominated everywhere; only after 2010 did ReLU become widely adopted.

Hinton also explained why ReLU works, a point rarely seen in other papers:

☻They automatically equalize the magnitude of the gradient in different layers

-Layers with big weights get small gradients

-Layers with small weights get big gradients
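This equalization can be illustrated on a toy two-layer ReLU net (a sketch with illustrative shapes and squared error, not from the talk): rescaling one layer's weights up and the next layer's down leaves the function unchanged, but the gradient moves to the small-weight layer.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0)

# ReLU is positively homogeneous, so (10*W1, W2/10) computes the same
# function as (W1, W2) -- but the gradients rebalance between the layers.
def grads(W1, W2, x, t):
    h = relu(W1 @ x)
    y = W2 @ h
    delta = y - t                                # dL/dy for squared error
    g2 = np.outer(delta, h)                      # dL/dW2
    g1 = np.outer((W2.T @ delta) * (h > 0), x)   # dL/dW1
    return y, g1, g2

W1 = rng.normal(0, 1, (5, 3))
W2 = rng.normal(0, 1, (2, 5))
x = rng.normal(0, 1, 3)
t = np.zeros(2)

y, g1, g2 = grads(W1, W2, x, t)
ya, g1a, g2a = grads(10 * W1, W2 / 10, x, t)     # rebalanced weights

print(np.allclose(y, ya))        # same function
print(np.allclose(g1a, g1 / 10)) # big-weight layer gets a small gradient
print(np.allclose(g2a, g2 * 10)) # small-weight layer gets a big gradient
```

With sigmoid units no such exact rescaling exists, which is one way to read Hinton's point.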

Biological Neural Networks vs. Neural Network Models

Challenge 1: Biological neurons cannot perform real-valued computation

★Cortical neurons do not communicate real-valued activities. (2012)

-Current answer: they send spikes.

Challenge 2: Can biological neurons really take derivatives automatically?

★How do neurons know $\frac{dy}{dx}$ (the gradient)?

Challenge 3: Do biological neurons really need to send two types of signals (forward pass + backward pass)?

★The neurons need to send two different types of signal

-Forward Pass (Signal=activity=y)

-Backward Pass (Signal=$\frac{dE}{dx}$)

Challenge 4: Biological neurons do not have reciprocal connections with identical weights

★Neurons do not have point-wise reciprocal connections with the same weight in both directions.

I. Input ---> Output

II. Output ---> Input ---> Output

Hinton's Explanations

Explanation 1: A spiking mechanism governed by a Poisson process

★Synapses are much cheaper than training cases.

-We have 10^14 synapses but live for only about 10^9 seconds (roughly 30 years).

★Sending random spikes from a Poisson process is very similar to dropout

-It is better than sending real values.

★A good way to throw a lot of parameters at a task is to use big neural networks with dropout
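The spikes-as-dropout analogy can be sketched numerically (a toy Bernoulli approximation of a Poisson process; the activity values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# A real-valued activity y can be sent as a stream of random spikes whose
# rate is proportional to y. Averaged over many ticks the rate recovers y;
# any single tick is a binary message with units randomly silenced -- dropout.
y = np.array([0.9, 0.1, 0.5, 0.7])           # underlying real-valued activities
n_ticks = 2000
spikes = rng.random((n_ticks, y.size)) < y   # each tick fires with prob. y
rates = spikes.mean(axis=0)                  # ~ y, recovered from spike counts
one_tick = spikes[0].astype(float)           # a single dropout-like sample

print(np.abs(rates - y).max() < 0.05)
```

So the network never transmits a real number, yet the information survives in the firing rate, and each individual spike pattern regularizes like dropout.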

Explanation 2: A neural coding scheme for error derivatives

Hinton proposed a remarkable biological neural coding scheme:

★Don’t compute it. Measure it.

Make a small change to the total input and observe the resulting small change in the total output.
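"Measuring" a derivative this way is just a finite difference (a minimal sketch on a single sigmoid unit):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# "Don't compute it. Measure it.": observe the derivative as the ratio of a
# small output change to the small input change that caused it.
x, eps = 0.3, 1e-6
measured = (sigmoid(x + eps) - sigmoid(x)) / eps   # perturb and observe
computed = sigmoid(x) * (1 - sigmoid(x))           # analytic dy/dx

print(abs(measured - computed) < 1e-5)             # the measurement matches
```

No explicit differentiation is needed; the neuron only has to respond to a perturbation.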

★The coding scheme:

$\frac{dE}{dx}=Target-y$

$\Delta \mathrm{Neuron}(t)=y+t\,(Target-y) \quad \text{where } t=1,2,3,\ldots \text{ (time)}$

Hinton calls this derivative-free $\Delta$ update a "temporal derivative", meaning:

★Spike-time-dependent plasticity is just a derivative filter.

You need a computation theory (not a billion euros) to recognize what you discovered!

LeCun: fundamentally, a large part of the EU Human Brain Project is also based on exactly this kind of idea.

Explanation 3: A reconstruction mechanism over bidirectional connections

$While(\text{not converged}):$

$\quad Forward\ Pass$

$\quad Backward\ Pass$

★If the autoencoder is perfect, replacing the bottom-up input by the top-down input will have no effect on the input of unit $i$.

If we then start moving $y_{j}$ and $y_{k}$ towards their target values, we get:

$\tilde{x_{i}}=w_{ji} \cdot y_{j}+w_{ki} \cdot y_{k}$

$x=\tilde{x}$

$\Delta \mathrm{Neuron}_i = (w_{ji} \cdot \Delta y_{j}+w_{ki} \cdot \Delta y_{k})\,\frac{dy_{i}}{dx_{i}}=\frac{dE}{dx_{i}}$
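This identity can be checked numerically on a toy unit (my own sketch: one sigmoid unit $i$ feeding two outputs $j, k$, which I assume linear for simplicity; the weights are symmetric in both directions as the text requires, and all numbers are illustrative):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

w_ij, w_ik = 0.7, -0.3                  # used both bottom-up and top-down
x_i = 0.5
y_i = sigmoid(x_i)
y_j, y_k = w_ij * y_i, w_ik * y_i       # linear output units
t_j, t_k = 1.0, 0.0                     # targets

# Backprop derivative, in the document's (Target - y) sign convention
dE_dx_i = (w_ij * (t_j - y_j) + w_ik * (t_k - y_k)) * y_i * (1 - y_i)

# Temporal derivative: nudge the outputs toward their targets and watch
# how the top-down reconstruction of x_i changes
eps = 1e-6
before = w_ij * y_j + w_ik * y_k
after = w_ij * (y_j + eps * (t_j - y_j)) + w_ik * (y_k + eps * (t_k - y_k))
temporal = (after - before) / eps * y_i * (1 - y_i)

print(abs(temporal - dE_dx_i) < 1e-9)   # the two derivatives agree
```

The rate of change of the reconstructed input, scaled by $\frac{dy_i}{dx_i}$, is exactly the backprop error derivative; no backward signal path is needed.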

Explanation 4: The synaptic update rule

★First do an upward(forward) pass as usual.

★Then do top-down reconstruction at each level.

★Then perturb the top-level activities by blending them with the target values

so that the rate of change of activity of a top-level unit represents the derivative

of the error with respect to the total input to that unit.

-This will make the activity changes at every level represent error derivatives.

★Then update each synapse in proportion to:

$\text{pre-synaptic activity} \times \Delta\ \text{post-synaptic activity}$
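The four steps above can be sketched as a training loop (a loose NumPy sketch under my own assumptions, not Hinton's exact procedure: a tiny 4-3-2 net with a sigmoid hidden layer, linear outputs, one training case, and illustrative sizes and learning rate):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(0, 0.5, (4, 3))   # input -> hidden
W2 = rng.normal(0, 0.5, (3, 2))   # hidden -> output (linear)
x = rng.normal(0, 1.0, 4)
target = np.array([1.0, 0.0])
lr, eps = 0.1, 0.1

def error():
    return np.sum((target - sigmoid(x @ W1) @ W2) ** 2)

e0 = error()
for _ in range(200):
    h = sigmoid(x @ W1)                 # 1) upward pass
    y = h @ W2
    dy = eps * (target - y)             # 3) blend top activities with targets
    dh = (dy @ W2.T) * h * (1 - h)      # 2) top-down pass turns activity
                                        #    changes into error derivatives
    W2 += lr * np.outer(h, dy) / eps    # 4) each synapse moves by
    W1 += lr * np.outer(x, dh) / eps    #    pre activity x change of post activity

print(error() < e0)                     # the error shrinks
```

Every weight change uses only locally available quantities: the pre-synaptic activity and the temporal change of the post-synaptic activity.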

Explanation 5: Adaptive pre-training

★This way of performing backpropagation appears to require symmetric weights.

★What happens if the top-down weights are random and fixed?

★Lillicrap, Cownden, Tweed & Akerman (2014) showed that backprop still works almost as well.

-The bottom-up weights adapt so that the fixed top-down weights are approximately their pseudo-inverse near the data manifold.
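This "feedback alignment" effect can be sketched in a few lines (my own toy sketch after Lillicrap et al. 2014; network sizes, targets, and learning rate are illustrative): the backward pass uses a fixed random matrix `B` where exact backprop would use `W2.T`, yet the error still falls.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_in, n_hid, n_out = 8, 16, 4
W1 = rng.normal(0, 0.3, (n_in, n_hid))
W2 = rng.normal(0, 0.3, (n_hid, n_out))
B = rng.normal(0, 0.3, (n_out, n_hid))   # fixed random top-down weights

X = rng.normal(0, 1.0, (64, n_in))
T = sigmoid(X @ rng.normal(0, 1.0, (n_in, n_out)))  # synthetic targets

def mse():
    return np.mean((T - sigmoid(X @ W1) @ W2) ** 2)

e0 = mse()
lr = 0.1
for _ in range(500):
    H = sigmoid(X @ W1)
    err = T - H @ W2                    # output error
    dH = (err @ B) * H * (1 - H)        # random feedback instead of err @ W2.T
    W2 += lr / len(X) * H.T @ err
    W1 += lr / len(X) * X.T @ dH

print(mse() < e0)                       # learning still works
```

The forward weights drift into alignment with the fixed random feedback, which is the mechanism behind the pseudo-inverse claim above.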

———————————————————————————————————————————————————

★If it works for fixed top-down weights, it must work for slowly changing top-down weights.

-So adapt the top-down weights to be good at reconstructing the activity in the layer below.

-This is just the wake-phase of the wake-sleep algorithm.

★With slowly adapting top-down weights it works better.

-A 784-800-800-10 network with 50% dropout gets 153 errors.

-With fixed top-down pre-training, it gets 160 errors.

-With real adaptive backprop, it gets 150 errors and learns faster.

Summary: Backpropagation in the cortex

Spiking signals are not a big problem

★The fact that neurons send spikes rather than real numbers is not a problem

-Spikes are a great regularizer.

Temporal derivatives

★Error derivatives can be represented as temporal derivatives.

-This allows a neuron to represent both its activity and its error derivative in the same axon.

$(\text{temporal derivative} \propto \frac{dE}{dx}, \quad \text{activity} = y)$

The timing of spikes is the timing of backpropagation

★Spike-time dependent plasticity is the signature of backpropagation learning.

Bidirectional weights are not a big problem

★The problem that each bottom-up connection needs to have a corresponding

top-down connection is a non-problem.

-Random top-down weights work just fine.

posted @ 2015-08-24 23:25 Physcal