HBP has the following properties: (i) it identifies a neural network of an appropriate depth based on the performance of the classifier at that depth.
This is a form of Online Learning with Expert Advice [14], where the experts are DNNs of varying depth, making the DNN robust to the choice of depth.
(ii) it offers a good initialization for deeper networks, which are encouraged to match the performance of shallower networks (if unable to beat them).
This facilitates knowledge transfer from shallower to deeper networks [16, 119], thus simulating student-teacher learning;
(iii) it makes the learning robust to vanishing gradients and diminishing feature reuse by using a multi-depth architecture where gradients are backpropagated from the shallower classifiers, and the low-level features are used for the final prediction (by hedge-weighted prediction; a minimal sketch of this scheme follows this list of properties);
(iv) it can be viewed as an ensemble of multi-depth networks which are competing and collaborating to improve the final prediction performance.
The competition is induced by Hedging, whereas the collaboration is induced by sharing feature representations;
(v) this allows our algorithm to continuously learn and adapt as and when it sees more data, enabling a form of life-long learning [70];
(vi) in concept drifting scenarios [39], traditional online backpropagation would suffer from slow convergence for deep networks when the concepts change, whereas HBP is able to adapt quickly due to hedging;
(vii) our work is also similar to LSTMs [47] in the appearance of the learning architecture; however, LSTMs aim to backpropagate through time, while our method backpropagates through depth.
In this way, HBP learns the appropriate depth capacity.
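To make the hedge-weighted scheme above concrete, the following is a minimal sketch of how classifiers attached at different depths could be combined and re-weighted online. It is an illustration in the spirit of the Hedge algorithm rather than the exact procedure used here; the names (n_classifiers, beta, smoothing, per_depth_probs) and the weight floor are assumptions.

    # Minimal sketch of hedge-weighted prediction over classifiers of varying depth.
    # All names (n_classifiers, beta, smoothing, per_depth_probs) are illustrative
    # assumptions, not the notation of the thesis.
    import numpy as np

    n_classifiers = 8                    # one output classifier per hidden layer
    beta = 0.99                          # Hedge discount factor in (0, 1)
    smoothing = 0.01                     # weight floor so deep classifiers keep learning
    alpha = np.full(n_classifiers, 1.0 / n_classifiers)  # uniform initial weights

    def combined_prediction(per_depth_probs):
        """Final prediction: hedge-weighted sum of per-depth class probabilities.

        per_depth_probs has shape (n_classifiers, n_classes).
        """
        return alpha @ per_depth_probs

    def hedge_update(per_depth_losses):
        """Multiplicative (Hedge) update: classifiers with higher loss lose weight."""
        global alpha
        alpha = alpha * beta ** per_depth_losses              # penalise weak experts
        alpha = np.maximum(alpha, smoothing / n_classifiers)  # keep a minimum weight
        alpha = alpha / alpha.sum()                           # renormalise to a distribution

Early in the stream the shallow classifiers tend to hold most of the weight (they converge fastest), while the deeper classifiers gain weight as they converge, which is the competition-and-collaboration behaviour described in (iv).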
Performance Variation with Depth: Limitations of Traditional Online BP
First we demonstrate the difficulty of DNN model selection in the online setting.
We compare the error rates of DNNs of varying depth in different segments of the data.
All models were trained online, and we evaluate their performance in different windows (or stages) of the learning process.
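As a rough illustration of this windowed, test-then-train (prequential) protocol, the sketch below records the error rate of one online model in consecutive segments of the stream. The helper names are assumptions: the model is taken to expose predict(x) and partial_fit(x, y), and the window size is an illustrative parameter, not the exact segmentation used in Table 6.2.

    # Sketch of test-then-train (prequential) evaluation in windows of the stream.
    # The model is assumed to expose predict(x) and partial_fit(x, y); the window
    # size is illustrative, not the exact segmentation of Table 6.2.
    def windowed_error_rates(model, stream, window_size):
        """Return the error rate of the model in consecutive windows of the stream."""
        errors, seen, rates = 0, 0, []
        for x, y in stream:                        # stream yields (features, label) pairs
            errors += int(model.predict(x) != y)   # test on the instance first ...
            model.partial_fit(x, y)                # ... then train on the same instance
            seen += 1
            if seen % window_size == 0:            # a window is complete
                rates.append(errors / window_size)
                errors = 0
        return rates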
See Table 6.2.
In the first 0.5% of the data, the shallowest network obtains the best performance, indicating faster convergence, which suggests that we should use a shallow network for the task.
In the segment from [10-15]%, a 4-layer DNN seems to have the best performance in most cases.
In the segment from [60-80]% of the data, an 8-layer network gives better performance.
This indicates that deeper networks take longer to converge, but give better performance at a later stage.
The final error does not give us conclusive evidence of which network depth would be the most suitable.
Furthermore, if the datastream had more instances, a deeper network might have given a better overall performance.
Problem
Unfortunately, using such a model for an online learning task (i.e., Online Backpropagation) faces several issues with convergence.
Most notably: (i) For such models, a fixed depth of the neural network has to be decided a priori, and this cannot be changed during the training process.
This is problematic, as determining the depth is a difficult task.
Moreover, in an online setting, different depths may be suitable depending on the number of instances to be processed; e.g., for convergence reasons, shallow networks may be preferred for a small number of instances, and deeper networks for a large number of instances.
Our aim is to exploit the fast convergence of shallow networks at the initial stages, and the power of deep representation at a later stage;
(ii) vanishing gradient is a well-noted problem that slows down learning.
This is even more critical in an online setting, where the model needs to make predictions and learn simultaneously;
(iii) diminishing feature reuse, according to which many useful features are lost in the feedforward stage of the prediction.
This again is very critical for online learning, where it is imperative to quickly find the important features, so as not to suffer from poor performance on the initial training instances.
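To tie these problems back to the multi-depth architecture described earlier, the sketch below shows a forward pass in which a classifier is attached after every hidden layer: low-level features reach the final hedge-weighted prediction directly (countering diminishing feature reuse), and every classifier offers a short gradient path to the early layers (countering vanishing gradients). The fully connected layers, the softmax helper, and all parameter names are illustrative assumptions, not the exact network used here.

    # Sketch of a multi-depth forward pass with a classifier at every depth.
    # Layer shapes, the softmax helper, and all names are illustrative assumptions.
    import numpy as np

    def softmax(z):
        z = z - z.max()                     # numerical stability
        e = np.exp(z)
        return e / e.sum()

    def multi_depth_forward(x, hidden_weights, classifier_weights, alpha):
        """Return per-depth predictions and their hedge-weighted combination.

        hidden_weights[l]     : weight matrix of hidden layer l
        classifier_weights[l] : weight matrix of the classifier attached at depth l
        alpha                 : Hedge weights over the depth-wise classifiers
        """
        h = x
        per_depth_probs = []
        for W_hidden, W_clf in zip(hidden_weights, classifier_weights):
            h = np.maximum(0.0, W_hidden @ h)           # ReLU hidden layer (shared features)
            per_depth_probs.append(softmax(W_clf @ h))  # classifier at this depth
        per_depth_probs = np.stack(per_depth_probs)     # shape: (n_depths, n_classes)
        return per_depth_probs, alpha @ per_depth_probs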