NIPS 2015 Deep Learning Tutorial Notes
Notes on today's opening tutorial on deep learning with Geoff Hinton, Yoshua Bengio, and Yann LeCun.
- Geoff was unable to join today.
- Breakthrough - deep learning is about learning different layers of representation / abstraction.
- Four key ingredients for ML towards AI
- lots of data
- flexible models
- enough computing power
- powerful priors
- Connectionism: concepts are represented by patterns of activation rather than symbols in the brain.
- (Stacked?) neural nets are exponentially more statistically powerful than clustering / nearest-neighbor models.
- With neural nets, features can be discovered from a linear number of examples (rather than an exponential number).
- Difference between deep learning and prior work is the idea of composing features upon features.
- 'Deep learning is not a completely general solution to everything.'
- 'There is no magic.'
- Backprop: training method for practical deep learning.
- ReLUs 'not exactly differentiable but close enough to work.'
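A minimal numpy sketch (my own illustration, not from the tutorial) of the ReLU and the subgradient backprop uses for it:

```python
import numpy as np

def relu(x):
    # Forward pass: max(0, x), elementwise.
    return np.maximum(0.0, x)

def relu_grad(x):
    # "Derivative" used in backprop: 1 where x > 0, 0 elsewhere.
    # Undefined at exactly x == 0, but picking 0 there is close enough to work.
    return (x > 0).astype(x.dtype)
```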
- Convolutional feature map: scan an array of coefficients (the kernel) over the image; each output is the dot product of the coefficients with the image region they cover.
- Pooling: aggregates the outputs of neighboring units in the previous layer (e.g. by max or average).
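A rough sketch of both ideas in numpy (illustration only; real implementations use optimized kernels, strides, and padding):

```python
import numpy as np

def conv_feature_map(image, kernel):
    # Slide the kernel over the image; each output is the dot product of the
    # kernel with the image patch it covers (stride 1, no padding).
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    # Aggregate each non-overlapping size x size neighborhood of the previous
    # layer's outputs by taking the max.
    H, W = feature_map.shape
    out = np.zeros((H // size, W // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = feature_map[i * size:(i + 1) * size,
                                    j * size:(j + 1) * size].max()
    return out
```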
- Recurrent neural nets: selectively summarize an input sequence in a fixed-size state vector via a recursive update.
- RNNs are rolled out / unfolded so that backprop can be applied through time.
- An RNN can represent a fully-connected directed generative model: every variable predicted from all previous ones.
- RNNs can be thought of as directed probabilistic graphical models.
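The recursive update in a vanilla RNN, sketched in numpy (my notation, not the presenters'):

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    # Summarize a variable-length sequence in a fixed-size state vector h via
    # the recursive update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b).
    # Unrolling this loop over time is what lets backprop-through-time work.
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in inputs:                 # inputs: sequence of input vectors
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return h, states                   # final state + per-step states for BPTT
```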
- RNNs struggle with long-term dependencies. Tricks to help:
- Gradient Clipping (see the sketch after this list)
- Leaky Integration
- Momentum
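Gradient clipping is the simplest of these; a minimal sketch (norm-based rescaling, one common variant):

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    # Rescale the gradient if its norm exceeds a threshold, limiting the
    # exploding-gradient blow-ups that show up when training RNNs.
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad
```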
- LSTMs popular architecture for dealing with long-term dependencies.
- Create paths in the forward / backward pass along which information can be copied.
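A compact LSTM step for reference (my own sketch of the standard formulation, with all gate weights stacked in one matrix W):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    # One LSTM step. The cell state c is the "path" along which information can
    # be copied nearly unchanged; gates decide what to forget, write, and expose.
    # W: (4 * hidden, input + hidden), b: (4 * hidden,).
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])   # forget gate
    i = sigmoid(z[1 * hidden:2 * hidden])   # input gate
    o = sigmoid(z[2 * hidden:3 * hidden])   # output gate
    g = np.tanh(z[3 * hidden:4 * hidden])   # candidate update
    c = f * c_prev + i * g                  # mostly-additive copy path
    h = o * np.tanh(c)
    return h, c
```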
- Normalize inputs 'to avoid ill conditioning' when using backprop.
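For example, standardizing each input feature over the training set (a minimal sketch):

```python
import numpy as np

def normalize_inputs(X, eps=1e-8):
    # Shift each feature to zero mean and scale to unit variance, computed on
    # the training set; reuse the same mean/std at test time.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / (std + eps), mean, std
```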
- Multilayer neural net objective is nonconvex.
- Local minima aren't a problem, though(?)
- 'Almost all of the local minima are equivalent.'
- Most local minima are close to global minimum error.
- Error of trained nets tends to be sharply concentrated.
- Based on results from Yoshua's and LeCun's groups over the last few years.
- ReLU was one of the key tricks in the ImageNet result.
- Dropout can be thought of as regularization.
- 'Brutal, murderous, genocidal regularization.'
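A sketch of (inverted) dropout on a layer's activations, one common way to implement it:

```python
import numpy as np

def dropout(activations, p_drop=0.5, train=True, rng=np.random):
    # Randomly zero a fraction p_drop of activations during training; scaling
    # the survivors by 1 / (1 - p_drop) means nothing changes at test time.
    if not train:
        return activations
    mask = rng.random_sample(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)
```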
- Early stopping is a 'beautiful free lunch.'
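Roughly, the loop looks like this (a sketch; model, train_step, valid_error, get_params, and set_params are hypothetical placeholders):

```python
def train_with_early_stopping(model, train_step, valid_error,
                              max_epochs=200, patience=10):
    # Stop once validation error hasn't improved for `patience` epochs and keep
    # the best parameters seen so far: regularization at almost no extra cost.
    best_err, best_params, bad_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_step(model)              # one pass over the training data (hypothetical API)
        err = valid_error(model)       # error on a held-out validation set (hypothetical API)
        if err < best_err:
            best_err, best_params, bad_epochs = err, model.get_params(), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    model.set_params(best_params)
    return model
```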
- Use random search (rather than grid search) when searching over hyperparameters.
- There is some neat work using Gaussian processes (GPs) to approximate the performance surface.
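A minimal sketch of random search (the ranges and the evaluate() call are made-up placeholders):

```python
import numpy as np

def sample_hyperparameters(rng=np.random):
    # Draw one random configuration; use log-uniform sampling for scale-type
    # hyperparameters such as the learning rate.
    return {
        "learning_rate": 10 ** rng.uniform(-5, -1),
        "hidden_units": int(rng.choice([128, 256, 512, 1024])),
        "dropout": rng.uniform(0.0, 0.5),
    }

# e.g. evaluate ~50 random configurations and keep the one with the lowest
# validation error (evaluate() is a hypothetical scoring function):
# best_cfg = min((sample_hyperparameters() for _ in range(50)), key=evaluate)
```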
- Training can be distributed using asynchronous SGD, but bottlenecks on sharing weights / updates.
- New parallel method: EASGD.
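My rough reading of the elastic-averaging update, sketched synchronously for clarity (EASGD itself runs the workers asynchronously):

```python
import numpy as np

def easgd_step(worker_params, center_params, grads, lr=0.01, rho=0.1):
    # Each worker takes an SGD step plus a pull toward a shared center variable
    # (the "elastic force"); the center in turn moves toward the workers.
    alpha = lr * rho
    new_workers = []
    center_update = np.zeros_like(center_params)
    for x, g in zip(worker_params, grads):
        diff = x - center_params
        new_workers.append(x - lr * g - alpha * diff)   # worker update
        center_update += alpha * diff                   # accumulated pull on the center
    return new_workers, center_params + center_update
```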
- 700 million photos are uploaded to Facebook every day. Each goes through two conv nets: one for object recognition and one for face recognition.
- Tesla's autopilot uses a convolutional net.
- Graphical models and conv nets can be jointly trained.
- Attention mechanism - add a layer that can 'learn where to look' in a sequence.
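A bare-bones soft attention read in numpy (illustration only; scoring functions vary by model):

```python
import numpy as np

def soft_attention(query, keys, values):
    # Score each position in the sequence against the query, turn the scores
    # into weights with a softmax, and return the weighted sum of the values.
    scores = keys @ query                      # (seq_len,)
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    return weights @ values, weights           # context vector + where we "looked"
```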
- Recent models add 'memory' to recurrent neural nets:
- LSTM
- Memory Networks
- Neural Turing Machine
- Relational learning was plugged!