

Link of the Paper: https://arxiv.org/abs/1411.4389

Main Points:

  1. A novel recurrent convolutional architecture (CNN + LSTM) that is both spatially and temporally deep.
  2. The recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations.
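The combined model above can be sketched in a few lines. The following is a minimal NumPy illustration (not the authors' implementation): a stand-in linear "visual feature" map replaces the convnet, and a single hand-rolled LSTM cell integrates the per-frame features over time. All names and dimensions here are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM step: gates computed from [x, h], then cell/hidden update."""
    z = W @ np.concatenate([x, h]) + b           # all four gates at once, (4H,)
    H = h.size
    i = sigmoid(z[:H])                           # input gate
    f = sigmoid(z[H:2 * H])                      # forget gate
    o = sigmoid(z[2 * H:3 * H])                  # output gate
    g = np.tanh(z[3 * H:])                       # candidate cell update
    c = f * c + i * g                            # new cell state
    h = o * np.tanh(c)                           # new hidden state
    return h, c

rng = np.random.default_rng(0)
T, D, F, H = 8, 32, 16, 10        # frames, raw-frame dim, feature dim, hidden dim
W_feat = rng.normal(scale=0.1, size=(F, D))      # stand-in for the convnet
W = rng.normal(scale=0.1, size=(4 * H, F + H))   # LSTM weights
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for t in range(T):
    frame = rng.normal(size=D)    # one (flattened) video frame, hypothetical
    feat = W_feat @ frame         # "CNN" feature for this frame
    h, c = lstm_step(feat, h, c, W, b)

print(h.shape)                    # final temporal representation of the clip
```

Because both the feature map and the LSTM are differentiable, gradients flow from the sequence-level loss back through the recurrence into the visual front end, which is what makes the joint training described above possible.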

Other Key Points:

  1. A significant limitation of simple RNN models, which strictly integrate state information over time, is the "vanishing gradient" effect: backpropagating an error signal through a long-range temporal interval becomes increasingly difficult in practice.
  2. The authors show LSTM-type models provide for improved recognition on conventional video activity challenges and enable a novel end-to-end optimizable mapping from image pixels to sentence-level natural language descriptions.
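The vanishing-gradient point above can be made concrete with a toy example (not from the paper): in a linear simple RNN, the gradient reaching an early time step is a product of per-step recurrent Jacobians, so if each step shrinks the signal (norm below 1), the product decays exponentially with distance. The weight scale here is chosen deliberately small for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
H, T = 20, 50
W = rng.normal(scale=0.05, size=(H, H))   # recurrent weights with small norm

grad = np.ones(H)                         # error signal at the final time step
norms = []
for _ in range(T):
    grad = W.T @ grad                     # backprop one step through the recurrence
    norms.append(np.linalg.norm(grad))

print(norms[0], norms[-1])                # the signal decays exponentially
```

Gating in an LSTM counters exactly this: the additive cell-state update lets the error signal pass through many steps without being repeatedly squashed by the recurrent weights.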
posted on 2018-08-13 11:31 by LZ_Jaja