Classic Introductory Papers for Video Research

Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images

intro: CVPR 2016
intro: Lead–Exceed Neural Network (LENN), LSTM
paper: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/CVPR16_webly_final.pdf

Video Fill in the Blank with Merging LSTMs

intro: for Large Scale Movie Description and Understanding Challenge (LSMDC) 2016, "Movie fill-in-the-blank" Challenge, UCF_CRCV
intro: Video-Fill-in-the-Blank (ViFitB)
arxiv: https://arxiv.org/abs/1610.04062

Video Pixel Networks

intro: Google DeepMind
arxiv: https://arxiv.org/abs/1610.00527

Robust Video Synchronization using Unsupervised Deep Learning

arxiv: https://arxiv.org/abs/1610.05985

Video Propagation Networks

intro: CVPR 2017. Max Planck Institute for Intelligent Systems & Bernstein Center for Computational Neuroscience
project page: https://varunjampani.github.io/vpn/
arxiv: https://arxiv.org/abs/1612.05478
github(Caffe): https://github.com/varunjampani/video_prop_networks

Video Frame Synthesis using Deep Voxel Flow

project page: https://liuziwei7.github.io/projects/VoxelFlow.html
arxiv: https://arxiv.org/abs/1702.02463

Optimizing Deep CNN-Based Queries over Video Streams at Scale

intro: Stanford InfoLab
keywords: NoScope. difference detectors, specialized models
arxiv: https://arxiv.org/abs/1703.02529
github: https://github.com/stanford-futuredata/noscope
github: https://github.com/stanford-futuredata/tensorflow-noscope

NoScope: 1000x Faster Deep Learning Queries over Video

http://dawn.cs.stanford.edu/2017/06/22/noscope/

Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos

intro: CVPR 2017. Stanford University & University of Southern California
arxiv: https://arxiv.org/abs/1703.02521

ProcNets: Learning to Segment Procedures in Untrimmed and Unconstrained Videos

https://arxiv.org/abs/1703.09788

Unsupervised Learning Layers for Video Analysis

intro: Baidu Research
intro: "The experiments demonstrated the potential applications of UL layers and online learning algorithm to head orientation estimation and moving object localization"
arxiv: https://arxiv.org/abs/1705.08918

Look, Listen and Learn

intro: DeepMind
intro: "Audio-Visual Correspondence" learning
arxiv: https://arxiv.org/abs/1705.08168

Video Imagination from a Single Image with Transformation Generation

intro: Peking University
arxiv: https://arxiv.org/abs/1706.04124
github: https://github.com/gitpub327/VideoImagination

Learning to Learn from Noisy Web Videos

intro: CVPR 2017. Stanford University & CMU & Simon Fraser University
arxiv: https://arxiv.org/abs/1706.02884

Convolutional Long Short-Term Memory Networks for Recognizing First Person Interactions

intro: Accepted at the Second International Workshop on Egocentric Perception, Interaction and Computing (EPIC) at the International Conference on Computer Vision (ICCV 2017)
arxiv: https://arxiv.org/abs/1709.06495

Learning Binary Residual Representations for Domain-specific Video Streaming

intro: AAAI 2018
project page: http://research.nvidia.com/publication/2018-02_Learning-Binary-Residual
arxiv: https://arxiv.org/abs/1712.05087

Video Representation Learning Using Discriminative Pooling

intro: CVPR 2018
arxiv: https://arxiv.org/abs/1803.10628

Rethinking the Faster R-CNN Architecture for Temporal Action Localization

intro: CVPR 2018
arxiv: https://arxiv.org/abs/1804.07667

Deep Keyframe Detection in Human Action Videos

intro: two-stream ConvNet
arxiv: https://arxiv.org/abs/1804.10021

FFNet: Video Fast-Forwarding via Reinforcement Learning

intro: CVPR 2018
arxiv: https://arxiv.org/abs/1805.02792

Fast forwarding Egocentric Videos by Listening and Watching

https://arxiv.org/abs/1806.04620

Scanner: Efficient Video Analysis at Scale

intro: CMU
arxiv: https://arxiv.org/abs/1805.07339

Massively Parallel Video Networks

intro: DeepMind & University of Oxford
arxiv: https://arxiv.org/abs/1806.03863

Object Level Visual Reasoning in Videos

intro: LIRIS & Facebook AI Research
arxiv: https://arxiv.org/abs/1806.06157

Video Time: Properties, Encoders and Evaluation

intro: BMVC 2018
arxiv: https://arxiv.org/abs/1807.06980

Video Classification

Large-scale Video Classification with Convolutional Neural Networks

intro: CVPR 2014
project page: http://cs.stanford.edu/people/karpathy/deepvideo/
paper: http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Karpathy_Large-scale_Video_Classification_2014_CVPR_paper.pdf

Exploiting Image-trained CNN Architectures for Unconstrained Video Classification

intro: Video-level event detection: extract deep features for each frame, then average the frame-level features
arxiv: http://arxiv.org/abs/1503.04144
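
The frame-averaging idea mentioned in the intro line for this entry can be sketched roughly as follows. This is only a minimal illustration, not the paper's pipeline: the function name `average_frame_features` and the toy 4-dimensional vectors are assumptions made here, standing in for real per-frame CNN activations.

```python
from typing import List, Sequence


def average_frame_features(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Mean-pool per-frame feature vectors into one video-level vector."""
    if not frame_features:
        raise ValueError("need at least one frame")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all frame features must have the same dimension")
    n = len(frame_features)
    # Average each feature dimension independently across all frames.
    return [sum(f[i] for f in frame_features) / n for i in range(dim)]


# Hypothetical stand-in for CNN activations: 3 frames, 4-dimensional features.
toy_features = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 0.0],
    [2.0, 3.0, 2.0, 2.0],
]
video_descriptor = average_frame_features(toy_features)
print(video_descriptor)
```

The resulting video-level vector would then typically be fed to a classifier; which classifier is used is outside the scope of this sketch.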

Beyond Short Snippets: Deep Networks for Video Classification

intro: CNN + LSTM
arxiv: http://arxiv.org/abs/1503.08909
demo: http://pan.baidu.com/s/1eQ9zLZk

Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification

intro: ACM Multimedia, 2015
arxiv: http://arxiv.org/abs/1504.01561

Video Content Recognition with Deep Learning

author: Zuxuan Wu, Fudan University
slides: http://vision.ouc.edu.cn/valse/slides/20160420/Zuxuan Wu - Video Content Recognition with Deep Learning-Zuxuan Wu.pdf

Video Content Recognition with Deep Learning

author: Yu-Gang Jiang, Lab for Big Video Data Analytics (BigVid), Fudan University
slides: http://www.yugangjiang.info/slides/DeepVideoTalk-2015.pdf

Efficient Large Scale Video Classification

intro: Google
arxiv: http://arxiv.org/abs/1505.06250

Fusing Multi-Stream Deep Networks for Video Classification

arxiv: http://arxiv.org/abs/1509.06086

Learning End-to-end Video Classification with Rank-Pooling

paper: http://jmlr.org/proceedings/papers/v48/fernando16.html
paper: http://jmlr.csail.mit.edu/proceedings/papers/v48/fernando16.pdf
summary(by Hugo Larochelle): http://www.shortscience.org/paper?bibtexKey=conf/icml/FernandoG16#hlarochelle

Deep Learning for Video Classification and Captioning

arxiv: http://arxiv.org/abs/1609.06782

Fast Video Classification via Adaptive Cascading of Deep Models

arxiv: https://arxiv.org/abs/1611.06453

Deep Feature Flow for Video Recognition

intro: CVPR 2017
intro: It provides a simple, fast, accurate, and end-to-end framework for video recognition (e.g., object detection and semantic segmentation in videos)
arxiv: https://arxiv.org/abs/1611.07715
github(official, MXNet): https://github.com/msracver/Deep-Feature-Flow
youtube: https://www.youtube.com/watch?v=J0rMHE6ehGw

Large-Scale YouTube-8M Video Understanding with Deep Neural Networks

https://arxiv.org/abs/1706.04488

Deep Learning Methods for Efficient Large Scale Video Labeling

intro: Solution to the Kaggle competition "Google Cloud & YouTube-8M Video Understanding Challenge"
arxiv: https://arxiv.org/abs/1706.04572
github: https://github.com/mpekalski/Y8M

Learnable pooling with Context Gating for video classification

intro: CVPR 2017 YouTube-8M workshop. Kaggle 1st place
arxiv: https://arxiv.org/abs/1706.06905
github: https://github.com/antoine77340/LOUPE

Aggregating Frame-level Features for Large-Scale Video Classification

intro: YouTube-8M Challenge, 4th place
arxiv: https://arxiv.org/abs/1707.00803

Tensor-Train Recurrent Neural Networks for Video Classification

https://arxiv.org/abs/1707.01786

Hierarchical Deep Recurrent Architecture for Video Understanding

intro: Classification Challenge Track paper in CVPR 2017 Workshop on YouTube-8M Large-Scale Video Understanding
arxiv: https://arxiv.org/abs/1707.03296

Large-scale Video Classification guided by Batch Normalized LSTM Translator

intro: CVPR 2017 Workshop on YouTube-8M Large-Scale Video Understanding
arxiv: https://arxiv.org/abs/1707.04045

UTS submission to Google YouTube-8M Challenge 2017

intro: CVPR'17 Workshop on YouTube-8M
arxiv: https://arxiv.org/abs/1707.04143
github: https://github.com/ffmpbgrnn/yt8m

A spatiotemporal model with visual attention for video classification

https://arxiv.org/abs/1707.02069

Cultivating DNN Diversity for Large Scale Video Labelling

intro: CVPR 2017 YouTube-8M Workshop
arxiv: https://arxiv.org/abs/1707.04272

Attention Transfer from Web Images for Video Recognition

intro: ACM Multimedia, 2017
arxiv: https://arxiv.org/abs/1708.00973

Non-local Neural Networks

intro: CVPR 2018. CMU & Facebook AI Research
arxiv: https://arxiv.org/abs/1711.07971
github(Caffe2): https://github.com/facebookresearch/video-nonlocal-net

Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification

https://arxiv.org/abs/1711.08200

Appearance-and-Relation Networks for Video Classification

arxiv: https://arxiv.org/abs/1711.09125
github: https://github.com/wanglimin/ARTNet

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

intro: ECCV 2018. Google Research & University of California San Diego
arxiv: https://arxiv.org/abs/1712.04851

Long Activity Video Understanding using Functional Object-Oriented Network

https://arxiv.org/abs/1807.00983

Deep Architectures and Ensembles for Semantic Video Classification

https://arxiv.org/abs/1807.01026

Deep Discriminative Model for Video Classification

intro: ECCV 2018
arxiv: https://arxiv.org/abs/1807.08259

Deep Video Color Propagation

intro: BMVC 2018
arxiv: https://arxiv.org/abs/1808.03232

Non-local NetVLAD Encoding for Video Classification

intro: ECCV 2018 workshop on YouTube-8M Large-Scale Video Understanding
intro: Tencent AI Lab & Fudan University
arxiv: https://arxiv.org/abs/1810.00207

Learnable Pooling Methods for Video Classification

intro: YouTube-8M ECCV 2018 Workshop
arxiv: https://arxiv.org/abs/1810.00530
github: https://github.com/pomonam/LearnablePoolingMethods

NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification

intro: ECCV 2018 workshop
arxiv: https://arxiv.org/abs/1811.05014
github: https://github.com/linrongc/youtube-8m

Video Action Recognition / Action Detection

3D Convolutional Neural Networks for Human Action Recognition

paper: http://www.cs.odu.edu/~sji/papers/pdf/Ji_ICML10.pdf

Sequential Deep Learning for Human Action Recognition

paper: http://liris.cnrs.fr/Documents/Liris-5228.pdf

Two-stream convolutional networks for action recognition in videos

arxiv: http://arxiv.org/abs/1406.2199

Finding action tubes

intro: "built action models from shape and motion cues. They start from the image proposals and select the motion salient subset of them and extract spatio-temporal features to represent the video using the CNNs."
arxiv: http://arxiv.org/abs/1411.6031

Hierarchical Recurrent Neural Network for Skeleton Based Action Recognition

paper: http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Du_Hierarchical_Recurrent_Neural_2015_CVPR_paper.pdf

Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors

intro: CVPR 2015. TDD
paper: http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Wang_Action_Recognition_With_2015_CVPR_paper.pdf
ext: http://www.cv-foundation.org/openaccess/content_cvpr_2015/app/2B_105_ext.pdf
poster: https://wanglimin.github.io/papers/WangQT_CVPR15_Poster.pdf
github: https://github.com/wanglimin/TDD

Action Recognition by Hierarchical Mid-level Action Elements

paper: http://cvgl.stanford.edu/papers/tian2015.pdf

Contextual Action Recognition with R*CNN

arxiv: http://arxiv.org/abs/1505.01197
github: https://github.com/gkioxari/RstarCNN

Towards Good Practices for Very Deep Two-Stream ConvNets

arxiv: http://arxiv.org/abs/1507.02159
github: https://github.com/yjxiong/caffe

Action Recognition using Visual Attention

intro: LSTM / RNN
arxiv: http://arxiv.org/abs/1511.04119
project page: http://shikharsharma.com/projects/action-recognition-attention/
github(Python/Theano): https://github.com/kracwarlock/action-recognition-visual-attention

End-to-end Learning of Action Detection from Frame Glimpses in Videos

intro: CVPR 2016
project page: http://ai.stanford.edu/~syyeung/frameglimpses.html
arxiv: http://arxiv.org/abs/1511.06984
paper: http://vision.stanford.edu/pdf/yeung2016cvpr.pdf

Multi-velocity neural networks for gesture recognition in videos

arxiv: http://arxiv.org/abs/1603.06829

Active Learning for Online Recognition of Human Activities from Streaming Videos

arxiv: http://arxiv.org/abs/1604.02855

Convolutional Two-Stream Network Fusion for Video Action Recognition

arxiv: http://arxiv.org/abs/1604.06573
github: https://github.com/feichtenhofer/twostreamfusion

Deep, Convolutional, and Recurrent Models for Human Activity Recognition using Wearables

arxiv: http://arxiv.org/abs/1604.08880

Unsupervised Semantic Action Discovery from Video Collections

arxiv: http://arxiv.org/abs/1605.03324

Anticipating Visual Representations from Unlabeled Video

paper: http://web.mit.edu/vondrick/prediction.pdf

VideoLSTM Convolves, Attends and Flows for Action Recognition

arxiv: http://arxiv.org/abs/1607.01794

Hierarchical Attention Network for Action Recognition in Videos (HAN)

arxiv: http://arxiv.org/abs/1607.06416

Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition

arxiv: http://arxiv.org/abs/1607.07043

Connectionist Temporal Modeling for Weakly Supervised Action Labeling

arxiv: http://arxiv.org/abs/1607.08584

CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016

intro: won the 1st place in the untrimmed video classification task of ActivityNet Challenge 2016. TSN
arxiv: http://arxiv.org/abs/1608.00797
github: https://github.com/yjxiong/anet2016-cuhk

Actionness Estimation Using Hybrid FCNs

intro: CVPR 2016. H-FCN
project page: http://wanglimin.github.io/actionness_hfcn/index.html
paper: http://wanglimin.github.io/papers/WangQTV_CVPR16.pdf
github: https://github.com/wanglimin/actionness-estimation/

Real-time Action Recognition with Enhanced Motion Vector CNNs

intro: CVPR 2016
project page: http://zbwglory.github.io/MV-CNN/index.html
paper: http://wanglimin.github.io/papers/ZhangWWQW_CVPR16.pdf
github: https://github.com/zbwglory/MV-release

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

intro: ECCV 2016. HMDB51: 69.4%, UCF101: 94.2%
arxiv: http://arxiv.org/abs/1608.00859
paper: http://wanglimin.github.io/papers/WangXWQLTV_ECCV16.pdf
github: https://github.com/yjxiong/temporal-segment-networks

Temporal Segment Networks for Action Recognition in Videos

intro: An extension of submission http://arxiv.org/abs/1608.00859
arxiv: https://arxiv.org/abs/1705.02953

DeepCAMP: Deep Convolutional Action & Attribute Mid-Level Patterns

intro: CVPR 2016
arxiv: http://arxiv.org/abs/1608.03217

Depth2Action: Exploring Embedded Depth for Large-Scale Action Recognition

arxiv: http://arxiv.org/abs/1608.04339

Dynamic Image Networks for Action Recognition

intro: CVPR 2016
paper: http://users.cecs.anu.edu.au/~sgould/papers/cvpr16-dynamic_images.pdf
github: https://github.com/hbilen/dynamic-image-nets

Human Action Recognition without Human

arxiv: http://arxiv.org/abs/1608.07876

Temporal Convolutional Networks: A Unified Approach to Action Segmentation

arxiv: http://arxiv.org/abs/1608.08242
ECCV 2016 workshop: http://bravenewmotion.github.io/

Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

intro: Bachelor Thesis Report at ETSETB TelecomBCN
project page: https://imatge-upc.github.io/activitynet-2016-cvprw/
arxiv: http://arxiv.org/abs/1608.08128
github: https://github.com/imatge-upc/activitynet-2016-cvprw

Sequential Deep Trajectory Descriptor for Action Recognition with Three-stream CNN

arxiv: http://arxiv.org/abs/1609.03056

Semi-Coupled Two-Stream Fusion ConvNets for Action Recognition at Extremely Low Resolutions

arxiv: https://arxiv.org/abs/1610.03898

Spatiotemporal Residual Networks for Video Action Recognition

intro: NIPS 2016
arxiv: https://arxiv.org/abs/1611.02155

Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks

arxiv: https://arxiv.org/abs/1611.02447

Deep Recurrent Neural Network for Mobile Human Activity Recognition with High Throughput

arxiv: https://arxiv.org/abs/1611.03607

Joint Network based Attention for Action Recognition

arxiv: https://arxiv.org/abs/1611.05215

Temporal Convolutional Networks for Action Segmentation and Detection

arxiv: https://arxiv.org/abs/1611.05267

AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos

arxiv: https://arxiv.org/abs/1611.08240

ActionFlowNet: Learning Motion Representation for Action Recognition

arxiv: https://arxiv.org/abs/1612.03052

Higher-order Pooling of CNN Features via Kernel Linearization for Action Recognition

intro: Australian Center for Robotic Vision & Data61/CSIRO
arxiv: https://arxiv.org/abs/1701.05432

Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos

https://arxiv.org/abs/1703.10664

Temporal Action Detection with Structured Segment Networks

project page: http://yjxiong.me/others/ssn/
arxiv: https://arxiv.org/abs/1704.06228
github: https://github.com/yjxiong/action-detection

Recurrent Residual Learning for Action Recognition

https://arxiv.org/abs/1706.08807

Hierarchical Multi-scale Attention Networks for Action Recognition

https://arxiv.org/abs/1708.07590

Two-stream Flow-guided Convolutional Attention Networks for Action Recognition

intro: International Conference on Computer Vision Workshops (ICCVW), 2017
arxiv: https://arxiv.org/abs/1708.09268

Action Classification and Highlighting in Videos

https://arxiv.org/abs/1708.09522

Real-Time Action Detection in Video Surveillance using Sub-Action Descriptor with Multi-CNN

https://arxiv.org/abs/1710.03383

End-to-end Video-level Representation Learning for Action Recognition

keywords: Deep networks with Temporal Pyramid Pooling (DTPP)
arxiv: https://arxiv.org/abs/1711.04161

Fully-Coupled Two-Stream Spatiotemporal Networks for Extremely Low Resolution Action Recognition

intro: WACV 2018
arxiv: https://arxiv.org/abs/1801.03983

DiscrimNet: Semi-Supervised Action Recognition from Videos using Generative Adversarial Networks

https://arxiv.org/abs/1801.07230

A Fusion of Appearance based CNNs and Temporal evolution of Skeleton with LSTM for Daily Living Action Recognition

https://arxiv.org/abs/1802.00421

Real-Time End-to-End Action Detection with Two-Stream Networks

https://arxiv.org/abs/1802.08362

A Closer Look at Spatiotemporal Convolutions for Action Recognition

intro: CVPR 2018. Facebook Research
intro: R(2+1)D and Mixed-Convolutions for Action Recognition.
project page: https://dutran.github.io/R2Plus1D/
arxiv: https://arxiv.org/abs/1711.11248
github: https://github.com/facebookresearch/R2Plus1D

VideoCapsuleNet: A Simplified Network for Action Detection

https://arxiv.org/abs/1805.08162

Where and When to Look? Spatio-temporal Attention for Action Recognition in Videos

https://arxiv.org/abs/1810.04511

Projects

A Torch Library for Action Recognition and Detection Using CNNs and LSTMs

intro: CS231n student project report
paper: http://cs231n.stanford.edu/reports2016/221_Report.pdf
github: https://github.com/garythung/torch-lrcn

2016 ActivityNet action recognition challenge. CNN + LSTM approach. Multi-threaded loading.

github: https://github.com/jrbtaylor/ActivityNet

LSTM for Human Activity Recognition

github: https://github.com/guillaume-chevalier/LSTM-Human-Activity-Recognition/
github(MXNet): https://github.com/Ldpe2G/DeepLearningForFun/tree/master/Mxnet-Scala/HumanActivityRecognition

Scanner: Efficient Video Analysis at Scale

intro: Locate and recognize faces in a video, Detect shots in a film, Search videos by image
github: https://github.com/scanner-research/scanner

Charades Starter Code for Activity Classification and Localization

intro: Activity Recognition Algorithms for the Charades Dataset
github: https://github.com/gsig/charades-algorithms

Non-Local Network and Squeeze-Excitation Network

intro: MXNet implementation of Non-Local and Squeeze-Excitation network
github: https://github.com/WillSuen/NonLocalandSEnet

Event Recognition

TagBook: A Semantic Video Representation without Supervision for Event Detection

arxiv: http://arxiv.org/abs/1510.02899

AENet: Learning Deep Audio Features for Video Analysis

arxiv: https://arxiv.org/abs/1701.00599
github: https://github.com/znaoya/aenet

Event Detection

DevNet: A Deep Event Network for Multimedia Event Detection and Evidence Recounting

paper: http://120.52.72.47/winsty.net/c3pr90ntcsf0/papers/devnet.pdf
paper: http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Gan_DevNet_A_Deep_2015_CVPR_paper.pdf

Detecting events and key actors in multi-person videos

intro: CVPR 2016
arxiv: http://arxiv.org/abs/1511.02917
paper: http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Ramanathan_Detecting_Events_and_CVPR_2016_paper.pdf
paper: http://vision.stanford.edu/pdf/johnson2016cvpr.pdf
blog: http://www.leiphone.com/news/201606/l1TKIRFLO3DUFNNu.html

Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Detection

intro: INTERSPEECH 2016
arxiv: https://arxiv.org/abs/1604.07160

Efficient Action Detection in Untrimmed Videos via Multi-Task Learning

arxiv: https://arxiv.org/abs/1612.07403

Joint Event Detection and Description in Continuous Video Streams

intro: Joint Event Detection and Description Network (JEDDi-Net)
arxiv: https://arxiv.org/abs/1802.10250

Reposted from: https://blog.csdn.net/WJ_MeiMei/article/details/84344836

Posted 2019-11-25 17:03 by Geoffreygau
