论文学习3——Non-local Neural Networks

Introduction

To get long-range dependencies in neural network, the researchers always use the convolutional and recurrent operations repeatedly. However, this way cause the limitations as follow.

  1. Inefficient computation

  2. Difficult optimization

Related Work

Definition: Non-local means is a classical filtering algorithms that computes a weighted mean of all pixels in an image.

Non-local Neural Networks

The authors define a generic non-local opeartion

$y_i = \frac{1}{C(x)}\sum\limits_{\forall j}f(X_i, X_j)g(X_j)$

It supports inputs of variable size, and maintains the corresponding size in the output. The \(g\) is a linear embedding and the $f=e{{\theta(X_i)}\phi(X_i)} $ is the Gaussian function, \(\phi\) and \(\theta\) are both embeddings.

The author define a non-local block as

$z_i=W_{z}y_{i}+x_{i}$

where \(y_i\) is given in Eq.(1) and “\(x_i\) “ donates a resudial connection.

Video Classification Models

The authors construct a simple 2D baseline architecture and then inflate the C2D to 3D. They use two cases of inflations, the \(3\times 3\times 3\) kernel and \(3\times 1\times 1\) kernel.

Experiments

The authors investigate their models on different datasets, including Kinetics, Charades and COCO. The results indicate the high efficiency.

posted on 2025-02-18 07:59  bnbncch  阅读(17)  评论(0)    收藏  举报