2 Related Work

General Video Classification

  • 3D convolutions
  • Two-stream networks using optical flow
  • Wider range
  • SlowFast: two pathways operating at multiple time scales
  • Long-term feature bank: long-term features correlated with short-term features
  • These methods operate on raw pixels; in contrast, we model objects within scenes

3


  • Two-branch design with distillation
  • Scene branch: a 2D ResNet or a 3D I3D backbone
  • Object features: \(N_T\) objects, where each object feature \(o_t^j\) has the same dimension
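Because every object feature \(o_t^j\) has the same dimension, the per-frame features can be stacked into one node matrix for the graph. The sketch below assumes that \(N_T\) counts the objects over all T frames and that each frame's features arrive as an (n_t, d) array; `stack_object_features` is a hypothetical helper name, not from the original.

```python
import numpy as np


def stack_object_features(per_frame):
    """Stack per-frame object features into one node matrix.

    per_frame[t] is an array of shape (n_t, d) holding the features o_t^j of
    the objects in frame t (hypothetical input layout). Returns the stacked
    features of shape (N_T, d), with N_T = sum_t n_t, and the frame index of
    each row.
    """
    dims = {f.shape[1] for f in per_frame}
    if len(dims) != 1:
        # Stacking is only valid if all object features share one dimension d.
        raise ValueError("all object features must have the same dimension")
    feats = np.concatenate(per_frame, axis=0)
    frame_ids = np.concatenate(
        [np.full(len(f), t, dtype=int) for t, f in enumerate(per_frame)]
    )
    return feats, frame_ids
```

Keeping `frame_ids` alongside the features lets later steps tell same-frame (spatial) pairs from cross-frame (temporal) pairs.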

3.2 Spatio-Temporal Graph

  • We decompose our graph into two components: the spatial graph and the temporal graph
  • Spatial graph: edges are defined explicitly by the normalized Intersection over Union (IoU) between object boxes
  • Temporal graph: models object transformations through semantic similarities, measured by cosine similarity (\(\cos\))
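The spatial graph can be sketched as follows, under stated assumptions: boxes are given as (x1, y1, x2, y2), every pair of objects is connected with weight equal to their IoU, and "normalized" is taken to mean each row of the IoU matrix is divided by its sum. The helpers `pairwise_iou` and `spatial_adjacency` are hypothetical names, not the authors' code.

```python
import numpy as np


def pairwise_iou(boxes):
    """IoU between every pair of boxes; boxes has shape (N, 4) as (x1, y1, x2, y2)."""
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    # Intersection area is zero when boxes do not overlap.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area[:, None] + area[None, :] - inter
    return inter / np.maximum(union, 1e-8)


def spatial_adjacency(boxes):
    """Row-normalized IoU matrix (assumed normalization: each row sums to 1)."""
    iou = pairwise_iou(boxes)
    row_sum = iou.sum(axis=1, keepdims=True)
    return iou / np.maximum(row_sum, 1e-8)
```

Row normalization makes each node's edge weights sum to 1, so a propagation step such as `spatial_adjacency(boxes) @ feats` averages neighbour features instead of scaling them by how many boxes overlap.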
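A minimal sketch of the temporal graph, assuming edges link the objects of frame t to the objects of a later frame, weights are the cosine similarity of their features, and each row is softmax-normalized. The softmax step is our assumption (the source only names cosine similarity), and `cosine_similarity` and `temporal_adjacency` are hypothetical helper names.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between rows of a (N_a, d) and rows of b (N_b, d)."""
    a_n = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b_n = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    return a_n @ b_n.T


def temporal_adjacency(feats_t, feats_next):
    """Edges from objects in frame t to objects in a later frame.

    Weights are cosine similarities, softmax-normalized over each row
    (assumed normalization).
    """
    sim = cosine_similarity(feats_t, feats_next)
    # Subtract the row maximum for numerical stability before exponentiating.
    sim = sim - sim.max(axis=1, keepdims=True)
    w = np.exp(sim)
    return w / w.sum(axis=1, keepdims=True)
```

Cosine similarity ignores feature magnitude, so an object whose appearance direction stays similar across frames keeps a strong edge even if its activation strength changes.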