General Video Classification
- 3D conv
- two-stream, optical flow
- wider range
- SlowFast, multiple time scales, two pathways
- feature bank, long-term, correlated, short-term
- raw pixels, in contrast, objects within scenes
3
![image]()
- two-branch, distill
- scene branch: 2D features (ResNet), 3D features (I3D)
- object features: \(N_T\) objects, each \(o_t^j\) has the same dimension
3.2 Spatio-Temporal Graph
- decompose our graph into two components: the spatial graph and the temporal graph
- spatial: normalized Intersection over Union (IoU) value, explicitly
- temporal: object transformations, semantic similarities, \(\cos\)
![image]()
- imagine: # - % = $ x @ structure
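
A minimal sketch of the two graph components noted above, assuming boxes are `(x1, y1, x2, y2)` tuples and object features are plain lists of floats. The function names and the row-sum normalization are illustrative assumptions; the notes only say the IoU is "normalized" and do not specify how, nor how the cosine scores are post-processed.

```python
import math

def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def row_normalize(matrix):
    """Divide each row by its sum (assumed normalization scheme)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out

def spatial_adjacency(boxes):
    """Spatial graph for one frame: normalized pairwise IoU between objects."""
    return row_normalize([[iou(a, b) for b in boxes] for a in boxes])

def cosine(u, v):
    """Cosine similarity between two feature vectors of equal dimension."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

def temporal_adjacency(feats_t, feats_next):
    """Temporal graph between adjacent frames: pairwise cosine similarity
    of object features in frame t and frame t+1 (left unnormalized here)."""
    return [[cosine(u, v) for v in feats_next] for u in feats_t]
```

Because every object feature has the same dimension, the cosine similarity is well defined for any pair of objects across frames, even when the number of objects differs from frame to frame.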