01 Perception: Object Detection

1. BEV Perception

BEV Camera

view transformation

  • 2D -> 3D via depth estimation
  • 3D -> 2D (queries originate in 3D space)
  • purely network-based (implicit)
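The first two directions can be illustrated with a plain pinhole camera model. This is only a sketch: the intrinsic matrix K and the helper names unproject and project are assumptions made for illustration, not part of any method listed here.

```python
import numpy as np

# Hypothetical pinhole intrinsics (focal lengths and principal point in pixels).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])

def unproject(uv, depth, K):
    """2D -> 3D: lift pixel (u, v) with a depth estimate into the camera frame."""
    u, v = uv
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray scaled so that z = 1
    return depth * ray

def project(xyz, K):
    """3D -> 2D: project a 3D point in the camera frame onto the image plane."""
    uvw = K @ np.asarray(xyz, dtype=float)
    return uvw[:2] / uvw[2]

point = unproject((800.0, 400.0), depth=20.0, K=K)  # pixel + depth -> 3D point
pixel = project(point, K)                           # 3D point -> pixel again
```

The 2D -> 3D family has to estimate depth for every pixel, while the 3D -> 2D family starts from 3D reference points and only needs the projection.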

BEV LiDAR

voxelization and 3d convs

  1. VoxelNet 2018(voxelization -> 3d convs -> flatten height dim -> RPN)

  2. SECOND 2018(Sparse 3d convs)
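A minimal sketch of the voxelization step, assuming a simple occupancy grid. The function name voxelize, the range and the voxel size are hypothetical, and a max over height stands in for the learned 3D convolutions and height flattening of VoxelNet.

```python
import numpy as np

def voxelize(points, pc_range, voxel_size):
    """Return an occupancy grid of shape (nx, ny, nz) for points of shape (N, 3)."""
    pc_min = np.array(pc_range[:3], dtype=float)
    pc_max = np.array(pc_range[3:], dtype=float)
    size = np.array(voxel_size, dtype=float)
    grid_shape = np.round((pc_max - pc_min) / size).astype(int)

    # Drop points outside the range, then convert coordinates to voxel indices.
    inside = np.all((points >= pc_min) & (points < pc_max), axis=1)
    idx = np.floor((points[inside] - pc_min) / size).astype(int)

    grid = np.zeros(grid_shape, dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

points = np.array([[0.5, 0.5, 0.1],
                   [0.6, 0.4, 1.2],   # same (x, y) cell, different height
                   [3.5, 1.5, 0.3],
                   [9.0, 9.0, 9.0]])  # out of range, dropped
grid = voxelize(points, pc_range=(0, 0, 0, 4, 4, 2), voxel_size=(1, 1, 1))
bev = grid.max(axis=2)  # collapse the height dimension into a BEV map
```

Most voxels in a real scan are empty, which is the motivation for the sparse 3D convolutions in SECOND.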

No 3d convs (faster)

PointNet 2016(no voxels; encodes point features with a shared MLP)

  1. PointPillars 2019(Voxelization with pillars as BEV)
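The pillar idea can be sketched as a PointNet-style encoder applied per (x, y) cell, with the result scattered into a 2D grid. This is a simplified illustration: the weights W, the grid size and the function names are hypothetical, and the raw xyz coordinates are used instead of the augmented point features of the actual PointPillars model.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))  # hypothetical shared-MLP weights: 3 -> 8 channels

def encode_pillar(points, W):
    """PointNet-style encoder: shared linear layer + ReLU, then max over points."""
    per_point = np.maximum(points @ W, 0.0)  # (num_points, 8)
    return per_point.max(axis=0)             # (8,), independent of point order

def pillars_to_bev(points, W, cell=1.0, grid=(4, 4)):
    """Group points by (x, y) cell and scatter pillar features into a BEV map."""
    bev = np.zeros((W.shape[1], *grid), dtype=np.float64)
    cells = np.floor(points[:, :2] / cell).astype(int)
    for ix, iy in np.unique(cells, axis=0):
        if 0 <= ix < grid[0] and 0 <= iy < grid[1]:
            mask = np.all(cells == (ix, iy), axis=1)
            bev[:, ix, iy] = encode_pillar(points[mask], W)
    return bev

points = np.array([[0.5, 0.5, 0.1], [0.6, 0.4, 1.2], [3.5, 1.5, 0.3]])
bev = pillars_to_bev(points, W)  # (8, 4, 4) pseudo-image for a 2D backbone
```

Because the output is already a 2D pseudo-image, only 2D convolutions are needed downstream, which is where the speed advantage comes from.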

BEV Fusion

  1. BEVFusion, MIT(efficient perspective-view -> BEV transformation)

temporal fusion

  1. BEVDet4D 2022(spatial alignment; concatenation of multiple feature maps)

  2. BEVFormer 2022(fuses temporal information in a soft way)\(\star\)
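Spatial alignment followed by concatenation can be sketched as below. This is an illustration under strong assumptions: ego motion is given as a 2D rigid transform in cell units (the hypothetical T_prev_from_cur), and sampling is nearest neighbour rather than the interpolation a real implementation would likely use.

```python
import numpy as np

def align_prev_bev(prev_bev, T_prev_from_cur):
    """Resample prev_bev (C, H, W) into the current ego frame (nearest neighbour)."""
    C, H, W = prev_bev.shape
    ii, jj = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    cur = np.stack([ii.ravel(), jj.ravel(), np.ones(H * W)])  # (3, H*W)
    src = np.rint(T_prev_from_cur @ cur).astype(int)          # cells in prev frame
    si, sj = src[0], src[1]
    valid = (si >= 0) & (si < H) & (sj >= 0) & (sj < W)

    aligned = np.zeros_like(prev_bev)  # cells with no source stay zero
    aligned[:, ii.ravel()[valid], jj.ravel()[valid]] = prev_bev[:, si[valid], sj[valid]]
    return aligned

# A static object seen at cell (2, 1) in the previous frame.
prev_bev = np.zeros((1, 4, 4))
prev_bev[0, 2, 1] = 1.0
# The ego vehicle moved one cell along the first axis, so the object now sits at (1, 1).
cur_bev = np.zeros((1, 4, 4))
cur_bev[0, 1, 1] = 1.0

T_prev_from_cur = np.array([[1.0, 0.0, 1.0],
                            [0.0, 1.0, 0.0],
                            [0.0, 0.0, 1.0]])
aligned = align_prev_bev(prev_bev, T_prev_from_cur)
fused = np.concatenate([cur_bev, aligned], axis=0)  # (2C, H, W)
```

After alignment, a static object occupies the same cell in both feature maps, so the concatenated channels differ only for moving objects.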

1.1 Dense BEV feature

  1. LSS 2020(first to predict a depth distribution)

  2. BEVDet 2022(BEV space data augmentation)

  3. BEVDepth 2022(Depth Correction)

  4. BEVFusion 2022(Fusion on BEV from Camera and LiDAR)
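The depth-distribution idea shared by these methods can be sketched as an outer product between a per-pixel depth distribution and per-pixel context features. The shapes and the function name lift are assumptions for illustration. The subsequent step, pooling the frustum features into a BEV grid using camera geometry, is omitted.

```python
import numpy as np

def lift(depth_logits, context):
    """depth_logits: (D, H, W); context: (C, H, W) -> frustum features (C, D, H, W)."""
    # Softmax over the depth axis gives a per-pixel categorical depth distribution.
    z = depth_logits - depth_logits.max(axis=0, keepdims=True)
    depth_prob = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)
    # Outer product: each pixel's feature is placed at every depth bin,
    # weighted by the probability of that bin.
    return context[:, None] * depth_prob[None]

rng = np.random.default_rng(0)
depth_logits = rng.normal(size=(5, 2, 3))  # D=5 depth bins on a 2x3 feature map
context = rng.normal(size=(4, 2, 3))       # C=4 context channels
frustum = lift(depth_logits, context)
```

Since the depth weights sum to one per pixel, summing the frustum over the depth axis recovers the original context features.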

Cam2BEV 2020(homographic projection to BEV)

1.2 Attention Mechanism

  1. DETR

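  2. DAB-DETR (slow convergence because the learnable queries provide no positional prior)

  3. DN-DETR (slow convergence because the discreteness of Hungarian matching and the randomness of model training turn query-to-GT matching into a dynamic, unstable process)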

  4. Deformable DETR

  5. DETR3D

  6. PETR 2022(Implicit BEV Pos Embed)

  7. BEVFormer 2022(Transformer for BEV feature)
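The matching instability noted for DN-DETR can be illustrated with a toy one-to-one assignment. This is only a sketch: brute force over permutations is swapped in for the Hungarian algorithm (same optimum, feasible only for tiny inputs), and the cost values are made up.

```python
from itertools import permutations

def match(cost):
    """Return the min-cost one-to-one assignment: query i -> ground truth perm[i]."""
    n = len(cost)
    return min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))

# Hypothetical costs between 2 queries (rows) and 2 ground-truth boxes (columns).
cost_step_a = [[0.50, 0.52],
               [0.52, 0.50]]
# A tiny change in the predictions is enough to flip the discrete assignment.
cost_step_b = [[0.52, 0.50],
               [0.50, 0.52]]

assignment_a = match(cost_step_a)  # query 0 -> GT 0, query 1 -> GT 1
assignment_b = match(cost_step_b)  # query 0 -> GT 1, query 1 -> GT 0
```

When the assignment flips like this between training steps, each query receives inconsistent regression targets, which is the instability the DN-DETR note refers to.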

Sparse3D

Recipe in practice

Data augmentation

posted @ 2025-02-26 16:54  ldfm