01 Perception: Object Detection

1. BEV Perception

BEV Camera

view transformation

2D -> 3D: lift image features into 3D via depth estimation
3D -> 2D: start from points in 3D space and project them into the image
Purely network-based: the transformation is learned implicitly
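The 3D -> 2D direction above can be sketched as a plain pinhole projection: 3D reference points are moved into the camera frame and projected to pixels, where image features would then be sampled. This is a minimal sketch, not any specific paper's code; the names `project_points_to_image`, `K` (3x3 intrinsics) and `T_cam_from_ego` (4x4 extrinsics) are assumptions made for illustration.

```python
import numpy as np

def project_points_to_image(points_ego, K, T_cam_from_ego, image_hw):
    """Project 3D points (N, 3) from the ego frame to pixel coordinates.

    Returns (uv, valid): uv is (N, 2); valid marks points that lie in front
    of the camera and land inside the image.
    """
    n = points_ego.shape[0]
    homo = np.concatenate([points_ego, np.ones((n, 1))], axis=1)  # (N, 4)
    pts_cam = (T_cam_from_ego @ homo.T).T[:, :3]                  # (N, 3)
    depth = pts_cam[:, 2]
    in_front = depth > 1e-5
    # Avoid dividing by zero or negative depth; such points are masked out.
    safe_depth = np.where(in_front, depth, 1.0)
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / safe_depth[:, None]
    h, w = image_hw
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv, in_front & inside
```

In a query-based detector the `valid` mask would typically decide which cameras contribute features to a given 3D point.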

BEV Lidar

voxelization and 3d convs

VoxelNet 2018 (voxelization -> 3D convs -> flatten height dim -> RPN)
SECOND 2018 (sparse 3D convs)
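The voxelization step shared by both methods above can be sketched as follows. This is a simplified illustration: each occupied voxel gets the mean of its points, which stands in for the learned voxel feature encoder that VoxelNet uses. The function name `voxelize` and the `pc_range` layout `(x_min, y_min, z_min, x_max, y_max, z_max)` are assumptions.

```python
import numpy as np

def voxelize(points, voxel_size, pc_range):
    """Group points (N, 3 + C) into voxels.

    Returns (coords, feats): integer (ix, iy, iz) indices of the occupied
    voxels and the mean of the points that fall into each of them.
    """
    voxel_size = np.asarray(voxel_size, dtype=np.float64)
    lo = np.asarray(pc_range[:3], dtype=np.float64)
    hi = np.asarray(pc_range[3:], dtype=np.float64)
    # Drop points outside the detection range.
    keep = np.all((points[:, :3] >= lo) & (points[:, :3] < hi), axis=1)
    pts = points[keep]
    idx = np.floor((pts[:, :3] - lo) / voxel_size).astype(np.int64)
    # Only occupied voxels are kept, which is what makes the grid sparse.
    coords, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    sums = np.zeros((coords.shape[0], pts.shape[1]))
    np.add.at(sums, inverse, pts)  # accumulate points per voxel
    counts = np.bincount(inverse, minlength=coords.shape[0])
    return coords, sums / counts[:, None]
```

Because only occupied voxels are returned, the output is already in the coordinate-plus-feature form that sparse 3D convolutions consume.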

No 3d convs (faster)

PointNet 2016 (no voxels; uses MLPs to encode per-point features)

PointPillars 2019 (voxelization into vertical pillars, which form the BEV grid)
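The PointNet idea above, a shared MLP applied to every point followed by a symmetric pooling operation, can be sketched in a few lines. This is a toy forward pass with hypothetical weights passed in by the caller, not the original architecture (which also includes input and feature transform networks).

```python
import numpy as np

def pointnet_encode(points, weights, biases):
    """Shared per-point MLP followed by a max pool over points.

    points: (N, C_in); weights/biases: lists of layer parameters.
    Returns one global feature vector for the whole point set.
    """
    x = points
    for w, b in zip(weights, biases):
        # The same weights are applied to every point independently.
        x = np.maximum(x @ w + b, 0.0)
    # Max over the point axis makes the result independent of point order.
    return x.max(axis=0)
```

Shuffling the rows of `points` leaves the output unchanged, which is the property that lets the network consume unordered point sets directly.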
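Once each pillar has been encoded into a feature vector, the features are scattered back onto the 2D grid so that an ordinary 2D CNN can process them. A minimal sketch of that scatter step, assuming hypothetical inputs `pillar_feats` (one vector per non-empty pillar) and `pillar_xy` (its integer grid cell):

```python
import numpy as np

def scatter_pillars_to_bev(pillar_feats, pillar_xy, grid_hw):
    """Scatter pillar features (P, C) onto a dense (C, H, W) BEV canvas.

    pillar_xy holds integer (ix, iy) cell indices, one row per pillar.
    Cells without a pillar stay zero.
    """
    h, w = grid_hw
    c = pillar_feats.shape[1]
    canvas = np.zeros((c, h * w), dtype=pillar_feats.dtype)
    flat = pillar_xy[:, 1] * w + pillar_xy[:, 0]  # row = iy, column = ix
    canvas[:, flat] = pillar_feats.T
    return canvas.reshape(c, h, w)
```

Since the height dimension is folded into the pillar feature, no 3D convolution is needed after this point.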

BEV Fusion

BEVFusion MIT (efficient PV -> BEV transformation)

temporal fusion

BEVDet4D 2022 (spatial alignment, then concatenation of multiple BEV feature maps)
BEVFormer 2022 (fuses temporal information in a soft way, via attention) \(\star\)
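The align-then-concatenate scheme noted for BEVDet4D can be sketched as below: the previous BEV map is resampled into the current ego frame using the ego motion, then stacked with the current map along the channel axis. This is a rough sketch under stated assumptions, not the paper's implementation. It uses nearest-neighbour sampling and a 2D rigid transform, and the names `T_prev_from_cur`, `res`, `x_min`, `y_min` are hypothetical.

```python
import numpy as np

def align_and_concat(bev_cur, bev_prev, T_prev_from_cur, res, x_min, y_min):
    """Warp bev_prev (C, H, W) into the current ego frame and concatenate.

    T_prev_from_cur: 3x3 homogeneous 2D transform mapping current-frame
    (x, y) to previous-frame (x, y). res is the cell size in metres.
    """
    c, h, w = bev_prev.shape
    iy, ix = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Cell centres of the current grid, in the current ego frame.
    x = x_min + (ix + 0.5) * res
    y = y_min + (iy + 0.5) * res
    pts = np.stack([x.ravel(), y.ravel(), np.ones(h * w)], axis=0)
    # The same physical locations expressed in the previous ego frame.
    prev = T_prev_from_cur @ pts
    px = np.floor((prev[0] - x_min) / res).astype(np.int64)
    py = np.floor((prev[1] - y_min) / res).astype(np.int64)
    valid = (px >= 0) & (px < w) & (py >= 0) & (py < h)
    # Cells that fall outside the previous map stay zero.
    warped = np.zeros((c, h * w), dtype=bev_prev.dtype)
    warped[:, valid] = bev_prev[:, py[valid], px[valid]]
    return np.concatenate([bev_cur, warped.reshape(c, h, w)], axis=0)
```

After the warp, a static object occupies the same cell in both halves of the concatenated tensor, so any remaining offset between frames reflects object motion rather than ego motion.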

1.1 dense bev feature

LSS 2020 (first to predict a per-pixel depth distribution)
BEVDet 2022 (data augmentation in BEV space)
BEVDepth 2022 (depth correction)
BEVFusion 2022 (fusion of camera and LiDAR features in BEV)

Cam2BEV 2020 (homographic projection to BEV)
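The depth-distribution idea introduced by LSS can be sketched as an outer product: each pixel's context feature is spread along its camera ray, weighted by a softmax over discrete depth bins. This minimal sketch covers only that "lift" step; the function name and tensor layouts are assumptions, and the subsequent "splat" into BEV cells is omitted.

```python
import numpy as np

def lift_features(depth_logits, context):
    """Lift image features into a camera frustum.

    depth_logits: (D, H, W) unnormalised scores over D depth bins.
    context:      (C, H, W) image features.
    Returns frustum features of shape (C, D, H, W).
    """
    # Numerically stable softmax over the depth axis.
    z = depth_logits - depth_logits.max(axis=0, keepdims=True)
    prob = np.exp(z)
    prob /= prob.sum(axis=0, keepdims=True)
    # Per-pixel outer product of the feature vector and depth distribution.
    return context[:, None] * prob[None]
```

Summing the result over the depth axis recovers the original context features, since the depth weights at each pixel sum to one.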
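A homographic projection to BEV relies on the flat-ground assumption: for points on the plane Z = 0, the pinhole projection reduces to a 3x3 homography, and its inverse maps pixels back onto the ground. A minimal sketch, assuming the camera model x_cam = R x_world + t with intrinsics `K` (the function names are hypothetical, and this is not taken from the Cam2BEV code):

```python
import numpy as np

def ground_homography(K, R, t):
    """Homography mapping ground-plane (X, Y, 1) at Z = 0 to pixels."""
    # With Z = 0 the third column of R drops out of the projection.
    return K @ np.column_stack([R[:, 0], R[:, 1], t])

def apply_homography(H, xy):
    """Map 2D points (N, 2) through a 3x3 homography."""
    homo = np.concatenate([xy, np.ones((xy.shape[0], 1))], axis=1)
    mapped = (H @ homo.T).T
    return mapped[:, :2] / mapped[:, 2:3]
```

Applying `np.linalg.inv(H)` to pixel coordinates gives their ground-plane positions. Anything that rises above the ground (vehicles, pedestrians) is smeared by this mapping, which is one reason depth-based and attention-based view transformations are also used.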

1.2 Attention Mechanism

DETR
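DAB-DETR (slow convergence attributed to learnable queries that provide no positional prior)
DN-DETR (slow convergence attributed to the discreteness of Hungarian matching and the randomness of model training, which turn the matching of queries to ground truth into a dynamic, unstable process)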
Deformable DETR
DETR3D
PETR 2022(Implicit BEV Pos Embed)
BEVFormer 2022(Transformer for BEV feature)
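The matching instability noted for DN-DETR above can be illustrated with a toy example: a tiny change in two predictions flips the optimal one-to-one assignment, so each query suddenly receives a different regression target. The sketch below brute-forces the assignment over permutations as a stand-in for the Hungarian algorithm (practical code would typically call `scipy.optimize.linear_sum_assignment`); the squared-distance cost and the function name are assumptions for illustration.

```python
from itertools import permutations

import numpy as np

def match_queries_to_gt(pred, gt):
    """One-to-one assignment with minimum total squared distance.

    pred, gt: (N, D) arrays with equal N. Returns a tuple p where p[i]
    is the ground-truth index assigned to prediction i. Brute force,
    so only suitable for very small N.
    """
    cost = ((pred[:, None, :] - gt[None, :, :]) ** 2).sum(-1)  # (N, N)
    n = len(gt)
    return min(
        permutations(range(n)),
        key=lambda p: sum(cost[i, p[i]] for i in range(n)),
    )

gt = np.array([[0.0], [1.0]])
before = match_queries_to_gt(np.array([[0.49], [0.51]]), gt)
after = match_queries_to_gt(np.array([[0.52], [0.50]]), gt)
```

Here `before` and `after` differ even though each prediction moved by at most 0.03. This kind of target switching between training steps is what denoising queries, which are tied to a known ground-truth box and bypass matching, are meant to counteract.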

Sparse3D

Recipe in practice

Data augmentation

posted @ 2025-02-26 16:54 ldfm
