Summary:
BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers. link. Date: 22.07. Institution: Nanjing University && Sha… Read more
Summary:
PETR: Position Embedding Transformation for Multi-View 3D Object Detection. Read more
Summary:
OFT: Orthographic Feature Transform for Monocular 3D Object Detection. Date: 18.11. Institution: … Read more
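Summary:
Name: Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D. Date: 20.08. Institution: NVIDIA. TL;DR: Late-fusion methods convert each camera's perception results into BEV space using the camera parameters… Read more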
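Summary:
Name: KOSMOS: Language Is Not All You Need: Aligning Perception with Language Models. Date: 23.05. Institution: Microsoft. TL;DR: A large language model that takes multimodal input, which the authors call a Multimodal Large Language Model (MLLM). It can… Read more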