CV Research Directions
Author: 刘佳恒
Reposted from: "Which research directions in computer vision are worth recommending in 2022?" - 知乎 (zhihu.com)
The directions I have thought of so far are listed below, in no particular order. Discussion is welcome.
- Self-supervised learning: mask-based self-supervised pretraining strategies, represented by MAE and BEiT, have delivered clear gains on classification tasks. Further improving the performance and efficiency of mask-based self-supervision, and extending MAE-style work to other tasks (e.g., detection and segmentation), should generate plenty of follow-up work.
- Multimodal: CLIP demonstrated the significant performance advantage of multimodal pretrained models across many tasks. Quite a few multimodal and visual pretraining models have since built on CLIP, and improving the performance and efficiency of multimodal pretraining is receiving wide attention. Applying multimodal pretrained models to downstream tasks is also a promising direction; CLIP has already been applied to detection, segmentation, captioning, VLN, and more.
- 3D: the metaverse (AR/VR) and autonomous driving are very hot right now. Metaverse-related directions include NeRF and digital humans (e.g., talking faces). Autonomous-driving-related directions include point-cloud-based detection/tracking and point cloud + RGB multimodal fusion.
- Safety: model robustness, adversarial attacks, defenses, etc.
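The MAE-style masking mentioned above can be sketched in a few lines: the encoder only sees a small random subset of patch tokens, and the rest are reconstructed later from mask tokens. This is a minimal NumPy sketch of the random-masking step only (the function name and the 75% default ratio follow the MAE paper; the rest of the pipeline is omitted):

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, seed=0):
    """MAE-style random masking: keep a random subset of patch tokens.

    patches: (N, D) array of patch embeddings.
    Returns the kept patches, their indices, and a boolean mask
    over all N patches (True = masked out).
    """
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])   # keep original patch order
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False              # visible patches are unmasked
    return patches[keep_idx], keep_idx, mask
```

With a 75% mask ratio, a 16-patch image feeds only 4 patches to the encoder, which is where much of MAE's pretraining efficiency comes from.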
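For the safety direction, the classic starting point for adversarial attacks is FGSM (fast gradient sign method): perturb the input one step along the sign of the loss gradient. A minimal sketch, assuming the gradient of the loss with respect to the input has already been computed elsewhere and pixels live in [0, 1]:

```python
import numpy as np

def fgsm_perturb(x, grad, epsilon=0.03):
    """FGSM: one-step adversarial perturbation.

    x: input image as an array with values in [0, 1].
    grad: gradient of the loss w.r.t. x (same shape as x).
    epsilon: perturbation budget per pixel (L-infinity norm).
    """
    x_adv = x + epsilon * np.sign(grad)
    return np.clip(x_adv, 0.0, 1.0)  # stay in the valid pixel range
```

Robustness research then asks how to defend against such perturbations, e.g. by training on adversarial examples (adversarial training).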
Improved CLIP
RegionCLIP: RegionCLIP: Region-based Language-Image Pretraining
ZeroVL: ZeroVL: A Strong Baseline for Aligning Vision-Language Representations with Limited Resources
CLIP + downstream tasks:
CLIP+seg+det: DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
CLIP+cap: ClipCap: CLIP Prefix for Image Captioning
CLIP+refer seg: CRIS: CLIP-Driven Referring Image Segmentation
CLIP+style: StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
CLIP+nerf: Zero-Shot Text-Guided Object Generation with Dream Fields
CLIP+open vocabulary: Open-Vocabulary Object Detection via Vision and Language Knowledge Distillation; Towards Open Vocabulary Object Detection without Human-provided Bounding Boxes
CLIP+point cloud: PointCLIP: Point Cloud Understanding by CLIP
CLIP+grounding: Grounded Language-Image Pre-training
CLIP+adapter: Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling
CLIP+video: Prompting Visual-Language Models for Efficient Video Understanding
CLIP+lite: CLIP-Lite: Information Efficient Visual Representation Learning from Textual Annotations
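All of the CLIP extensions above build on the same core mechanism: an image embedding and one text embedding per class live in a shared space, and classification is cosine similarity followed by a softmax. A minimal NumPy sketch of that inference step, with the embeddings assumed to come from pretrained image/text encoders (not shown here):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, temperature=0.01):
    """CLIP-style zero-shot classification.

    image_emb: (D,) embedding of one image.
    text_embs: (C, D) embeddings of one text prompt per class,
               e.g. "a photo of a {class}".
    Returns a probability distribution over the C classes.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature          # cosine similarity, scaled
    probs = np.exp(logits - logits.max())     # numerically stable softmax
    return probs / probs.sum()
```

Because the "classifier" is just a set of text prompts, swapping the prompt set immediately gives a new label space, which is exactly what the open-vocabulary detection and segmentation papers listed above exploit.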
To be updated continuously; contributions are welcome.