CV Research Directions

Author: 刘佳恒

Reposted from the Zhihu question "2022 年,计算机视觉领域有哪些推荐的研究方向?" ("What research directions in computer vision are recommended for 2022?") - 知乎 (zhihu.com)

The directions I can think of so far are listed below, in no particular order; discussion is welcome.

  1. Self-supervised learning: mask-based self-supervised pretraining strategies, represented by MAE and BEiT, have achieved clear gains on classification tasks. Further improving the performance and efficiency of mask-based self-supervision, and extending MAE-style work to other tasks (e.g., detection, segmentation), should generate plenty of follow-up work; a minimal masking sketch follows this list.
  2. Multimodal: CLIP demonstrated the significant performance advantage of multimodal pretrained models across many tasks. Quite a few multimodal and visual pretraining models have since built on CLIP, and improving the performance and efficiency of multimodal pretraining is receiving broad attention. Applying multimodal pretrained models to downstream tasks is also a promising direction: CLIP has already been applied to detection, segmentation, captioning, VLN, and more; a sketch of the contrastive objective follows this list.
  3. 3D: the metaverse (AR/VR) and autonomous driving are extremely hot right now. Metaverse-related directions include NeRF and digital humans (e.g., talking faces); autonomous-driving-related directions include point-cloud-based detection/tracking and point cloud + RGB multimodal fusion.
  4. Security: model robustness, adversarial attacks, defenses, etc.; an FGSM sketch follows this list.
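
For item 1, here is a minimal PyTorch sketch of the random masking at the heart of MAE: a random 25% of patch tokens is kept for the encoder, and the inverse permutation is returned so the decoder can restore patch order. The function name `random_masking` and its signature are illustrative, not MAE's official code; the 75% mask ratio follows the paper's default.

```python
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """MAE-style random masking of patch embeddings (illustrative sketch).

    patches: (batch, num_patches, dim) tensor of patch embeddings.
    Returns the visible tokens, a binary mask in the original patch
    order (1 = masked), and the indices that restore that order.
    """
    b, n, d = patches.shape
    n_keep = int(n * (1 - mask_ratio))

    # Per-sample random permutation of patch indices via noise + argsort.
    noise = torch.rand(b, n, device=patches.device)
    shuffle_idx = noise.argsort(dim=1)         # random order
    restore_idx = shuffle_idx.argsort(dim=1)   # inverse permutation

    # The first n_keep shuffled patches form a uniformly random subset.
    keep_idx = shuffle_idx[:, :n_keep]
    visible = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, d))

    # Binary mask, re-ordered back to the original patch layout.
    mask = torch.ones(b, n, device=patches.device)
    mask[:, :n_keep] = 0
    mask = torch.gather(mask, 1, restore_idx)
    return visible, mask, restore_idx
```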
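
For item 2, CLIP's pretraining objective is a symmetric contrastive loss over a batch of image-text pairs: each image should match its own caption against all others in the batch, and vice versa. A minimal sketch, assuming the two encoders have already produced `(batch, dim)` features; the function name and the fixed 0.07 temperature are illustrative (CLIP actually learns the temperature).

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss; matching rows are positive pairs."""
    # L2-normalize so dot products are cosine similarities.
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds the positives.
    logits = image_feats @ text_feats.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image->text and text->image.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```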
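
For item 4, the canonical single-step attack is FGSM (Goodfellow et al.), which moves each input pixel by epsilon in the direction that increases the classification loss. A minimal sketch; the function name and default epsilon are illustrative, and it assumes a classifier taking inputs in [0, 1].

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=8 / 255):
    """One-step FGSM: perturb inputs along the sign of the loss gradient."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)

    # Gradient of the loss w.r.t. the inputs (model weights untouched).
    grad, = torch.autograd.grad(loss, images)

    # Step epsilon along the gradient sign and clip to the valid range.
    return (images + epsilon * grad.sign()).clamp(0, 1).detach()
```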

Improved CLIP

RegionCLIP: Region-based Language-Image Pretraining

ZeroVL: A Strong Baseline for Aligning Vision-Language Representations with Limited Resources

CLIP+downstream tasks:

CLIP+seg+det: DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

CLIP+cap: ClipCap: CLIP Prefix for Image Captioning

CLIP+refer seg: CRIS: CLIP-Driven Referring Image Segmentation

CLIP+style: StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

CLIP+nerf: Zero-Shot Text-Guided Object Generation with Dream Fields

CLIP+open vocabulary: Open-Vocabulary Object Detection via Vision and Language Knowledge Distillation; Towards Open Vocabulary Object Detection without Human-provided Bounding Boxes

CLIP+point cloud: PointCLIP: Point Cloud Understanding by CLIP

CLIP+grounding: Grounded Language-Image Pre-training

CLIP+adapter: Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling

CLIP+video: Prompting Visual-Language Models for Efficient Video Understanding

CLIP+lite: CLIP-Lite: Information Efficient Visual Representation Learning from Textual Annotations

This list will be updated continuously; contributions are welcome.

posted @ 2023-06-08 21:30 tourbillon007