CV Research Directions

Author: 刘佳恒

Reposted from the Zhihu question "2022 年,计算机视觉领域有哪些推荐的研究方向?" ("What research directions in computer vision are recommended for 2022?") - 知乎 (zhihu.com)

The directions I can think of so far are listed below, in no particular order; discussion is welcome.

  1. Self-supervised learning: mask-based self-supervised pretraining strategies, represented by MAE and BEiT, have achieved clear gains on classification tasks. Further improving the performance and efficiency of mask-based self-supervision, and extending MAE-style work to other tasks (e.g., detection, segmentation), should generate plenty of follow-up work; a minimal masking sketch follows this list.
  2. Multimodal: CLIP demonstrated the significant performance advantage of multimodal pretrained models across many tasks. Quite a few multimodal and visual pretraining models have since built on CLIP, and improving the performance and efficiency of multimodal pretraining is receiving broad attention. Applying multimodal pretrained models to downstream tasks is also a promising direction: CLIP has already been applied to detection, segmentation, captioning, VLN, and more; a sketch of the contrastive objective follows this list.
  3. 3D: the metaverse (AR/VR) and autonomous driving are extremely hot right now. Metaverse-related directions include NeRF and digital humans (e.g., talking faces); autonomous-driving-related directions include point-cloud-based detection/tracking and point cloud + RGB multimodal fusion.
  4. Security: model robustness, adversarial attacks, defenses, etc.; an FGSM sketch follows this list.
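
For item 1, here is a minimal PyTorch sketch of the random masking at the heart of MAE: a random 25% of patch tokens is kept for the encoder, and the inverse permutation is returned so the decoder can restore patch order. The function name `random_masking` and its signature are illustrative, not MAE's official code; the 75% mask ratio follows the paper's default.

```python
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """MAE-style random masking of patch embeddings (illustrative sketch).

    patches: (batch, num_patches, dim) tensor of patch embeddings.
    Returns the visible tokens, a binary mask in the original patch
    order (1 = masked), and the indices that restore that order.
    """
    b, n, d = patches.shape
    n_keep = int(n * (1 - mask_ratio))

    # Per-sample random permutation of patch indices via noise + argsort.
    noise = torch.rand(b, n, device=patches.device)
    shuffle_idx = noise.argsort(dim=1)         # random order
    restore_idx = shuffle_idx.argsort(dim=1)   # inverse permutation

    # The first n_keep shuffled patches form a uniformly random subset.
    keep_idx = shuffle_idx[:, :n_keep]
    visible = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, d))

    # Binary mask, re-ordered back to the original patch layout.
    mask = torch.ones(b, n, device=patches.device)
    mask[:, :n_keep] = 0
    mask = torch.gather(mask, 1, restore_idx)
    return visible, mask, restore_idx
```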
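
For item 2, CLIP's pretraining objective is a symmetric contrastive loss over a batch of image-text pairs: each image should match its own caption against all others in the batch, and vice versa. A minimal sketch, assuming the two encoders have already produced `(batch, dim)` features; the function name and the fixed 0.07 temperature are illustrative (CLIP actually learns the temperature).

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss; matching rows are positive pairs."""
    # L2-normalize so dot products are cosine similarities.
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds the positives.
    logits = image_feats @ text_feats.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image->text and text->image.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```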
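
For item 4, the canonical single-step attack is FGSM (Goodfellow et al.), which moves each input pixel by epsilon in the direction that increases the classification loss. A minimal sketch; the function name and default epsilon are illustrative, and it assumes a classifier taking inputs in [0, 1].

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=8 / 255):
    """One-step FGSM: perturb inputs along the sign of the loss gradient."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)

    # Gradient of the loss w.r.t. the inputs (model weights untouched).
    grad, = torch.autograd.grad(loss, images)

    # Step epsilon along the gradient sign and clip to the valid range.
    return (images + epsilon * grad.sign()).clamp(0, 1).detach()
```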

Improved CLIP

RegionCLIP: Region-based Language-Image Pretraining

ZeroVL: A Strong Baseline for Aligning Vision-Language Representations with Limited Resources

CLIP+downstream tasks:

CLIP+seg+det: DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

CLIP+cap: ClipCap: CLIP Prefix for Image Captioning

CLIP+refer seg: CRIS: CLIP-Driven Referring Image Segmentation

CLIP+style: StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

CLIP+nerf: Zero-Shot Text-Guided Object Generation with Dream Fields

CLIP+open vocabulary: Open-Vocabulary Object Detection via Vision and Language Knowledge Distillation; Towards Open Vocabulary Object Detection without Human-provided Bounding Boxes

CLIP+point cloud: PointCLIP: Point Cloud Understanding by CLIP

CLIP+grounding: Grounded Language-Image Pre-training

CLIP+adapter: Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling

CLIP+video: Prompting Visual-Language Models for Efficient Video Understanding

CLIP+lite: CLIP-Lite: Information Efficient Visual Representation Learning from Textual Annotations

This list will be updated continuously; contributions are welcome.

posted @ 2023-06-08 21:30 tourbillon007