Summary: Multimodal work before CLIP (Feb 2021) [Submitted on 6 Aug 2019] ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks [Sub
posted @ 2021-10-05 18:21 郝壹贰叁