Post category - 2 Multimodal Models

Summary: Contents: SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model · TL;DR · Data · Recommendation-aware Data Construction · Dynamic Hard Negative Mining · Q: 动…
posted @ 2025-10-18 18:20 fariver Views (5) Comments (0) Recommendations (0)
Summary: Contents: VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents · TL;DR · Method · Q: How does VLM2Vec-V2 differ from the original VLM2Vec algorithm? · Benchmark · Q&A · Q: CLS, QA, R…
posted @ 2025-10-17 19:50 fariver Views (8) Comments (0) Recommendations (0)
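Summary: Contents: VLM2VEC: TRAINING VISION-LANGUAGE MODELS FOR MASSIVE MULTIMODAL EMBEDDING TASKS · TL;DR · Method · Dataset · Experiment · Q&A · Q: How does VLM2Vec differ from an ordinary VLM? Is it really just that it stores the embedding…
posted @ 2025-10-16 22:37 fariver Views (9) Comments (0) Recommendations (0)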
Summary: Contents: Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution · TL;DR · Method · Naive Dynamic Resolution · Multimodal Rotary Position E…
posted @ 2025-09-23 18:29 fariver Views (44) Comments (0) Recommendations (0)
Summary: Contents: Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · TL;DR · Method · Pretraining · MultiTask PreTraining · Super…
posted @ 2025-09-19 21:24 fariver Views (29) Comments (0) Recommendations (0)
Summary: Contents: DINOv3 · TL;DR · Method · Data · Architecture · Learning Objective · Gram Anchoring Objective · Leveraging Higher-Resolution Features · post-hoc strategies · Experiment · Related Links. DI…
posted @ 2025-09-16 21:36 fariver Views (118) Comments (0) Recommendations (0)
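Summary: Contents: LLaVA: Visual Instruction Tuning · TL;DR · Data · ScienceQA multimodal test set · Method · Multi-turn dialogue · Experiment · Result visualization · Summary and Reflections · Related Links. LLaVA: Visual Instruction Tuning link. Time: 23.12. Affiliation: Univers…
posted @ 2025-08-22 22:11 fariver Views (27) Comments (0) Recommendations (0)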
Summary: Contents: Flamingo: a Visual Language Model for Few-Shot Learning · TL;DR · Method · Visual processing and Perceiver Resampler · GATED XATTN-DENSE layers · Mixture of Vision…
posted @ 2025-07-26 15:41 fariver Views (68) Comments (0) Recommendations (0)
Summary: Contents: R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning · TL;DR · Method · Verifiable Reward · RLVR · Experiment · Summary and Reflections · Related Links. R1-Omni: Exp…
posted @ 2025-07-15 21:28 fariver Views (42) Comments (0) Recommendations (0)
Summary: Contents: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation · TL;DR · Method · Pre-training Data · Filt Data · Implementation · Experi…
posted @ 2025-05-29 21:17 fariver Views (45) Comments (0) Recommendations (0)
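Summary: Contents: Introduction · TL;DR · Method · Dataset · Experiment · Summary and Reflections. Introduction: LXMERT: Learning Cross-Modality Encoder Representations from Transformers. Time: 2019.08 (EMNLP 2019). Affiliation: UNC Chape…
posted @ 2025-05-11 13:08 fariver Views (47) Comments (0) Recommendations (0)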
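Summary: Contents: Introduction · TL;DR · Method · Core Innovations · Learning Approach · Experiment. Introduction: link. Time: 2019.08.06. Affiliation: Georgia Institute of Technology, Facebook AI Research, Oregon State University. Related fields: computer vision and…
posted @ 2025-05-11 12:40 fariver Views (31) Comments (0) Recommendations (0)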
Summary: Learning Transferable Visual Models From Natural Language Supervision link. CLIP stands for Contrastive Language-Image Pre-training. Time: 21.02. Organization: OpenAI. TL;DR: A…
posted @ 2024-03-07 00:34 fariver Views (123) Comments (0) Recommendations (0)