Post category - 2 Multimodal Models
Summary: Contents: SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model · TL;DR · Data · Recommendation-aware Data Construction · Dynamic Hard Negative Mining · Q: 动
Summary: Contents: VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents · TL;DR · Method · Q: How does VLM2Vec-V2 differ from the original VLM2Vec algorithm? · Benchmark · Q&A · Q: CLS, QA, R
Summary: Contents: VLM2VEC: TRAINING VISION-LANGUAGE MODELS FOR MASSIVE MULTIMODAL EMBEDDING TASKS · TL;DR · Method · Dataset · Experiment · Q&A · Q: How does VLM2Vec differ from an ordinary VLM? Is the only difference that it stores the embedding
[PaperReading] Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution
Summary: Contents: Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution · TL;DR · Method · Naive Dynamic Resolution · Multimodal Rotary Position E
Summary: Contents: Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · TL;DR · Method · Pretraining · MultiTask PreTraining · Super
Summary: Contents: DINOv3 · TL;DR · Method · Data · Architecture · Learning Objective · Gram Anchoring Objective · Leveraging Higher-Resolution Features · Post-hoc strategies · Experiment · Related links. DI
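Summary: Contents: LLaVA: Visual Instruction Tuning · TL;DR · Data · ScienceQA multimodal test set · Method · Multi-turn dialogue · Experiment · Result visualization · Summary and reflections · Related links. LLaVA: Visual Instruction Tuning link. Date: 23.12. Affiliation: Univers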
Summary: Contents: Flamingo: a Visual Language Model for Few-Shot Learning · TL;DR · Method · Visual processing and Perceiver Resampler · GATED XATTN-DENSE layers · Mixture of Vision
Summary: Contents: R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning · TL;DR · Method · Verifiable Reward · RLVR · Experiment · Summary and reflections · Related links. R1-Omni: Exp
Summary: Contents: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation · TL;DR · Method · Pretraining · DataFilt data · Implementation · Experi
Summary: Contents: Introduction · TL;DR · Method · Dataset · Experiment · Summary and reflections. Introduction: LXMERT: Learning Cross-Modality Encoder Representations from Transformers. Date: 2019.08 (EMNLP 2019). Affiliation: UNC Chape
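Summary: Contents: Introduction · TL;DR · Method · Core innovations · Learning approach · Experiment. Introduction: link. Date: 2019.08.06. Affiliations: Georgia Institute of Technology, Facebook AI Research, Oregon State University. Related fields: computer vision and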
Summary: Learning Transferable Visual Models From Natural Language Supervision link. CLIP stands for Contrastive Language-Image Pre-training. Date: 21.02. Institution: OpenAI. TL;DR: A