Survey Notes on Open-Source Multimodal Language Models (2025-07-13) - Antiqueeeee


HumanOmniV2
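Alibaba's open-source multimodal reasoning model HumanOmniV2 introduces a mandatory context-summarization mechanism and a multi-dimensional reward system, and uses GRPO (Group Relative Policy Optimization).
Paper: https://arxiv.org/pdf/2506.21277
Repository (GitHub): https://github.com/HumanMLLM/HumanOmniV2
ModelScope: https://modelscope.cn/models/iic/humanomniv2
Hugging Face: https://huggingface.co/PhilipC/HumanOmniV2
IntentBench benchmark: https://huggingface.co/datasets/PhilipC/IntentBench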
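GRPO is only named in the HumanOmniV2 entry, so here is a minimal sketch of its commonly described core step: the rewards of several responses sampled for the same prompt are normalized within that group. This is not taken from the HumanOmniV2 code. The function name, the example rewards, and the choice of population standard deviation are assumptions made for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize one group's rewards to zero mean and roughly unit scale."""
    if not rewards:
        raise ValueError("rewards must not be empty")
    mean = statistics.fmean(rewards)
    # Assumption: population standard deviation; some implementations
    # use the sample standard deviation instead.
    std = statistics.pstdev(rewards)
    # eps keeps the division defined when all rewards in the group are equal.
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical scalar rewards for four responses to the same prompt.
example_rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(example_rewards))
```

Responses that score above the group mean get a positive advantage and the rest get a negative one. How the model's multi-dimensional rewards are combined into a single scalar is not covered here.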
Kimi-VL-A3B-Thinking-2506
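Moonshot AI's open-source multimodal reasoning model Kimi-VL-A3B-Thinking-2506 has 16.4B total parameters with 3B activated. It is fine-tuned from Kimi-VL-A3B-Instruct, accepts images of up to 3.2 megapixels (close to 2K resolution), and can be served with vLLM.
Hugging Face: https://huggingface.co/moonshotai/Kimi-VL-A3B-Thinking-2506
Repository: https://github.com/MoonshotAI/Kimi-VL
Report: https://arxiv.org/abs/2504.07491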
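To make the 3.2-megapixel limit mentioned for Kimi-VL-A3B-Thinking-2506 concrete, the sketch below checks whether an image fits within that pixel budget and, if not, computes a smaller size with the same aspect ratio. It is a client-side pre-check written for illustration. The function name and the rounding policy are assumptions, and the model's own preprocessor may resize images differently.

```python
import math

MAX_PIXELS = 3_200_000  # about 3.2 megapixels, as stated in the text


def fit_to_pixel_budget(width, height, max_pixels=MAX_PIXELS):
    """Return (width, height) scaled down, if needed, to fit max_pixels."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    pixels = width * height
    if pixels <= max_pixels:
        return width, height  # already within the budget
    # Scale both sides by the same factor so the aspect ratio is preserved.
    scale = math.sqrt(max_pixels / pixels)
    # Rounding down guarantees the result never exceeds the budget.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


# Side length of the largest square image that fits the budget.
print(math.isqrt(MAX_PIXELS))           # 1788
print(fit_to_pixel_budget(1920, 1080))  # (1920, 1080)
print(fit_to_pixel_budget(3840, 2160))
```

A 1920x1080 frame (about 2.07 megapixels) passes unchanged, while a 3840x2160 frame (about 8.29 megapixels) has to be reduced.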
MiniCPM4
Hugging Face: https://huggingface.co/collections/openbmb/minicpm4-6841ab29d180257e940baa9b
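Repository: https://github.com/openbmb/minicpm
Technical report: https://github.com/OpenBMB/MiniCPM/blob/main/report/MiniCPM_4_Technical_Report.pdf
Gemma 3n, MedGemma, SignGemma
Google's open-source lightweight multimodal model Gemma 3n, the medical-domain model MedGemma, and the sign-language translation model SignGemma.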
Hugging Face: https://huggingface.co/collections/google/
Seed1.5-VL
ByteDance's multimodal model Seed1.5-VL exposes its thinking process; the cases in the paper are fairly representative.
Hugging Face: https://huggingface.co/spaces/ByteDance-Seed/Seed1.5-VL
Technical report: https://arxiv.org/pdf/2505.07062
Online demo: https://www.volcengine.com/experience/ark?model=doubao-1-5-thinking-vision-pro-250428
Qwen2.5-VL-72B
Hugging Face: https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct
Technical report: https://qwenlm.github.io/blog/qwen2.5-vl/
Repository: https://github.com/QwenLM/Qwen2.5-VL
InternVL3 78B/38B
Hugging Face: https://huggingface.co/OpenGVLab/InternVL3-78B
Technical report: https://internvl.github.io/blog/
MiniMind-V
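An ultra-lightweight design with only 26M parameters (0.026B), roughly 1/7000 the size of GPT-3, trainable on a single 3090 GPU. It is a minimal open-source VLM implementation and a concise tutorial for getting started with vision-language models, with complete code for data processing, pretraining, SFT, and inference, plus support for dataset cleaning and custom configuration. It accepts single-image and multi-image input combined with text for dialogue. A simple linear transformation aligns CLIP's 768-dimensional visual tokens to the LLM space. It also provides an OpenAI-compatible API, so it can be plugged into FastGPT, OpenWebUI, and similar front ends.
Repository: https://github.com/jingyaogong/minimind-v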
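The linear alignment described for MiniMind-V can be illustrated with a small dependency-free sketch: each 768-dimensional visual token is mapped to the LLM hidden size by one affine transform, y = W x + b. This is not the repository's code. The hidden size of 512, the number of tokens, the initialization, and all function names are hypothetical. A real implementation would normally use a framework layer such as `torch.nn.Linear`.

```python
import random

VISION_DIM = 768  # CLIP visual token width, as stated in the text
LLM_DIM = 512     # hypothetical LLM hidden size, chosen only for illustration


def init_linear(in_dim, out_dim, seed=0):
    """Create a random weight matrix (out_dim x in_dim) and a zero bias."""
    rng = random.Random(seed)
    bound = in_dim ** -0.5
    weight = [[rng.uniform(-bound, bound) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(tokens, weight, bias):
    """Apply y = W x + b to every visual token."""
    return [
        [sum(w * x for w, x in zip(row, token)) + b
         for row, b in zip(weight, bias)]
        for token in tokens
    ]


def parameter_count(weight, bias):
    """Count the trainable numbers in the projection."""
    return len(weight) * len(weight[0]) + len(bias)


weight, bias = init_linear(VISION_DIM, LLM_DIM)
rng = random.Random(1)
# Four made-up visual tokens standing in for the vision encoder's output.
visual_tokens = [[rng.gauss(0.0, 1.0) for _ in range(VISION_DIM)]
                 for _ in range(4)]
projected = project(visual_tokens, weight, bias)
print(len(projected), len(projected[0]))  # 4 512
print(parameter_count(weight, bias))      # 393728
```

Under these assumed sizes the projector adds 768 x 512 + 512 = 393,728 parameters, a small share of a 26M-parameter model, which is one reason a plain linear map suits such a compact design.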

(Details of each model's features to be added...)

Posted on 2025-07-13 20:16 by Antiqueeeee
