Essay category - Large Models

Summary: Blog Code: GitHub - JinjieNi/OpenMoE2: The official repo for "OpenMoE 2: Sparse Diffusion Language Models". OpenMoE 2 is the first MoE + diffusion language model ( Read more
posted @ 2025-10-31 10:40 jack-chen666 Views(25) Comments(0) Recommended(0)
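Summary: Contents: 1. Basic concepts; 2. Basic syntax; Level 1: vector dot product; Level 2: matrix multiplication; Level 3: batched matrix multiplication (common in Transformers); 4. Real examples from the PI0 code; Example 1: QKV projection (gemma.py:183); Example 2: attention computation (gemma.py:217); Example 3: attention output Read more
posted @ 2025-10-24 17:24 jack-chen666 Views(82) Comments(0) Recommended(0)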
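Summary: Contents: Overall framework; pai0; PaliGemma: A versatile 3B VLM for transfer; TransFusion model introduction (unified multimodal model); VLM (Vision-Language Model) architecture. Overall framework: π0, a VLA model for general-purpose robot control: one framework controlling 7 kinds of robotic arms (based on PaliG Read more
posted @ 2025-10-09 20:30 jack-chen666 Views(146) Comments(0) Recommended(0)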
Summary: Original LLaVA paper: Title: "Visual Instruction Tuning" arXiv link: https://arxiv.org/abs/2304.08485 Venue: NeurIPS 2023 LLaVA-1.5 paper: Title: "Improved Baselines with Visua Read more
posted @ 2025-09-15 16:47 jack-chen666 Views(25) Comments(0) Recommended(0)
Summary: https://github.com/ByteVisionLab/TokenFlow https://arxiv.org/abs/2412.03069 Read more
posted @ 2025-09-15 09:52 jack-chen666 Views(13) Comments(0) Recommended(0)
Summary: https://arxiv.org/abs/2503.09573 Read more
posted @ 2025-09-15 09:46 jack-chen666 Views(20) Comments(0) Recommended(0)
Summary: https://arxiv.org/abs/2411.07975 https://github.com/deepseek-ai/Janus Read more
posted @ 2025-09-15 09:43 jack-chen666 Views(347) Comments(0) Recommended(0)
Summary: https://janusai.pro/ https://huggingface.co/deepseek-ai/Janus-Pro-7B https://arxiv.org/abs/2501.17811 https://github.com/deepseek-ai/Janus Read more
posted @ 2025-09-15 09:33 jack-chen666 Views(17) Comments(0) Recommended(0)
Summary: https://arxiv.org/pdf/2505.00703 https://github.com/CaraJ7/T2I-R1 Read more
posted @ 2025-09-15 09:29 jack-chen666 Views(13) Comments(0) Recommended(0)
Summary: https://arxiv.org/abs/2509.08827 https://huggingface.co/papers/2509.08827 Read more
posted @ 2025-09-15 09:07 jack-chen666 Views(38) Comments(0) Recommended(0)
Summary: The Landscape of Agentic Reinforcement__Learning for LLMs.pdf https://medium.com/data-science-in-your-pocket/the-landscape-of-agentic-reinforcement-le Read more
posted @ 2025-09-15 09:06 jack-chen666 Views(88) Comments(0) Recommended(0)
Summary: https://www.physicalintelligence.company/download/pi05.pdf https://github.com/Physical-Intelligence/openpi https://mp.weixin.qq.com/s/4FwNUULBzMrqEOm9 Read more
posted @ 2025-09-15 09:04 jack-chen666 Views(52) Comments(0) Recommended(0)
Summary: https://mp.weixin.qq.com/s/fwOGuKy2Wtz_xXx3nCT28w Paper title: LLaDA-VLA: Vision Language Diffusion Action Models Paper link: https://arxiv.org/abs/2509.06932 Project page: h Read more
posted @ 2025-09-15 09:00 jack-chen666 Views(53) Comments(0) Recommended(0)
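Summary: Contents: Triton kernel: storing the KV cache; Python wrapper: store_kvcache; attention. An Attention layer implementation from Qwen3 (or a similar architecture) that combines a custom Triton kernel (for KV cache storage) with the FlashAttention library to implement Read more
posted @ 2025-09-04 09:58 jack-chen666 Views(106) Comments(0) Recommended(0)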
Summary: Contents: Activation function; Qwen3MLP. Activation function: import torch from torch import nn import torch.nn.functional as F class SiluAndMul(nn.Module): def __init__(self): super().__init_ Read more
posted @ 2025-09-04 09:43 jack-chen666 Views(309) Comments(0) Recommended(0)
Summary: Contents: Core idea: Zero-Computation Experts (ZCE); II. MoE++ architecture in detail; ZCE selection strategy; Why is ZCE effective?; Summary: the value of MoE++ (ZCE). MoE++: Accelerating Mixture-of-Experts Methods with Zero Read more
posted @ 2025-09-02 11:08 jack-chen666 Views(70) Comments(0) Recommended(0)
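Summary: Contents: Mainstream MoE architectures; Core components review - basic MoE structure; Load balancing (Load Balancing Loss); Expert parallelism (Expert Parallelism); Hierarchical MoE (Hierarchical MoE / H-MoE); Dynamic sparsity patterns (Dynamic Sparsity); Residual connections and expert fusion (Residua Read more
posted @ 2025-09-01 14:41 jack-chen666 Views(59) Comments(0) Recommended(0)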
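Summary: Contents: References; LoRA (Low-Rank Adaptation); OFT (Orthogonal Finetuning); Mathematical principles summary; Properties of orthogonal matrices; Core properties and geometric interpretation; Example; Look at this figure. References https://huggingface.co/docs/peft/en/conceptual_guides/oft?utm Read more
posted @ 2025-08-27 11:10 jack-chen666 Views(205) Comments(0) Recommended(0)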
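Summary: Contents: MX data structure; How do FP32 and FP16 represent a floating-point number?; FP32 (single-precision floating point); FP16 (half-precision floating point); MX data format. Reference: https://www.cnblogs.com/cavalier-chen/p/18591085 MX data structure https://arxiv.org/abs/2310. Read more
posted @ 2025-08-26 10:34 jack-chen666 Views(297) Comments(0) Recommended(0)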
Summary: https://arxiv.org/pdf/2503.08026? Google's RMM memory https://app.funblocks.net/#/aiflow?hid=8481d7c2a61775df3c75df1e533dcb8a One-sentence summary: looking back at the past + looking ahead to the future Read more
posted @ 2025-08-25 17:07 jack-chen666 Views(15) Comments(0) Recommended(0)