Large Models - Post Category - jack-chen666

moe+diffusion language model (DLM)

Summary: Blog; code: GitHub - JinjieNi/OpenMoE2: The official repo for "OpenMoE 2: Sparse Diffusion Language Models". OpenMoE 2 is the first moe+diffusion language model (… Read more

posted @ 2025-10-31 10:40 jack-chen666 Views(25) Comments(0) Recommendations(0)

Understanding pai0 (π0) End to End - Embodied AI - A Complete PyTorch einsum Tutorial - 11

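Summary: Contents: 1. Basic concepts; 2. Basic syntax; Level 1: vector dot product; Level 2: matrix multiplication; Level 3: batched matrix multiplication (common in Transformers); 4. Real examples from the PI0 code; Example 1: QKV projection (gemma.py:183); Example 2: attention computation (gemma.py:217); Example 3: attention output… Read more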

posted @ 2025-10-24 17:24 jack-chen666 Views(83) Comments(0) Recommendations(0)
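The einsum levels listed in the tutorial summary above can be sketched as follows. This is a minimal illustration, not the post's code. It uses NumPy's np.einsum, which accepts the same subscript strings as torch.einsum. The tensor sizes and subscript letters in the attention part are illustrative assumptions and are not taken from gemma.py.

```python
import numpy as np

rng = np.random.default_rng(0)

# Level 1: vector dot product; the repeated index i is summed away
a, b = rng.standard_normal(4), rng.standard_normal(4)
dot = np.einsum("i,i->", a, b)

# Level 2: matrix multiplication; j is contracted, i and k survive
A, B = rng.standard_normal((3, 4)), rng.standard_normal((4, 5))
mm = np.einsum("ij,jk->ik", A, B)

# Level 3: batched matmul; the batch index b is carried through unchanged
X, Y = rng.standard_normal((2, 3, 4)), rng.standard_normal((2, 4, 5))
bmm = np.einsum("bij,bjk->bik", X, Y)

# Attention-shaped example (hypothetical sizes, not from gemma.py):
# b=batch 2, t/s=sequence 5, d=model dim 8, n=heads 2, h=head dim 4
x = rng.standard_normal((2, 5, 8))
w_q, w_k, w_v = (rng.standard_normal((8, 2, 4)) for _ in range(3))
w_o = rng.standard_normal((2, 4, 8))

# QKV projection: contract d and split into heads n with head dim h
q = np.einsum("btd,dnh->btnh", x, w_q)
k = np.einsum("btd,dnh->btnh", x, w_k)
v = np.einsum("btd,dnh->btnh", x, w_v)

# Attention scores: contract h between query position t and key position s
logits = np.einsum("btnh,bsnh->bnts", q, k) / np.sqrt(4)
logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
probs = np.exp(logits)
probs /= probs.sum(axis=-1, keepdims=True)  # softmax over key positions s

# Attention output: weight the values by probs, then merge heads back to d
ctx = np.einsum("bnts,bsnh->btnh", probs, v)
out = np.einsum("btnh,nhd->btd", ctx, w_o)
```

Each equation string reads as "input indices -> output indices". Any index that is missing from the output is summed over, which is why one notation covers dot products, matmuls, and multi-head projections.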

Understanding pai0 (π0) End to End - Embodied AI - 01

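Summary: Contents: Overall framework; pai0; PaliGemma: A versatile 3B VLM for transfer; TransFusion model overview (unified multimodal model); VLM (Vision-Language Model) architecture. Overall framework: π0, a VLA model for general-purpose robot control, in which one framework controls 7 kinds of robot arms (based on PaliG… Read more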

posted @ 2025-10-09 20:30 jack-chen666 Views(148) Comments(0) Recommendations(0)

LLaVA- Improved Baselines with Visual Instruction Tuning

Summary: Original LLaVA paper: Title: "Visual Instruction Tuning"; arXiv link: https://arxiv.org/abs/2304.08485; Venue: NeurIPS 2023. LLaVA-1.5 paper: Title: "Improved Baselines with Visua… Read more

posted @ 2025-09-15 16:47 jack-chen666 Views(25) Comments(0) Recommendations(0)

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation

Summary: https://github.com/ByteVisionLab/TokenFlow https://arxiv.org/abs/2412.03069 Read more

posted @ 2025-09-15 09:52 jack-chen666 Views(13) Comments(0) Recommendations(0)

Block Diffusion-Interpolating Between Autoregressive and Diffusion Language Models

Summary: https://arxiv.org/abs/2503.09573 Read more

posted @ 2025-09-15 09:46 jack-chen666 Views(20) Comments(0) Recommendations(0)

JanusFlow-Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

Summary: https://arxiv.org/abs/2411.07975 https://github.com/deepseek-ai/Janus Read more

posted @ 2025-09-15 09:43 jack-chen666 Views(350) Comments(0) Recommendations(0)

Janus-Pro Multimodal Model

Summary: https://janusai.pro/ https://huggingface.co/deepseek-ai/Janus-Pro-7B https://arxiv.org/abs/2501.17811 https://github.com/deepseek-ai/Janus Read more

posted @ 2025-09-15 09:33 jack-chen666 Views(17) Comments(0) Recommendations(0)

Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Summary: https://arxiv.org/pdf/2505.00703 https://github.com/CaraJ7/T2I-R1 Read more

posted @ 2025-09-15 09:29 jack-chen666 Views(13) Comments(0) Recommendations(0)

A Survey of Reinforcement Learning for Large Reasoning Models

Summary: https://arxiv.org/abs/2509.08827 https://huggingface.co/papers/2509.08827 Read more

posted @ 2025-09-15 09:07 jack-chen666 Views(38) Comments(0) Recommendations(0)

The Landscape of Agentic Reinforcement Learning (Survey)

Summary: The Landscape of Agentic Reinforcement__Learning for LLMs.pdf https://medium.com/data-science-in-your-pocket/the-landscape-of-agentic-reinforcement-le… Read more

posted @ 2025-09-15 09:06 jack-chen666 Views(88) Comments(0) Recommendations(0)

π0.5 Open-Sourced

Summary: https://www.physicalintelligence.company/download/pi05.pdf https://github.com/Physical-Intelligence/openpi https://mp.weixin.qq.com/s/4FwNUULBzMrqEOm9… Read more

posted @ 2025-09-15 09:04 jack-chen666 Views(52) Comments(0) Recommendations(0)

A New Paradigm - LLaDA-VLA: A Diffusion-Based VLA Model

Summary: https://mp.weixin.qq.com/s/fwOGuKy2Wtz_xXx3nCT28w Paper title: LLaDA-VLA: Vision Language Diffusion Action Models; paper link: https://arxiv.org/abs/2509.06932; project page: h… Read more

posted @ 2025-09-15 09:00 jack-chen666 Views(54) Comments(0) Recommendations(0)

Large Models - Qwen3 Attention Layer - 98

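Summary: Contents: Triton kernel: storing the KV cache; Python wrapper: store_kvcache; attention. This post covers the attention-layer implementation in Qwen3 (or a similar architecture), which combines a custom Triton kernel (for KV cache storage) with the FlashAttention library to implement… Read more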

posted @ 2025-09-04 09:58 jack-chen666 Views(106) Comments(0) Recommendations(0)
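The Qwen3 attention post above describes two steps: writing each token's K/V into a cache through a Triton kernel wrapped by store_kvcache, and then attending over that cache. A rough NumPy stand-in makes the data flow concrete. Several details are assumptions and are not taken from the post: the flat cache layout (num_slots, num_kv_heads * head_dim), the convention that a slot of -1 means "skip this token", equal numbers of query and KV heads, and the hypothetical attend_from_cache helper. That helper uses plain softmax attention to stand in for FlashAttention.

```python
import numpy as np

def store_kvcache(key, value, k_cache, v_cache, slot_mapping):
    """NumPy stand-in (assumed semantics) for the Triton store kernel.

    key, value:       (num_tokens, num_kv_heads, head_dim)
    k_cache, v_cache: (num_slots, num_kv_heads * head_dim), one flat row per slot
    slot_mapping:     (num_tokens,) cache row for each token; -1 = skip (assumption)
    """
    num_tokens = key.shape[0]
    k_flat = key.reshape(num_tokens, -1)
    v_flat = value.reshape(num_tokens, -1)
    valid = slot_mapping >= 0
    # Scatter each valid token's K/V into its assigned cache row, in place
    k_cache[slot_mapping[valid]] = k_flat[valid]
    v_cache[slot_mapping[valid]] = v_flat[valid]

def attend_from_cache(q, k_cache, v_cache, slots, num_heads, head_dim):
    """Hypothetical decode-step attention for one query token over cached rows.

    q: (num_heads, head_dim); slots: cache rows holding this sequence's context.
    """
    k = k_cache[slots].reshape(-1, num_heads, head_dim)  # (S, H, Dh)
    v = v_cache[slots].reshape(-1, num_heads, head_dim)
    scores = np.einsum("hd,shd->hs", q, k) / np.sqrt(head_dim)
    scores -= scores.max(axis=-1, keepdims=True)
    p = np.exp(scores)
    p /= p.sum(axis=-1, keepdims=True)  # softmax over cached positions
    return np.einsum("hs,shd->hd", p, v)
```

The scatter by slot index is what lets a paged or block cache place tokens anywhere in memory. The attention side only needs the list of slots that belong to a sequence.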

Large Models - Qwen3 MLP Layer - 97

Summary: Contents: Activation function; Qwen3MLP. Activation function: import torch from torch import nn import torch.nn.functional as F class SiluAndMul(nn.Module): def __init__(self): super().__init_… Read more

posted @ 2025-09-04 09:43 jack-chen666 Views(310) Comments(0) Recommendations(0)
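The MLP post above is built around a SiluAndMul module. A NumPy sketch of that idea follows, and it assumes the common gated-MLP convention. Under that convention, a fused gate_up projection produces twice the intermediate width. The output is split in half along the last dimension, SiLU is applied to the first half, and the result is multiplied elementwise by the second half. In PyTorch the activation would be F.silu. Qwen3MLPSketch, its random weights, and the sizes used are hypothetical.

```python
import numpy as np

def silu(x):
    # SiLU (swish): x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def silu_and_mul(x):
    # Assumed convention: first half of the last dim is the gate, second half is "up"
    gate, up = np.split(x, 2, axis=-1)
    return silu(gate) * up

class Qwen3MLPSketch:
    """Hypothetical gated MLP: down(silu(x @ W_gate) * (x @ W_up)) with a fused gate/up matrix."""

    def __init__(self, hidden_size, intermediate_size, seed=0):
        rng = np.random.default_rng(seed)
        # Fused projection: columns [:intermediate] act as gate, [intermediate:] as up
        self.gate_up = rng.standard_normal((hidden_size, 2 * intermediate_size)) * 0.02
        self.down = rng.standard_normal((intermediate_size, hidden_size)) * 0.02

    def __call__(self, x):
        return silu_and_mul(x @ self.gate_up) @ self.down
```

Fusing gate and up into one matrix replaces two matmuls with a single wider one. The split inside silu_and_mul then recovers the two branches.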

Large Models - MoE++ - 96

Summary: Contents: Core idea: Zero-Computation Experts (ZCE); II. MoE++ architecture in detail; ZCE selection strategy; Why is ZCE effective?; Conclusion: the value of MoE++ (ZCE). MoE++: Accelerating Mixture-of-Experts Methods with Zero… Read more

posted @ 2025-09-02 11:08 jack-chen666 Views(70) Comments(0) Recommendations(0)
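The MoE++ post above centers on zero-computation experts. A toy single-token router mixing ordinary FFN experts with them shows the idea. This sketch assumes the three ZCE types as we understand them from the MoE++ paper. A zero expert outputs 0 (the token is discarded). A copy expert outputs its input (the token skips the layer). A constant expert outputs a1*x + a2*v, where v is a trainable vector and [a1, a2] is a softmax over a small projection of x. The router, expert sizes, and top-k renormalization are hypothetical simplifications, and gating residuals are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hypothetical hidden size

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def make_ffn(d, h):
    # Ordinary (compute-heavy) expert: a tiny ReLU MLP with random weights
    w1 = rng.standard_normal((d, h)) * 0.1
    w2 = rng.standard_normal((h, d)) * 0.1
    return lambda x: np.maximum(x @ w1, 0.0) @ w2

def zero_expert(x):
    # ZCE "discard": contributes nothing
    return np.zeros_like(x)

def copy_expert(x):
    # ZCE "skip": passes the token through unchanged
    return x

def make_constant_expert(d):
    # ZCE "replace": mixes the input with a trainable constant vector v
    v = rng.standard_normal(d)
    w_c = rng.standard_normal((d, 2)) * 0.1
    def expert(x):
        a = softmax(x @ w_c)  # two mixing weights that sum to 1
        return a[0] * x + a[1] * v
    return expert

experts = [make_ffn(D, 16), make_ffn(D, 16), zero_expert, copy_expert, make_constant_expert(D)]
w_router = rng.standard_normal((D, len(experts))) * 0.1

def moe_pp_layer(x, k=2):
    """Route one token x of shape (D,) to its top-k experts (FFN or ZCE)."""
    probs = softmax(x @ w_router)
    top = np.argsort(probs)[::-1][:k]
    weights = probs[top] / probs[top].sum()  # renormalize over chosen experts (assumption)
    out = sum(w * experts[i](x) for w, i in zip(weights, top))
    return out, top
```

Whenever a token's top-k lands on ZCEs instead of FFN experts, those slots cost essentially no matmul FLOPs. That is the source of the speedup the post attributes to ZCE.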

Large Models - MoE Techniques Overview - 95

Summary: Contents: Mainstream MoE architectures; Review of core components: basic MoE structure; Load balancing (load balancing loss); Expert parallelism; Hierarchical MoE (H-MoE); Dynamic sparsity; Residual connections and expert fusion (Residua… Read more

posted @ 2025-09-01 14:41 jack-chen666 Views(59) Comments(0) Recommendations(0)
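The load balancing loss named in the MoE overview above has several variants. One widely used form is the Switch Transformer-style auxiliary loss alpha * N * sum_i(f_i * P_i). Here f_i is the fraction of tokens whose top-1 expert is i, P_i is the mean router probability for expert i, and N is the number of experts. The sketch below assumes that form, and the post itself may use a different variant. The loss reaches its minimum of alpha when routing is uniform and grows toward alpha * N as routing collapses onto one expert.

```python
import numpy as np

def load_balancing_loss(router_logits, alpha=0.01):
    """Switch-style auxiliary loss (assumed variant).

    router_logits: (num_tokens, num_experts)
    """
    num_tokens, num_experts = router_logits.shape
    z = router_logits - router_logits.max(axis=1, keepdims=True)
    probs = np.exp(z)
    probs /= probs.sum(axis=1, keepdims=True)            # per-token router softmax
    top1 = probs.argmax(axis=1)                          # hard top-1 assignment
    f = np.bincount(top1, minlength=num_experts) / num_tokens  # dispatch fraction per expert
    P = probs.mean(axis=0)                               # mean router probability per expert
    return alpha * num_experts * np.sum(f * P)

# Balanced: each of 4 experts is strongly preferred by 2 of 8 tokens
balanced = load_balancing_loss(np.tile(np.eye(4) * 10.0, (2, 1)))
# Collapsed: every token prefers expert 0
collapsed = load_balancing_loss(np.tile([10.0, 0.0, 0.0, 0.0], (8, 1)))
```

The f term is not differentiable, so the gradient flows only through P. Because the two are multiplied, the router is pushed to lower the probability of exactly those experts that are overloaded.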

Large Models - Parameter-Efficient Fine-Tuning (PEFT): OFT - 94

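Summary: Contents: References; LoRA (Low-Rank Adaptation); OFT (Orthogonal Finetuning); Mathematical principles; Summary; Properties of orthogonal matrices; Core properties and geometric interpretation; An example; See this figure. References: https://huggingface.co/docs/peft/en/conceptual_guides/oft?utm… Read more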

posted @ 2025-08-27 11:10 jack-chen666 Views(206) Comments(0) Recommendations(0)
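The LoRA vs. OFT contrast in the post above can be checked numerically. LoRA adds a low-rank update, W + B A. OFT multiplies the frozen weight by an orthogonal matrix, R W. Since R^T R = I, the Gram matrix W^T W is unchanged, so the norms of the columns (neurons) and the angles between them are preserved. The sketch builds R with the Cayley transform R = (I + Q)(I - Q)^(-1) of a skew-symmetric Q, which we understand to be the parameterization in the original OFT paper. The block-diagonal structure OFT uses for efficiency is omitted, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, r = 6, 4, 2                      # hypothetical sizes; r = LoRA rank
W = rng.standard_normal((d, n))        # frozen pretrained weight

# LoRA: additive low-rank update, rank(B @ A) <= r
B = rng.standard_normal((d, r)) * 0.1
A = rng.standard_normal((r, n)) * 0.1
W_lora = W + B @ A

# OFT: multiplicative orthogonal update via the Cayley transform
M = rng.standard_normal((d, d)) * 0.1
Q = M - M.T                            # skew-symmetric: Q.T == -Q
I = np.eye(d)
R = (I + Q) @ np.linalg.inv(I - Q)     # I - Q is always invertible for skew-symmetric Q
W_oft = R @ W
```

LoRA can change norms and angles freely within a rank-r subspace. OFT can only rotate or reflect the weight as a whole, which is the geometric constraint the post's "orthogonal matrix properties" section refers to.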

MXFP4: The New Data Format Used by gpt-oss

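Summary: Contents: MX data format; How do FP32 and FP16 represent a floating-point number?; FP32 (single precision); FP16 (half precision); MX data format. Reference: https://www.cnblogs.com/cavalier-chen/p/18591085 MX data format https://arxiv.org/abs/2310. Read more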

posted @ 2025-08-26 10:34 jack-chen666 Views(297) Comments(0) Recommendations(0)
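The MXFP4 post above moves from FP32/FP16 bit layouts to the MX block format, and a small sketch can cover both. The decode function splits a normal FP32 or FP16 value into sign, biased exponent, and mantissa. The quantizer follows the OCP Microscaling conventions as we understand them: 32-element blocks, a shared power-of-two scale 2^(floor(log2(amax)) - 2), where 2 is the largest exponent of FP4 E2M1, and elements snapped to the E2M1 magnitudes {0, 0.5, 1, 1.5, 2, 3, 4, 6}. E8M0 range limits, subnormals, and special values are ignored. Round-to-nearest with ties going to the smaller magnitude is a simplification, not necessarily the spec's rounding rule.

```python
import math
import struct
import numpy as np

def decode_normal(bits, exp_bits, man_bits):
    """Value of a normal IEEE float given its raw bits: (-1)^s * 2^(e - bias) * (1 + m / 2^man_bits)."""
    bias = (1 << (exp_bits - 1)) - 1              # 127 for FP32, 15 for FP16
    sign = bits >> (exp_bits + man_bits)
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    return (-1) ** sign * 2.0 ** (exp - bias) * (1 + man / (1 << man_bits))

def fp32_bits(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]   # 1 sign, 8 exponent, 23 mantissa bits

def fp16_bits(x):
    return struct.unpack(">H", struct.pack(">e", x))[0]   # 1 sign, 5 exponent, 10 mantissa bits

FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # representable magnitudes

def mxfp4_quantize(block):
    """Quantize one 32-element block; returns (shared exponent, dequantized values)."""
    amax = float(np.max(np.abs(block)))
    if amax == 0.0:
        return 0, np.zeros_like(block)
    shared_exp = math.floor(math.log2(amax)) - 2  # 2 = emax of E2M1 (6 = 1.5 * 2^2)
    scale = 2.0 ** shared_exp
    scaled = np.abs(block) / scale                # now in [0, 8); values above 6 snap to 6
    idx = np.abs(scaled[:, None] - FP4_E2M1[None, :]).argmin(axis=1)
    return shared_exp, np.sign(block) * FP4_E2M1[idx] * scale
```

Each block therefore stores 32 four-bit elements plus one 8-bit shared exponent. That works out to roughly 4.25 bits per value, compared with 16 for FP16.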

Google RMM Memory

Summary: https://arxiv.org/pdf/2503.08026? Google's RMM memory https://app.funblocks.net/#/aiflow?hid=8481d7c2a61775df3c75df1e533dcb8a One-line summary: look back at the past + look ahead to the future. Read more

posted @ 2025-08-25 17:07 jack-chen666 Views(15) Comments(0) Recommendations(0)

"Red beans grow in the southern land" is a thing of long ago

Planting beans beneath the southern hill · github
