Summary: Contents: KIMI K2: OPEN AGENTIC INTELLIGENCE | TL;DR | Method | QK-Clip | What is the attention-logit explosion problem in Transformer attention? | Why can QK-Clip solve the attention-logit explosion problem? | Algorithm | Pre-t…
posted @ 2025-08-01 21:53 fariver Views (372) Comments (0) Recommended (0)
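The Kimi K2 post above centers on QK-Clip. As we understand the Kimi K2 report, after each optimizer step QK-Clip looks at every attention head's largest pre-softmax logit and, when that value exceeds a threshold tau, shrinks that head's query and key projection weights so the logit is pulled back to tau. The sketch below assumes plain multi-head attention (Kimi K2 itself uses MLA, where the shared rotary key part is handled differently) and weights stored as nested Python lists; `qk_clip`, `tau` and `alpha` are hypothetical names chosen for illustration, not an official API.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def scale_matrix(w: Matrix, factor: float) -> Matrix:
    """Return a copy of w with every entry multiplied by factor."""
    return [[factor * x for x in row] for row in w]


def qk_clip(
    w_q: List[Matrix],
    w_k: List[Matrix],
    max_logits: List[float],
    tau: float,
    alpha: float = 0.5,
) -> Tuple[List[Matrix], List[Matrix]]:
    """Hypothetical per-head QK-Clip sketch (plain MHA, not Kimi K2's MLA variant).

    w_q[h], w_k[h]: query / key projection weights of head h.
    max_logits[h]:  largest pre-softmax attention logit seen for head h.
    Heads above tau get W_q scaled by gamma**alpha and W_k by gamma**(1 - alpha),
    where gamma = tau / max_logit; heads at or below tau are returned unchanged.
    """
    new_q: List[Matrix] = []
    new_k: List[Matrix] = []
    for h, s_max in enumerate(max_logits):
        if s_max > tau:
            gamma = tau / s_max
            new_q.append(scale_matrix(w_q[h], gamma ** alpha))
            new_k.append(scale_matrix(w_k[h], gamma ** (1.0 - alpha)))
        else:
            new_q.append(w_q[h])
            new_k.append(w_k[h])
    return new_q, new_k
```

Because a logit is a dot product of a query and a key, scaling W_q by gamma^alpha and W_k by gamma^(1 - alpha) multiplies every logit of that head by exactly gamma = tau / S_max, so the observed maximum lands on tau while heads below the threshold keep their weights.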
Summary: Contents: Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets | TL;DR | Data | Stage I: Image Pretraining | Stage II: Curating a Video Pretr…
posted @ 2025-07-28 22:24 fariver Views (111) Comments (0) Recommended (0)
Summary: Contents: Flamingo: a Visual Language Model for Few-Shot Learning | TL;DR | Method | Visual processing and Perceiver Resampler | GATED XATTN-DENSE layers | Mixture of Vision…
posted @ 2025-07-26 15:41 fariver Views (98) Comments (0) Recommended (0)
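The Flamingo post above lists GATED XATTN-DENSE layers. As described in the Flamingo paper, text tokens cross-attend to visual tokens, and both the cross-attention and the dense (feed-forward) branch are added back through tanh gates whose scalars start at 0, so the frozen language model behaves exactly as before at initialization. The sketch below is heavily simplified under our own assumptions: a single head, no learned projections or layer norms, lists as vectors, and a caller-supplied `ffw`; all function names are hypothetical.

```python
import math
from typing import Callable, List

Vec = List[float]


def softmax(xs: Vec) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(text: List[Vec], visual: List[Vec]) -> List[Vec]:
    """Single-head dot-product attention: text tokens are queries, visual tokens are keys/values."""
    d = len(text[0])
    out = []
    for q in text:
        weights = softmax([sum(a * b for a, b in zip(q, v)) / math.sqrt(d) for v in visual])
        out.append([sum(w * v[i] for w, v in zip(weights, visual)) for i in range(d)])
    return out


def gated_xattn_dense(
    text: List[Vec],
    visual: List[Vec],
    alpha_xattn: float,
    alpha_dense: float,
    ffw: Callable[[Vec], Vec],
) -> List[Vec]:
    """Hypothetical simplified gated block: y = x + tanh(a1) * xattn; y = y + tanh(a2) * ffw(y)."""
    g_attn = math.tanh(alpha_xattn)
    attended = cross_attend(text, visual)
    y = [[t + g_attn * a for t, a in zip(tok, att)] for tok, att in zip(text, attended)]
    g_dense = math.tanh(alpha_dense)
    return [[t + g_dense * f for t, f in zip(tok, ffw(tok))] for tok in y]
```

With both gate scalars at 0, tanh(0) = 0 and the block returns the text tokens unchanged, which is the property the zero initialization is meant to guarantee; training then opens the gates gradually.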
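Summary: Igniting the Reasoning Revolution: From PPO to GRPO, How Reinforcement Learning Is Reshaping Large Language Models. Introduction: When Reinforcement Learning Meets Large Language Models. In recent years, large language models (LLMs) have swept through the field of artificial intelligence at unprecedented speed. Yet although pretrained LLMs are highly knowledgeable, their outputs often fail to fully align with human values and the needs of specific tasks. To solve this "alignment" problem, a new technical paradigm, based…
posted @ 2025-07-22 21:44 fariver Views (520) Comments (0) Recommended (0)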
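The PPO-to-GRPO post above concerns the switch from a learned value function to group statistics. In GRPO as commonly described, several responses are sampled for the same prompt and each response's advantage is its reward normalized by the group's mean and standard deviation; the PPO-style clipped surrogate is then kept on top of that advantage. A minimal sketch, assuming one scalar reward per response and the population standard deviation (implementations differ on this choice); the function names and `eps` values are illustrative.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: (r_i - mean(r)) / (std(r) + eps) within one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std; some code uses the sample std
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, eps_clip: float = 0.2) -> float:
    """PPO clipped objective for one token: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = min(max(ratio, 1.0 - eps_clip), 1.0 + eps_clip)
    return min(ratio * advantage, clipped_ratio * advantage)
```

For a group with rewards `[1.0, 0.0, 0.0, 1.0]` the advantages come out close to `[1, -1, -1, 1]`, and a group whose responses all earn the same reward gets zero advantage everywhere, so it contributes no gradient.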
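Summary: Contents: KIMI K1.5: SCALING REINFORCEMENT LEARNING WITH LLMS | TL;DR | Method | Building the RL Prompt Set | Long-CoT Supervised Fine-Tuning | Reinforcement Learning Algorithm | Length Penalty | Sampling Strategies | Visual Data | Long2short CoT Models | Model…
posted @ 2025-07-21 20:37 fariver Views (147) Comments (0) Recommended (0)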
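The Kimi k1.5 post above includes a length penalty. As we read the k1.5 report, the k responses sampled for one prompt are compared by length: `lambda = 0.5 - (len - min_len) / (max_len - min_len)`, correct responses receive `lambda` and incorrect ones `min(0, lambda)`, so short correct answers earn a bonus and long wrong answers a penalty. A minimal sketch; the zero-spread fallback and the function name are our own assumptions.

```python
from typing import List


def length_penalty_rewards(lengths: List[int], correct: List[bool]) -> List[float]:
    """Hypothetical helper: k1.5-style length reward for one prompt's group of responses."""
    min_len, max_len = min(lengths), max(lengths)
    if max_len == min_len:
        # Assumption: with no length spread there is nothing to prefer, so the term is zero.
        return [0.0] * len(lengths)
    rewards = []
    for length, ok in zip(lengths, correct):
        lam = 0.5 - (length - min_len) / (max_len - min_len)
        # Correct answers get lambda (positive if short); wrong ones can only be penalized.
        rewards.append(lam if ok else min(0.0, lam))
    return rewards
```

For lengths `[100, 200, 300]` with correctness `[True, True, False]` this yields `[0.5, 0.0, -0.5]`: the shortest correct answer is rewarded, the longest (wrong) answer is penalized.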
Summary: Contents: DAPO: An Open-Source LLM Reinforcement Learning System at Scale | TL;DR | Background | Method | Clip-Higher | Dynamic Sampling | Overlong Reward Shaping | Experiment | Summary and Reflections…
posted @ 2025-07-20 18:58 fariver Views (72) Comments (0) Recommended (0)
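The DAPO post above names three techniques. As described in the DAPO paper: Clip-Higher decouples the clip range into a lower bound `1 - eps_low` and a larger upper bound `1 + eps_high`; Dynamic Sampling drops prompt groups whose responses are all correct or all wrong (their group-normalized advantage is zero); Overlong Reward Shaping applies no penalty up to `max_len - cache_len`, a linear penalty inside that final "cache" window, and -1 past `max_len`. A minimal sketch; the defaults 0.2 / 0.28 follow the paper's reported setting, and the function names are illustrative.

```python
from typing import List


def clip_higher_surrogate(
    ratio: float, advantage: float, eps_low: float = 0.2, eps_high: float = 0.28
) -> float:
    """Clipped objective with decoupled bounds: clip(r, 1 - eps_low, 1 + eps_high)."""
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped_ratio * advantage)


def overlong_penalty(length: int, max_len: int, cache_len: int) -> float:
    """Soft length penalty: 0, then linear down to -1 in the last cache_len tokens, then -1."""
    soft_start = max_len - cache_len
    if length <= soft_start:
        return 0.0
    if length <= max_len:
        return (soft_start - length) / cache_len
    return -1.0


def keep_group(correct: List[bool]) -> bool:
    """Dynamic-sampling filter: keep a group only if it mixes right and wrong answers."""
    return 0 < sum(correct) < len(correct)
```

The larger upper bound lets low-probability tokens with positive advantage grow further before clipping kicks in, which the DAPO authors motivate as a guard against entropy collapse.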
Summary: Contents: QWENLONG-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning | TL;DR | Motivation | suboptimal training efficiency | unstable optimizati…
posted @ 2025-07-20 15:07 fariver Views (35) Comments (0) Recommended (0)
Summary: Contents: Training language models to follow instructions with human feedback | TL;DR | Method | Dataset | Model | Supervised fine-tuning | Reward modeling (RM) | Reinforcement Lea…
posted @ 2025-07-17 21:58 fariver Views (124) Comments (0) Recommended (0)
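The InstructGPT post above covers reward modeling. As described in the InstructGPT paper, labelers rank K responses to a prompt, and the reward model is trained on all C(K, 2) pairs with the loss `-log(sigmoid(r(x, y_w) - r(x, y_l)))`, averaged over pairs, where `y_w` is the preferred response. A minimal sketch operating on already-computed scalar rewards listed best-first (at least two); the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List


def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """-log(sigmoid(d)) with d = r_chosen - r_rejected, i.e. log(1 + exp(-d)), computed stably."""
    d = r_chosen - r_rejected
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


def ranking_loss(rewards_best_first: List[float]) -> float:
    """Average pairwise loss over all C(K, 2) pairs; index i < j means response i was preferred."""
    pairs = list(combinations(range(len(rewards_best_first)), 2))
    total = sum(pairwise_loss(rewards_best_first[i], rewards_best_first[j]) for i, j in pairs)
    return total / len(pairs)
```

Treating every pair from one ranking as a single averaged term mirrors the paper's motivation for batching all comparisons of a prompt together: it avoids overfitting from seeing the same completion in many separate updates.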
Summary: Contents: R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning | TL;DR | Method | Verifiable Reward | RLVR | Experiment | Summary and Reflections | Related Links…
posted @ 2025-07-15 21:28 fariver Views (53) Comments (0) Recommended (0)
Summary: Contents: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | TL;DR | Method | Experiment | Summary and Reflections | Related Links…
posted @ 2025-07-15 20:28 fariver Views (49) Comments (0) Recommended (0)