摘要: 目录KIMI K1.5: SCALING REINFORCEMENT LEARNING WITH LLMSTL;DRMethodRL Prompt Set制作Long-CoT Supervised Fine-Tuning强化学习算法长度惩罚采样策略视觉数据Long2short CoT模型Model 阅读全文
posted @ 2025-07-21 20:37 fariver 阅读(139) 评论(0) 推荐(0)