[Pinned] Can Self-Play Preference Optimization (SPO) replace the reward model?
posted @ 2025-08-22 11:07 limingqi Views (94) Comments (0) Recommendations (0)
posted @ 2025-07-26 12:48 limingqi Views (55) Comments (0) Recommendations (0)
posted @ 2025-07-26 12:47 limingqi Views (97) Comments (0) Recommendations (0)
posted @ 2026-01-15 17:23 limingqi Views (4) Comments (0) Recommendations (0)
posted @ 2026-01-15 09:45 limingqi Views (11) Comments (0) Recommendations (0)
posted @ 2026-01-14 10:42 limingqi Views (7) Comments (0) Recommendations (0)
posted @ 2025-12-24 14:01 limingqi Views (1439) Comments (6) Recommendations (22)
posted @ 2025-10-24 15:28 limingqi Views (148) Comments (0) Recommendations (1)
posted @ 2025-10-24 15:24 limingqi Views (52) Comments (0) Recommendations (0)
posted @ 2025-10-22 23:00 limingqi Views (52) Comments (0) Recommendations (0)
posted @ 2025-10-22 23:00 limingqi Views (73) Comments (0) Recommendations (0)
posted @ 2025-10-18 20:10 limingqi Views (95) Comments (0) Recommendations (0)
posted @ 2025-10-17 20:58 limingqi Views (57) Comments (0) Recommendations (0)