[Pinned] Can Self-Play Preference Optimization (SPO) Reward the Model?
posted @ 2025-08-22 11:07 limingqi Views (125) Comments (0) Recommendations (0)