[Pinned] Can Self-Play Preference Optimization (SPO) Reward the Model?
posted @ 2025-08-22 11:07 limingqi Views (172) Comments (0) Recommendations (0)