[Pinned] Can Self-Play Preference Optimization (SPO) Reward the Model?
posted @ 2025-08-22 11:07 limingqi Views(170) Comments(0) Recommended(0)
posted @ 2025-07-26 12:48 limingqi Views(62) Comments(0) Recommended(0)
posted @ 2025-07-26 12:47 limingqi Views(119) Comments(0) Recommended(0)
posted @ 2026-03-18 16:11 limingqi Views(3) Comments(0) Recommended(0)
posted @ 2026-03-18 15:40 limingqi Views(4) Comments(0) Recommended(0)
posted @ 2026-03-18 15:25 limingqi Views(3) Comments(0) Recommended(0)
posted @ 2026-03-18 13:54 limingqi Views(1) Comments(0) Recommended(0)
posted @ 2026-03-18 13:49 limingqi Views(1) Comments(0) Recommended(0)
posted @ 2026-03-18 10:57 limingqi Views(1) Comments(0) Recommended(0)
posted @ 2026-03-09 11:36 limingqi Views(16) Comments(0) Recommended(0)
posted @ 2026-03-09 11:33 limingqi Views(10) Comments(0) Recommended(0)
posted @ 2026-03-09 11:15 limingqi Views(11) Comments(0) Recommended(0)
posted @ 2026-03-09 11:12 limingqi Views(5) Comments(0) Recommended(0)