Can Self-Play Preference Optimization (SPO) reward models?
posted @ 2025-08-22 11:07 limingqi Views (82) Comments (0) Recommendations (0)