[Pinned] Can Self-Play Preference Optimization (SPO) reward models?
posted @ 2025-08-22 11:07 limingqi Views (82) Comments (0) Recommendations (0)