Fork me on GitHub

Coding Poineer

Coding Poineer

Coding Poineer

Coding Poineer

Coding Poineer

Coding Poineer

Coding Poineer

Coding Poineer

Coding Poineer

Coding Poineer

Coding Poineer

reward model相关技术

Reward Hacking 模型通过利用奖励系统的设计缺陷或漏洞,采取非预期的行为来获取高额奖励,而不是真正实现设计者期望的目标

字节token https://mp.weixin.qq.com/s/lsCshrnmtO-bYaszLFBSNw

DeepSeek训练图解:https://zhuanlan.zhihu.com/p/22037101139

posted @ 2025-02-10 10:45  365/24/60  阅读(56)  评论(0)    收藏  举报