Post archive for October 20, 2025 - qlhh

October 20, 2025

Summary: First, a look at the basic KL formulas. KL, KL1: for large language models the KL term is usually the reverse KL: \[KL(\pi_\theta \| \pi_{ref}) = E_{x\sim\pi_\theta(\cdot|o_{<t})}\log\frac{\pi_\theta(x|o_{<t})}{\pi_{ref}(x|o_{<t})}\] Read more
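To make the reverse-KL formula in the summary concrete, here is a minimal Python sketch for a single position with a three-token vocabulary. The distributions `pi_theta` and `pi_ref` and the helper names `reverse_kl` and `reverse_kl_mc` are hypothetical, chosen only for illustration; they are not taken from the post. The sketch computes the expectation exactly and then estimates it by sampling from \(\pi_\theta\), which is what "reverse" refers to here: the samples come from the policy being trained, not from the reference.

```python
import math
import random


def reverse_kl(pi_theta, pi_ref):
    """Exact KL(pi_theta || pi_ref) for two discrete next-token distributions."""
    # Terms with pi_theta(x) == 0 contribute nothing to the expectation.
    return sum(p * math.log(p / q) for p, q in zip(pi_theta, pi_ref) if p > 0)


def reverse_kl_mc(pi_theta, pi_ref, n_samples=100_000, seed=0):
    """Monte Carlo estimate: draw x ~ pi_theta, average log(pi_theta(x) / pi_ref(x))."""
    rng = random.Random(seed)
    tokens = list(range(len(pi_theta)))
    samples = rng.choices(tokens, weights=pi_theta, k=n_samples)
    return sum(math.log(pi_theta[x] / pi_ref[x]) for x in samples) / n_samples


# Hypothetical next-token distributions at one position (assumed values).
pi_theta = [0.7, 0.2, 0.1]
pi_ref = [0.5, 0.3, 0.2]

exact = reverse_kl(pi_theta, pi_ref)
estimate = reverse_kl_mc(pi_theta, pi_ref)
print(f"exact={exact:.4f}  monte_carlo={estimate:.4f}")
```

With enough samples the Monte Carlo average should sit close to the exact sum, and the KL of a distribution with itself is zero.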

posted @ 2025-10-20 17:02 qlhh Views (422) Comments (0) Recommended (1)
