[PaperReading] R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning

Link
Date: xx.xx
Affiliation: Tongyi Lab
Authors: Jiaxing Zhao, Xihan Wei, Liefeng Bo
Citations: 15
GitHub:
https://github.com/HumanMLLM/R1-Omni?tab=readme-ov-file

TL;DR

This paper successfully applies RLVR (Reinforcement Learning with Verifiable Reward) to an omni-multimodal (vision + audio) large model for emotion recognition, improving the task's reasoning ability, recognition accuracy, and generalization.

Method

Verifiable Reward

A verifiable reward is defined for tasks whose outcomes can be checked automatically, e.g. mathematical problem-solving and coding challenges, where correctness is easy to define. The rule-based reward in DeepSeek-R1 is one such verifiable reward. The counterpart of RLVR is RLHF: in RLHF, a reward model must first be trained on human-annotated preferences, and rewards are then produced by that reward model.
Reward definition for the emotion recognition task:
[image: reward formula]
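Since the reward formula appears only as an image here, below is a minimal sketch of what such a verifiable accuracy reward could look like, assuming the model's final prediction is wrapped in `<answer>` tags (a DeepSeek-R1-style convention; the exact parsing in R1-Omni may differ):

```python
import re

def accuracy_reward(response: str, label: str) -> float:
    """Rule-based verifiable reward: 1.0 if the emotion predicted inside
    <answer>...</answer> matches the ground-truth label, else 0.0.
    (Sketch under assumed output tags, not the paper's exact code.)"""
    m = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    pred = m.group(1).strip().lower() if m else ""
    return 1.0 if pred == label.strip().lower() else 0.0
```

Because the label set for emotion recognition is closed (e.g. happy, sad, angry, ...), exact string matching is enough to make the reward verifiable without a learned reward model.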

RLVR

The reinforcement learning framework follows GRPO. The reward consists of an accuracy reward and a format reward: the accuracy reward uses the verifiable reward defined above, while the format reward encourages the model to produce output in the specified HTML-tag format.
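A minimal sketch of the format reward and GRPO's group-relative advantage (the `<think>`/`<answer>` tag template and the per-group normalization are standard in DeepSeek-R1-style RLVR; the exact tags and weighting in R1-Omni are assumptions here):

```python
import re
import statistics

def format_reward(response: str) -> float:
    """1.0 if the output follows the <think>...</think><answer>...</answer>
    template (assumed tags), else 0.0."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, response.strip(), re.DOTALL) else 0.0

def grpo_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: each sampled response's reward is normalized
    against the mean/std of its group, replacing a learned value critic."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mu) / sigma for r in rewards]
```

In GRPO, several responses are sampled per prompt; each one's total reward (e.g. accuracy reward plus format reward) is turned into an advantage by this within-group normalization, so no separate critic network is needed.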

Experiment

The effect of RLVR is shown mainly through benchmark accuracy and the logical coherence of the generated reasoning:
[images: experimental results and qualitative reasoning examples]

Summary & Thoughts

Related Links

https://zhuanlan.zhihu.com/p/29860130691

posted @ 2025-07-15 21:28  fariver