About R1-Omni

Alibaba has released R1-Omni, a model that brings reinforcement learning to omni-multimodal emotion recognition.

Introduction

R1-Omni is the industry’s first application of Reinforcement Learning with Verifiable Reward (RLVR) to an omni-multimodal large language model. We focus on emotion recognition, a task where both visual and audio modalities play crucial roles, to validate the potential of combining RLVR with an omni-multimodal model. Our findings reveal three key insights:

Key Insights

  • Enhanced Reasoning Capability: R1-Omni's reasoning makes it clearer how visual and audio information each contribute to the recognized emotion.
  • Improved Understanding Capability: Compared to supervised fine-tuning (SFT), RLVR significantly boosts performance on emotion recognition tasks.
  • Stronger Generalization Capability: RLVR-trained models generalize markedly better, especially in out-of-distribution scenarios.
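
To make the idea of a "verifiable reward" concrete, here is a minimal sketch of how such a reward could be computed for emotion recognition. It is an illustration under stated assumptions, not R1-Omni's actual training code: the `<think>`/`<answer>` output template, the function names, and the equal weighting of the two terms are all hypothetical. The point is only that the reward comes from a rule that can be checked automatically against a ground-truth label, rather than from a learned reward model.

```python
import re

# Assumed output template: reasoning inside <think>, final label inside <answer>.
# The tag names are hypothetical and chosen only for this illustration.
FORMAT_PATTERN = re.compile(
    r"\s*<think>(.*?)</think>\s*<answer>(.*?)</answer>\s*", re.DOTALL
)


def format_reward(response: str) -> float:
    """Return 1.0 if the whole response follows the assumed template, else 0.0."""
    return 1.0 if FORMAT_PATTERN.fullmatch(response) else 0.0


def accuracy_reward(response: str, ground_truth: str) -> float:
    """Return 1.0 if the label inside <answer> matches the reference label."""
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match is None:
        return 0.0
    predicted = match.group(1).strip().lower()
    return 1.0 if predicted == ground_truth.strip().lower() else 0.0


def verifiable_reward(response: str, ground_truth: str) -> float:
    """Sum of the two rule-based terms (equal weighting is an assumption)."""
    return accuracy_reward(response, ground_truth) + format_reward(response)


if __name__ == "__main__":
    sample = (
        "<think>The speaker frowns and raises their voice.</think>"
        "<answer>angry</answer>"
    )
    print(verifiable_reward(sample, "angry"))
```

Because both terms are deterministic checks, the same response always receives the same score, which is what makes the reward "verifiable" in this sketch.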

Performance

Below are the performance metrics on emotion recognition datasets. Symbols indicate whether the data is in-distribution (⬤) or out-of-distribution (△).

Note: This is an unofficial about page for R1-Omni. For the most accurate information, please refer to official documentation.