Title#
Human Feedback Driven Dynamic Speech Emotion Recognition
Abstract#
This work explores the new area of dynamic speech emotion recognition. Unlike traditional methods, we assume that each audio track is associated with a sequence of emotions that are active at different moments in time. The study focuses in particular on animating emotional 3D avatars. We propose a multi-stage method that includes the training of a classical speech emotion recognition model, synthetic generation of emotional sequences, and further model refinement based on human feedback. Additionally, we introduce a novel approach to modeling emotional mixtures based on the Dirichlet distribution. The models are evaluated against ground-truth emotions extracted from a dataset of 3D facial animations, and we compare our models with a sliding-window baseline. Our experimental results show the effectiveness of the Dirichlet-based approach in modeling emotional mixtures. Incorporating human feedback further improves model quality while simplifying the annotation procedure.
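To make the Dirichlet idea concrete: an emotion mixture at a given moment is a point on the probability simplex, and the Dirichlet distribution is a natural density over such points. The sketch below (a minimal illustration, not the paper's implementation; the emotion label set, the per-frame concentration vector, and the target mixture are all hypothetical) shows how a model head that outputs Dirichlet concentration parameters per audio frame could be trained by minimizing the negative log-likelihood of an annotated mixture, using only the Python standard library.

```python
import math
import random

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # hypothetical label set

def sample_mixture(alpha):
    """Draw one emotion mixture (a point on the simplex) from
    Dirichlet(alpha) via normalized Gamma draws."""
    draws = [random.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def dirichlet_log_pdf(p, alpha):
    """Log-density of a mixture p under Dirichlet(alpha):
    log Gamma(sum a) - sum log Gamma(a_i) + sum (a_i - 1) log p_i."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(x) for a, x in zip(alpha, p))

# A model head would emit one concentration vector per audio frame;
# training minimizes the negative log-likelihood of the annotated mixture.
alpha = [2.0, 8.0, 1.0, 1.0]        # hypothetical per-frame prediction
target = [0.15, 0.70, 0.10, 0.05]   # hypothetical ground-truth mixture
nll = -dirichlet_log_pdf(target, alpha)
```

One appeal of this parameterization is that the magnitude of the concentration vector encodes the model's confidence in the mixture, while its direction encodes the mixture itself, which a sliding-window classifier cannot express.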