zikele

Transsion Multilingual Speech Recognition System for MLC-SLM 2025 Challenge

2508.14916v1

Title#

Transsion Multilingual Speech Recognition System for MLC-SLM 2025 Challenge

Abstract#

This paper presents the architecture and performance of a novel multilingual automatic speech recognition (ASR) system developed by the Transsion Speech Team for Track 1 of the MLC-SLM 2025 Challenge. The proposed system comprises three key components: 1) a speech encoder based on a frozen Whisper-large-v3, which leverages large-scale pretraining for robust acoustic feature extraction; 2) a trainable adaptor module that uses a Linear-ReLU-Linear transformation to align speech and text representations; and 3) a frozen Qwen2.5-7B-Instruct large language model (LLM) integrated with trainable LoRA for contextual linguistic decoding. By systematically combining pretrained models with task-specific fine-tuning, the system achieved a word/character error rate (WER/CER) of 9.83% across the 11 languages in the evaluation set and ranked third among global participants.
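The two trainable parts described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the team's implementation: the toy dimensions, the LoRA rank and scaling factor, the zero initialization of the LoRA up-projection, and the names `Adaptor` and `LoRALinear` are all assumptions chosen for illustration, and the outputs of the frozen Whisper encoder and the weights of the frozen LLM are replaced by random stand-in values.

```python
import random


def linear(x, weight, bias):
    """Compute y = W x + b for one vector; weight has shape out_dim x in_dim."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in x]


def init_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix standing in for real (pretrained or initial) weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class Adaptor:
    """Trainable Linear-ReLU-Linear map from encoder space to LLM embedding space."""

    def __init__(self, enc_dim, hidden_dim, llm_dim, rng):
        self.w1 = init_matrix(hidden_dim, enc_dim, rng)
        self.b1 = [0.0] * hidden_dim
        self.w2 = init_matrix(llm_dim, hidden_dim, rng)
        self.b2 = [0.0] * llm_dim

    def __call__(self, frames):
        # Apply the same projection to every encoder frame.
        return [linear(relu(linear(f, self.w1, self.b1)), self.w2, self.b2)
                for f in frames]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update: y = W x + (alpha / r) B (A x)."""

    def __init__(self, in_dim, out_dim, rank, alpha, rng):
        self.weight = init_matrix(out_dim, in_dim, rng)        # frozen, never updated
        self.lora_a = init_matrix(rank, in_dim, rng)           # trainable down-projection
        self.lora_b = [[0.0] * rank for _ in range(out_dim)]   # trainable, assumed zero-init
        self.scale = alpha / rank

    def __call__(self, x):
        base = linear(x, self.weight, [0.0] * len(self.weight))
        down = linear(x, self.lora_a, [0.0] * len(self.lora_a))
        up = linear(down, self.lora_b, [0.0] * len(self.lora_b))
        return [b + self.scale * u for b, u in zip(base, up)]


rng = random.Random(0)
# Stand-in for frozen encoder output: 5 frames of 8-dimensional features (toy sizes).
frames = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
adaptor = Adaptor(enc_dim=8, hidden_dim=16, llm_dim=6, rng=rng)
speech_embeddings = adaptor(frames)

# Stand-in for one projection inside the frozen LLM, wrapped with LoRA.
layer = LoRALinear(in_dim=6, out_dim=6, rank=2, alpha=4.0, rng=rng)
outputs = [layer(e) for e in speech_embeddings]
print(len(outputs), len(outputs[0]))
```

Because the up-projection starts at zero in this sketch, the LoRA-wrapped layer initially reproduces the frozen layer exactly. Fine-tuning would then update only the adaptor weights and the two low-rank matrices, leaving the encoder and the base LLM weights untouched.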

Article Page#

Transsion Multilingual Speech Recognition System for MLC-SLM 2025 Challenge

PDF Access#

View Chinese PDF - 2508.14916v1
