A Method for Measuring the Performance of Automatic Speech Recognition (ASR) Models in the Context of Large Language Model (LLM)-Driven Applications

2507.16456v1

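Title (translated from Japanese)#

An Approach to Measuring the Performance of Automatic Speech Recognition (ASR) Models in the Context of Large Language Model (LLM)-Driven Applications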

English Title#

An approach to measuring the performance of Automatic Speech Recognition (ASR) models in the context of Large Language Model (LLM) powered applications

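Abstract (translated from Japanese)#

Automatic speech recognition (ASR) plays an important role in human-machine interaction and serves as the interface for a variety of applications. ASR performance has traditionally been evaluated with word error rate (WER), a metric that quantifies the number of insertions, deletions, and substitutions in the generated transcript. However, as powerful large language models (LLMs) become increasingly common as the core processing component of many applications, the importance of different types of ASR errors for downstream tasks merits further exploration. This work analyzes the ability of LLMs to correct errors introduced by ASR and proposes a new metric for evaluating ASR performance in LLM-driven applications.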

English Abstract#

Automatic Speech Recognition (ASR) plays a crucial role in human-machine interaction and serves as an interface for a wide range of applications. Traditionally, ASR performance has been evaluated using Word Error Rate (WER), a metric that quantifies the number of insertions, deletions, and substitutions in the generated transcriptions. However, with the increasing adoption of large and powerful Large Language Models (LLMs) as the core processing component in various applications, the significance of different types of ASR errors in downstream tasks warrants further exploration. In this work, we analyze the capabilities of LLMs to correct errors introduced by ASRs and propose a new measure to evaluate ASR performance for LLM-powered applications.
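For readers unfamiliar with WER, the following is a minimal sketch of the conventional definition: the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. It illustrates only the traditional baseline metric, not the new measure the paper proposes. The function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions made for illustration and do not come from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: (substitutions + deletions + insertions) / reference length.

    Illustrative sketch only: tokenizes on whitespace and applies no text
    normalization (casing, punctuation), which real evaluations usually do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word present in the hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: two substituted words out of five reference words.
    print(word_error_rate("turn on the kitchen lights", "turn on the chicken light"))  # 0.4
```

Note that this score weights every word error equally. Both substitutions in the example cost the same, whether or not a downstream LLM could recover the intended meaning, which is the limitation the abstract points to.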

