Transformer-Based Auxiliary Loss for Face Recognition Across Age Variations

2412.02198v3

Japanese Title#

Face Recognition Under Age Variation Using a Transformer-Based Auxiliary Loss

English Title#

Transformer-Based Auxiliary Loss for Face Recognition Across Age Variations

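Japanese Abstract#

Aging is a major challenge in face recognition: changes in skin texture and tone alter facial features over time, making it especially difficult to compare images of the same person taken over a long period. Transformer networks are well suited to preserving the sequential spatial relationships caused by aging effects. This paper proposes a loss evaluation technique that uses a transformer network as an additional loss in the face recognition domain. Standard metric loss functions typically take the final embedding of the main CNN backbone as input. Here, we adopt a transformer-metric loss that integrates a transformer loss with a metric loss. This study aims to analyze the behavior of the transformer when the CNN output is arranged into sequential vectors. These sequential vectors have the potential to cope with the textures and regional structures affected by aging, such as wrinkles and sagging skin. The transformer encoder takes as input the contextual vectors obtained from the final convolution layer of the network. The learned features become more age-invariant and complement the discriminative power of the standard metric loss embedding. Using this technique, we pair the transformer loss with various base metric loss functions to evaluate the effect of the combined loss function. We observe that this configuration allows the network to achieve state-of-the-art results on LFW and on the age-variant datasets CA-LFW and AgeDB. This work expands the role of transformers in machine vision and opens new possibilities for exploring transformers as a loss function.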

English Abstract#

Aging presents a significant challenge in face recognition, as changes in skin texture and tone can alter facial features over time, making it particularly difficult to compare images of the same individual taken years apart, such as in long-term identification scenarios. Transformer networks have the strength to preserve sequential spatial relationships caused by aging effects. This paper presents a technique for loss evaluation that uses a transformer network as an additive loss in the face recognition domain. The standard metric loss function typically takes the final embedding of the main CNN backbone as its input. Here, we employ a transformer-metric loss, a combined approach that integrates both a transformer loss and a metric loss. This research intends to analyze the transformer's behavior on the convolution output when the CNN output is arranged as a sequence of vectors. These sequential vectors have the potential to cope with the texture and regional structures affected by aging, such as wrinkles or sagging skin. The transformer encoder takes as input the contextual vectors obtained from the final convolution layer of the network. The learned features can be more age-invariant, complementing the discriminative power of the standard metric loss embedding. With this technique, we pair the transformer loss with various base metric loss functions to evaluate the effect of the combined loss functions. We observe that such a configuration allows the network to achieve state-of-the-art (SoTA) results on LFW and on the age-variant datasets CA-LFW and AgeDB. This research expands the role of transformers in the machine vision domain and opens new possibilities for exploring transformers as a loss function.
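
The abstract describes the combined loss only at a high level, so the following is a rough NumPy sketch of the idea rather than the authors' implementation. It assumes a single-head self-attention layer in place of a full transformer encoder, mean pooling over tokens, a plain softmax classifier standing in for the base metric loss (ArcFace, CosFace, and so on), and a hypothetical weighting factor `weight`; none of these details, nor any of the names and shapes used, come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_map_to_sequence(fmap):
    """Flatten a (C, H, W) feature map into H*W tokens of dimension C."""
    c, h, w = fmap.shape
    return fmap.reshape(c, h * w).T  # shape (H*W, C), row-major over (H, W)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the token sequence."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single sample."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def combined_loss(fmap, embedding, label, params, weight=1.0):
    """Return (total, metric_loss, transformer_loss) for one sample."""
    # Transformer branch: tokens come from the final convolution feature map.
    tokens = feature_map_to_sequence(fmap)
    attended = self_attention(tokens, params["wq"], params["wk"], params["wv"])
    pooled = attended.mean(axis=0)  # assumed pooling; not stated in the abstract
    transformer_loss = cross_entropy(pooled @ params["w_cls_t"], label)
    # Metric branch: plain softmax as a stand-in for the base metric loss.
    metric_loss = cross_entropy(embedding @ params["w_cls_m"], label)
    # Additive combination; `weight` is a hypothetical balancing factor.
    total = metric_loss + weight * transformer_loss
    return total, metric_loss, transformer_loss

# Toy shapes (all hypothetical): 8 channels, 4x4 map, 16-d embedding, 5 identities.
C, H, W, D, N = 8, 4, 4, 16, 5
params = {
    "wq": rng.normal(size=(C, C)) * 0.1,
    "wk": rng.normal(size=(C, C)) * 0.1,
    "wv": rng.normal(size=(C, C)) * 0.1,
    "w_cls_t": rng.normal(size=(C, N)) * 0.1,
    "w_cls_m": rng.normal(size=(D, N)) * 0.1,
}
fmap = rng.normal(size=(C, H, W))    # stand-in for the final conv output
embedding = rng.normal(size=D)       # stand-in for the backbone's final embedding
total, m_loss, t_loss = combined_loss(fmap, embedding, 2, params, weight=0.5)
print(f"metric={m_loss:.4f} transformer={t_loss:.4f} total={total:.4f}")
```

In this sketch the transformer branch sees one token per spatial position of the feature map, while the metric branch sees only the final embedding, which mirrors the split described in the abstract. In an actual training setup both branches would backpropagate into the shared CNN backbone.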
