Transformer-Based Auxiliary Loss for Face Recognition Across Age Variations

2412.02198v3

English Title#

Transformer-Based Auxiliary Loss for Face Recognition Across Age Variations

English Abstract#

Aging presents a significant challenge in face recognition: changes in skin texture and tone alter facial features over time, which makes it particularly difficult to compare images of the same individual taken years apart, as in long-term identification scenarios. Transformer networks are well suited to preserving the sequential spatial relationships affected by aging. This paper presents a loss evaluation technique that uses a transformer network as an additive loss in the face recognition domain. A standard metric loss function typically takes the final embedding of the main CNN backbone as its input. Here, we employ a transformer-metric loss, a combined approach that integrates a transformer loss with a metric loss. This research analyzes how the transformer behaves on the convolution output when the CNN output is arranged as a sequence of vectors. These sequential vectors have the potential to cope with the textures and regional structures affected by aging, such as wrinkles and sagging skin. The transformer encoder takes as input the contextual vectors obtained from the final convolution layer of the network. The learned features can be more age-invariant, complementing the discriminative power of the standard metric loss embedding. With this technique, we pair the transformer loss with various base metric loss functions to evaluate the effect of the combined loss. We observe that this configuration allows the network to achieve state-of-the-art (SoTA) results on LFW and on the age-variant datasets CA-LFW and AgeDB. This research expands the role of transformers in machine vision and opens new possibilities for exploring transformers as a loss function.
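
The abstract describes two mechanical steps that a short example can make concrete: arranging the final convolution output as a sequence of vectors, and adding the transformer branch's loss to the base metric loss. The sketch below illustrates only those two steps and is not the authors' implementation. All names (`feature_map_to_sequence`, `cross_entropy`, `combined_loss`, `weight`) are hypothetical, plain softmax cross-entropy stands in for whichever metric loss is used, the transformer encoder itself is omitted (its classification logits are assumed to be given), and the weighting factor is an assumption, since the abstract only says the loss is additive.

```python
import math


def feature_map_to_sequence(fmap):
    """Flatten a C x H x W feature map (nested lists) into H*W tokens of length C.

    Each spatial position of the final convolution output becomes one token,
    which is the kind of sequential input a transformer encoder expects.
    """
    channels = len(fmap)
    height = len(fmap[0])
    width = len(fmap[0][0])
    return [
        [fmap[c][y][x] for c in range(channels)]
        for y in range(height)
        for x in range(width)
    ]


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single sample."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[label]


def combined_loss(metric_logits, transformer_logits, label, weight=1.0):
    """Base loss plus a weighted auxiliary loss from the transformer branch.

    `weight` is an assumed hyperparameter; cross-entropy is a stand-in for
    the metric loss and for the transformer loss alike.
    """
    base = cross_entropy(metric_logits, label)
    auxiliary = cross_entropy(transformer_logits, label)
    return base + weight * auxiliary


# Toy feature map with C=2 channels and a 2x2 spatial grid.
fmap = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
tokens = feature_map_to_sequence(fmap)  # 4 tokens, each of length 2

# Toy logits for a 2-identity problem; in practice these would come from the
# CNN embedding head and from the transformer encoder head, respectively.
loss = combined_loss([2.0, 0.5], [1.0, 1.0], label=0)
```

Because the auxiliary term is simply added, setting `weight=0.0` recovers the base loss alone, which gives a convenient baseline when comparing the combined loss against the metric loss by itself.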

PDF Access#

View the Chinese PDF - 2412.02198v3
