zikele

Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models

2504.03140v2

Title

Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models

Abstract

Recent advances in diffusion models have demonstrated remarkable capabilities in video generation. However, computational cost remains a significant challenge for practical applications. While feature caching has been proposed to reduce the computational burden of diffusion models, existing methods typically overlook the heterogeneous significance of individual blocks, resulting in suboptimal reuse and degraded output quality. We address this gap by introducing ProfilingDiT, a novel adaptive caching strategy that explicitly disentangles foreground-focused and background-focused blocks. Through a systematic analysis of attention distributions in diffusion models, we reveal two key observations: 1) most layers exhibit a consistent preference for either foreground or background regions; 2) predicted noise shows low inter-step similarity initially, which stabilizes as denoising progresses. These findings inspire a selective caching strategy that preserves full computation for dynamic foreground elements while efficiently caching static background features. Extensive experiments demonstrate that our framework achieves significant acceleration (e.g., a 2.01× speedup for Wan2.1) while maintaining visual fidelity across comprehensive quality metrics, establishing a viable method for efficient video generation.
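The abstract gives no implementation details, so the following is only a rough sketch of how the two observations could translate into a caching schedule, not the authors' ProfilingDiT code. Every name here (`classify_blocks`, `choose_warmup_steps`, `SelectiveBlockCache`), the thresholds, the fixed refresh interval, and the choice to cache each block's residual output are assumptions made for illustration. Features are plain floats so the example runs with the standard library alone.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_blocks(fg_attention_share, threshold=0.5):
    """Label each block from a profiling run (hypothetical input).

    fg_attention_share[i] is assumed to be the fraction of block i's
    attention mass that lands on foreground tokens.
    """
    return ["foreground" if share >= threshold else "background"
            for share in fg_attention_share]


def choose_warmup_steps(noise_predictions, min_similarity=0.95):
    """Return the first step whose predicted noise resembles the previous step's.

    Caching is disabled before that step, reflecting the observation that
    inter-step similarity is low early in denoising.
    """
    for step in range(1, len(noise_predictions)):
        similarity = cosine_similarity(noise_predictions[step - 1],
                                       noise_predictions[step])
        if similarity >= min_similarity:
            return step
    return len(noise_predictions)


class SelectiveBlockCache:
    """Always recompute foreground blocks; reuse cached background residuals."""

    def __init__(self, blocks, roles, warmup_steps, refresh_every):
        self.blocks = blocks                # callables: (x, step) -> residual
        self.roles = roles                  # "foreground" or "background" per block
        self.warmup_steps = warmup_steps    # steps with caching disabled
        self.refresh_every = refresh_every  # recompute background every N steps
        self.cache = {}                     # block index -> last computed residual
        self.compute_calls = 0              # number of real block evaluations

    def run_step(self, step, x):
        for index, (block, role) in enumerate(zip(self.blocks, self.roles)):
            reuse = (
                role == "background"
                and step >= self.warmup_steps
                and (step - self.warmup_steps) % self.refresh_every != 0
                and index in self.cache
            )
            if reuse:
                residual = self.cache[index]
            else:
                residual = block(x, step)
                self.compute_calls += 1
                if role == "background":
                    self.cache[index] = residual
            x = x + residual                # residual connection around the block
        return x
```

With four blocks of which two are labeled background, `warmup_steps=2`, and `refresh_every=3`, eight denoising steps take 24 block evaluations instead of 32 in this toy setup. The numbers describe the sketch only and say nothing about the speedups reported in the paper.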

PDF

View the Chinese PDF - 2504.03140v2
