zikele


Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models

arXiv:2504.03140v2

Chinese Title

模型揭示了应缓存的内容:视频扩散模型的基于剖析的特征复用

English Title

Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models

Chinese Abstract

扩散模型的最新进展在视频生成方面展示了显著的能力。 然而,计算强度仍然是实际应用中的重大挑战。 虽然特征缓存已被提出以减少扩散模型的计算负担,但现有方法通常忽略了各个块的异质重要性,导致次优重用和输出质量下降。 为此,我们通过引入 ProfilingDiT,一种新颖的自适应缓存策略,解决了这一差距,该策略明确地解耦了前景和背景关注的块。 通过对扩散模型中注意力分布的系统分析,我们发现了一个关键观察:1)大多数层对前景或背景区域表现出一致的偏好。 2)预测的噪声在初始阶段表现出低跨步骤相似性,随着去噪过程的进行,这种相似性趋于稳定。 这一发现启发我们制定了一种选择性缓存策略,保留对动态前景元素的完整计算,同时高效地缓存静态背景特征。 我们的方法显著降低了计算开销,同时保持了视觉保真度。 大量实验表明,我们的框架在保持全面质量指标下的视觉保真度的同时实现了显著加速(例如,Wan2.1 的加速比为 2.01 倍),确立了一种高效的视频生成可行方法。

English Abstract

Recent advances in diffusion models have demonstrated remarkable capabilities in video generation. However, computational cost remains a significant challenge for practical applications. While feature caching has been proposed to reduce the computational burden of diffusion models, existing methods typically overlook the heterogeneous importance of individual blocks, resulting in suboptimal reuse and degraded output quality. To address this gap, we introduce ProfilingDiT, a novel adaptive caching strategy that explicitly disentangles foreground-focused and background-focused blocks. Through a systematic analysis of attention distributions in diffusion models, we make two key observations: 1) most layers exhibit a consistent preference for either foreground or background regions; 2) predicted noise shows low inter-step similarity initially, and this similarity stabilizes as denoising progresses. These findings inspire a selective caching strategy that preserves full computation for dynamic foreground elements while efficiently caching static background features. Our approach substantially reduces computational overhead while preserving visual fidelity. Extensive experiments demonstrate that our framework achieves significant acceleration (e.g., a 2.01× speedup for Wan2.1) while maintaining visual fidelity across comprehensive quality metrics, establishing a viable method for efficient video generation.
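
The selective caching idea in the abstract can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function `run_with_cache`, the parameters `background_ids`, `warmup_steps`, and `reuse_interval`, and the helper `make_block` are all hypothetical names chosen for illustration. The set of background-focused blocks is assumed to be given, whereas the paper derives it by profiling attention distributions. The warm-up phase loosely reflects the second observation (low inter-step similarity early in denoising), so nothing is reused during the first steps.

```python
from typing import Callable, Dict, List, Set, Tuple

# A "block" maps (hidden state, step index) to a residual update.
Block = Callable[[List[float], int], List[float]]


def run_with_cache(
    blocks: List[Block],
    background_ids: Set[int],
    x: List[float],
    num_steps: int,
    warmup_steps: int = 2,
    reuse_interval: int = 2,
) -> Tuple[List[float], int]:
    """Toy denoising loop with selective residual caching.

    Foreground blocks are recomputed at every step. Background blocks are
    recomputed during warm-up and then once every `reuse_interval` steps;
    in between, their last cached residual is reused.
    Returns the final state and the number of block evaluations performed.
    """
    cache: Dict[int, List[float]] = {}
    evaluations = 0
    for step in range(num_steps):
        for i, block in enumerate(blocks):
            reuse = (
                i in background_ids
                and step >= warmup_steps
                and (step - warmup_steps) % reuse_interval != 0
                and i in cache
            )
            if reuse:
                residual = cache[i]
            else:
                residual = block(x, step)
                evaluations += 1
                if i in background_ids:
                    cache[i] = residual
            x = [a + r for a, r in zip(x, residual)]
    return x, evaluations


def make_block(scale: float) -> Block:
    """Hypothetical stand-in for a transformer block: a scaled residual."""
    def block(x: List[float], step: int) -> List[float]:
        return [scale * v for v in x]
    return block


blocks = [make_block(0.1) for _ in range(4)]
_, full_cost = run_with_cache(blocks, set(), [1.0, 2.0], num_steps=6)
_, cached_cost = run_with_cache(blocks, {1, 3}, [1.0, 2.0], num_steps=6)
print(full_cost, cached_cost)  # 24 20
```

In this toy setting, treating two of the four blocks as background-focused removes 4 of the 24 block evaluations. The savings in a real model would depend on how many blocks are classified as background-focused and on the chosen reuse schedule.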

PDF Access

View the Chinese PDF - 2504.03140v2
