Title#
Pixels Under Pressure: Exploring Fine-Tuning Paradigms for Foundation Models in High-Resolution Medical Imaging
Abstract#
Advancements in diffusion-based foundation models have improved text-to-image generation, yet most efforts have been limited to low-resolution settings. As high-resolution image synthesis becomes increasingly essential for various applications, particularly in medical imaging, fine-tuning emerges as a crucial mechanism for adapting these powerful pre-trained models to task-specific requirements and data distributions. In this work, we present a systematic study examining the impact of various fine-tuning techniques on image generation quality when scaling to a high resolution of 512x512 pixels. We benchmark a diverse set of fine-tuning methods, including full fine-tuning strategies and parameter-efficient fine-tuning (PEFT). We dissect how different fine-tuning methods influence key quality metrics, including Fréchet Inception Distance (FID), Vendi score, and prompt-image alignment. We also evaluate the utility of generated images in a downstream classification task under data-scarce conditions, demonstrating that specific fine-tuning strategies improve both generation fidelity and downstream performance when classifiers are trained on synthetic images and evaluated on real images. Our code is accessible through the project website: https://tehraninasab.github.io/PixelUPressure/.
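For readers unfamiliar with the Vendi score named in the abstract, the following is a minimal sketch of how such a diversity metric is commonly computed: the exponential of the Shannon entropy of the eigenvalues of a normalized similarity matrix. The function name `vendi_score`, the cosine-similarity kernel, and the use of generic feature vectors are illustrative assumptions; the abstract does not state which feature extractor or kernel the authors use.

```python
import numpy as np


def vendi_score(features: np.ndarray) -> float:
    """Vendi score of a set of feature vectors (one row per image).

    Assumption: a cosine-similarity kernel is used here for illustration;
    the paper's actual kernel and feature extractor are not given.
    """
    # Normalize rows so the similarity matrix has ones on its diagonal.
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    n = x.shape[0]
    k = x @ x.T

    # Eigenvalues of K / n are non-negative and sum to one.
    eigvals = np.linalg.eigvalsh(k / n)
    # Drop numerically zero eigenvalues, since 0 * log(0) is taken as 0.
    eigvals = eigvals[eigvals > 1e-12]

    # Exponential of the Shannon entropy of the eigenvalues.
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))
```

Under this sketch, a set of identical feature vectors scores 1 (no diversity), while n mutually orthogonal vectors score n, so the value can be read as an effective number of distinct samples.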