
TorchAO: PyTorch-Native Training-to-Serving Model Optimization

arXiv: 2507.16099v1

Title#

TorchAO: PyTorch-Native Training-to-Serving Model Optimization

Abstract#

We present TorchAO, a PyTorch-native model optimization framework leveraging quantization and sparsity to provide an end-to-end, training-to-serving workflow for AI models. TorchAO supports a variety of popular model optimization techniques, including FP8 quantized training, quantization-aware training (QAT), post-training quantization (PTQ), and 2:4 sparsity, and leverages a novel tensor subclass abstraction to represent a variety of widely-used, backend agnostic low precision data types, including INT4, INT8, FP8, MXFP4, MXFP6, and MXFP8. TorchAO integrates closely with the broader ecosystem at each step of the model optimization pipeline, from pre-training (TorchTitan) to fine-tuning (TorchTune, Axolotl) to serving (HuggingFace, vLLM, SGLang, ExecuTorch), connecting an otherwise fragmented space in a single, unified workflow. TorchAO has enabled recent launches of the quantized Llama 3.2 1B/3B and LlamaGuard3-8B models and is open-source at https://github.com/pytorch/ao/.
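
To make the post-training quantization (PTQ) path described above concrete, below is a minimal sketch that applies weight-only INT8 quantization to a toy model with TorchAO's `quantize_` API. The specific import paths and the `int8_weight_only` config helper are assumptions based on the documented torchao interface and vary between releases, so check https://github.com/pytorch/ao/ for the names matching your installed version; this is an illustrative sketch, not the definitive workflow.

```python
# Minimal sketch of TorchAO post-training quantization (PTQ), assuming the
# quantize_ / int8_weight_only API documented in the torchao repository.
# Import paths and helper names differ across torchao releases, so adjust
# them to match your installed version.
import torch
import torch.nn as nn

from torchao.quantization import quantize_, int8_weight_only

# Toy model standing in for a real transformer; TorchAO swaps the weights of
# supported layers (nn.Linear here) for quantized tensor-subclass versions.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
).to(torch.bfloat16).eval()

# In-place, weight-only INT8 quantization; other configs (e.g. INT4 or FP8)
# follow the same one-call pattern.
quantize_(model, int8_weight_only())

# The quantized model remains an ordinary nn.Module for inference or export.
with torch.no_grad():
    out = model(torch.randn(1, 1024, dtype=torch.bfloat16))
print(out.shape)
```

Because quantization is expressed through tensor subclasses rather than module rewrites, the quantized model stays a regular PyTorch module, which is what allows the same checkpoint to flow through the fine-tuning and serving integrations listed in the abstract.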

PDF#

View the Chinese PDF - 2507.16099v1
