Title#
TorchAO: PyTorch-Native Training-to-Serving Model Optimization
Abstract#
We present TorchAO, a PyTorch-native model optimization framework leveraging quantization and sparsity to provide an end-to-end, training-to-serving workflow for AI models. TorchAO supports a variety of popular model optimization techniques, including FP8 quantized training, quantization-aware training (QAT), post-training quantization (PTQ), and 2:4 sparsity, and leverages a novel tensor subclass abstraction to represent a variety of widely used, backend-agnostic low-precision data types, including INT4, INT8, FP8, MXFP4, MXFP6, and MXFP8. TorchAO integrates closely with the broader ecosystem at each step of the model optimization pipeline, from pre-training (TorchTitan) to fine-tuning (TorchTune, Axolotl) to serving (HuggingFace, vLLM, SGLang, ExecuTorch), connecting an otherwise fragmented space in a single, unified workflow. TorchAO has enabled the recent launches of the quantized Llama 3.2 1B/3B and LlamaGuard3-8B models and is open source at https://github.com/pytorch/ao/.
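To make the described workflow concrete, below is a minimal post-training quantization sketch using TorchAO's `quantize_` API with a weight-only INT8 config, following the usage pattern shown in the project README; exact entry-point names can vary across TorchAO releases, and the toy model here is purely illustrative.

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

# Any eager-mode PyTorch model works; a small linear stack stands in here.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
).eval()

# Swap Linear weights to INT8 in place (weight-only PTQ). Other configs,
# e.g. INT4 weight-only, follow the same quantize_(model, config) pattern.
quantize_(model, int8_weight_only())

# The optimized model runs as usual and composes with torch.compile.
x = torch.randn(1, 1024)
with torch.no_grad():
    out = model(x)
```

The key design point the abstract highlights is that the quantized weights are represented as tensor subclasses, so the same optimized model can flow unchanged into downstream serving stacks such as vLLM or ExecuTorch.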