PyTorch Quantization

Properties
authors PyTorch Quantization for TensorRT
year 2024
url https://pytorch.org/docs/main/quantization.html#prototype-pytorch-2-export-quantization

Backend/Hardware Support

| Hardware   | Kernel Library             | Eager Mode Quantization          | FX Graph Mode Quantization | Quantization Mode Support |
|------------|----------------------------|----------------------------------|----------------------------|---------------------------|
| server CPU | fbgemm/onednn              | Supported                        | Supported                  | All Supported             |
| mobile CPU | qnnpack/xnnpack            | Supported                        | Supported                  | All Supported             |
| server GPU | TensorRT (early prototype) | Not supported (requires a graph) | Supported                  | Static Quantization       |
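As a concrete illustration of the "Eager Mode Quantization" column for server CPUs, the sketch below runs post-training static quantization on a tiny hypothetical model (the model and input shapes are made up for this example). It uses the eager-mode `QuantStub`/`DeQuantStub` workflow from `torch.ao.quantization`, picking whichever kernel backend the local PyTorch build actually supports:

```python
import torch
import torch.ao.quantization as tq

# Pick an available quantized-kernel backend; names vary by build and platform.
engines = torch.backends.quantized.supported_engines
backend = 'fbgemm' if 'fbgemm' in engines else 'qnnpack'
torch.backends.quantized.engine = backend

class TinyModel(torch.nn.Module):
    """Hypothetical model used only to demonstrate the eager-mode flow."""
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # marks the fp32 -> int8 boundary
        self.conv = torch.nn.Conv2d(3, 8, 3)
        self.dequant = tq.DeQuantStub()  # marks the int8 -> fp32 boundary

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))

model = TinyModel().eval()
model.qconfig = tq.get_default_qconfig(backend)
prepared = tq.prepare(model)           # inserts observers
prepared(torch.randn(1, 3, 8, 8))      # calibrate on representative data
quantized = tq.convert(prepared)       # swaps modules for int8 kernels
```

After `convert`, `quantized.conv` is an int8 quantized module rather than a float `torch.nn.Conv2d`, which is exactly the "Static Quantization" mode the table refers to.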

Today, PyTorch supports the following backends for running quantized operators efficiently:

  • x86 CPUs with AVX2 support or higher (without AVX2, some operations have inefficient implementations), via the x86 backend optimized by fbgemm and onednn (see the details in the RFC)
  • ARM CPUs (typically found in mobile/embedded devices), via qnnpack
  • (early prototype) support for NVIDIA GPUs via TensorRT through fx2trt (to be open sourced)
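The backend actually used for quantized operators is controlled by a global engine setting. A minimal sketch, assuming a standard PyTorch install, of inspecting the engines compiled into the current build and routing quantized ops to the server-CPU kernels when present:

```python
import torch

# Engines compiled into this build; the list depends on platform and
# build flags (e.g. ['none', 'onednn', 'x86', 'fbgemm'] on server x86).
engines = torch.backends.quantized.supported_engines
print(engines)

# 'fbgemm' targets server x86 CPUs; 'qnnpack' targets ARM mobile CPUs.
if 'fbgemm' in engines:
    torch.backends.quantized.engine = 'fbgemm'
elif 'qnnpack' in engines:
    torch.backends.quantized.engine = 'qnnpack'
```

Setting `torch.backends.quantized.engine` must happen before running a converted model, since it determines which kernel library executes the int8 operators.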

Note:
- This note is somewhat dated, as fx2trt is already available in torch-tensorrt. However, there