PyTorch Eager Mode Quantization TensorRT Acceleration

Properties
authors	Lei Mao
year	2024
url	https://leimao.github.io/blog/PyTorch-Eager-Mode-Quantization-TensorRT-Acceleration/

Abstract

Accelerating a PyTorch model quantized through the PyTorch eager mode quantization interface with TensorRT involves three steps:

1. Perform PyTorch eager mode quantization on the floating-point PyTorch model and export the quantized PyTorch model to ONNX.
2. Fix the quantized ONNX model graph so that it can be parsed by the TensorRT parser.
3. Build a TensorRT engine from the quantized ONNX model, profile its performance, and verify its accuracy.
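As a rough illustration of the quantization part of the first step only, the sketch below applies post-training static quantization in eager mode to a toy model. It is a minimal sketch under stated assumptions, not the code from the post: `TinyNet`, `quantize_eager`, the random calibration batches, and the choice of the default qconfig are all hypothetical, and the post's actual model and quantization configuration may differ. The ONNX export, the graph fix, and the TensorRT build are not shown.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    DeQuantStub,
    QuantStub,
    convert,
    fuse_modules,
    get_default_qconfig,
    prepare,
)


class TinyNet(nn.Module):
    """Hypothetical toy model; stands in for the real floating-point model."""

    def __init__(self):
        super().__init__()
        # Eager mode requires marking where tensors enter and leave
        # the quantized region of the model.
        self.quant = QuantStub()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)


def quantize_eager(model, calibration_batches):
    """Post-training static quantization in eager mode (assumed workflow)."""
    # Pick a quantized CPU backend that this PyTorch build supports.
    engines = torch.backends.quantized.supported_engines
    backend = "fbgemm" if "fbgemm" in engines else "qnnpack"
    torch.backends.quantized.engine = backend

    model.eval()
    # Assumption: the default qconfig for the backend; a TensorRT-oriented
    # workflow may need a different, custom qconfig.
    model.qconfig = get_default_qconfig(backend)

    # Fuse conv + relu, insert observers, calibrate, then convert to int8.
    fused = fuse_modules(model, [["conv", "relu"]])
    prepared = prepare(fused)
    with torch.no_grad():
        for batch in calibration_batches:
            prepared(batch)
    return convert(prepared)


if __name__ == "__main__":
    torch.manual_seed(0)
    # Random tensors stand in for a real calibration dataset.
    batches = [torch.randn(2, 3, 8, 8) for _ in range(4)]
    quantized = quantize_eager(TinyNet(), batches)
    print(quantized)
```

Running the script prints the converted module tree, in which the fused convolution has been replaced by a quantized module. In a real workflow the calibration batches would come from representative input data rather than random tensors.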

The source code for this post can be found on GitHub.