PyTorch Eager Mode Quantization TensorRT Acceleration
Properties | |
---|---|
authors | Lei Mao |
year | 2024 |
url | https://leimao.github.io/blog/PyTorch-Eager-Mode-Quantization-TensorRT-Acceleration/ |
Abstract
Accelerating a PyTorch model quantized through the PyTorch eager mode quantization interface with TensorRT involves three steps:
- Perform eager mode quantization on the floating-point model in PyTorch and export the quantized model to ONNX.
- Fix the quantized ONNX model graph so that it can be parsed by the TensorRT parser.
- Build the quantized ONNX model into a TensorRT engine, profile the performance, and verify the accuracy.
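
As a rough illustration of why the last step verifies accuracy, the sketch below simulates symmetric per-tensor INT8 quantization in plain Python and measures the round-trip error. It is not taken from the post's source code: the function names, the sample values, and the choice of a symmetric scale are assumptions made for illustration, and the actual pipeline relies on PyTorch, ONNX, and TensorRT rather than hand-written arithmetic.

```python
def quantize_linear(values, scale, zero_point=0, qmin=-128, qmax=127):
    """Map floats to INT8 integers: q = clamp(round(x / scale) + zero_point)."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize_linear(quantized, scale, zero_point=0):
    """Map INT8 integers back to floats: x ~ (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in quantized]


def max_abs_error(reference, candidate):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(r - c) for r, c in zip(reference, candidate))


# Hypothetical floating-point activations standing in for a reference output.
reference = [-1.0, -0.5, 0.0, 0.25, 1.0]

# Symmetric scale (an assumption): the largest magnitude maps to 127.
scale = max(abs(v) for v in reference) / 127

quantized = quantize_linear(reference, scale)
restored = dequantize_linear(quantized, scale)
error = max_abs_error(reference, restored)

print("quantized:", quantized)
print("max abs error:", error)
```

For values inside the representable range, the round-trip error is bounded by about half the scale, whereas values outside the range are clamped and can lose much more. Comparing the engine's outputs against a floating-point reference catches both cases.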
The source code for this post can be found on GitHub.