PyTorch Eager Mode Quantization TensorRT Acceleration
| Properties | |
|---|---|
| authors | Lei Mao |
| year | 2024 |
| url | https://leimao.github.io/blog/PyTorch-Eager-Mode-Quantization-TensorRT-Acceleration/ |
Abstract
Accelerating a PyTorch model quantized through the PyTorch eager mode quantization interface with TensorRT involves three steps:
- Apply PyTorch eager mode quantization to the floating-point PyTorch model and export the quantized model to ONNX.
- Fix the quantized ONNX model graph so that it can be parsed by the TensorRT parser.
- Build the quantized ONNX model into a TensorRT engine, profile the performance, and verify the accuracy.
The source code for this post can be found on GitHub.
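
To make the first step more concrete, here is a minimal sketch of eager mode static quantization on a toy model. It is not the code from the post: `TinyNet`, the random calibration data, and the choice of the default qconfig are all assumptions made for illustration. It also assumes a PyTorch version that still ships `torch.ao.quantization`. The ONNX export and the later steps are left out.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq


class TinyNet(nn.Module):
    """Hypothetical toy model; not the network used in the post."""

    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # marks the float -> quantized boundary
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()  # marks the quantized -> float boundary

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)


torch.manual_seed(0)
model = TinyNet().eval()

# Assumption: use the default static qconfig of whichever CPU backend is active.
model.qconfig = tq.get_default_qconfig(torch.backends.quantized.engine)

# Fuse conv + relu so the pair is quantized as a single module.
fused = tq.fuse_modules(model, [["conv", "relu"]])

# Insert observers, then run a few batches so they can record activation ranges.
# Random tensors stand in for a real calibration dataset here.
prepared = tq.prepare(fused)
with torch.no_grad():
    for _ in range(4):
        prepared(torch.randn(1, 3, 32, 32))

# Replace the observed modules with their quantized counterparts.
quantized = tq.convert(prepared)

with torch.no_grad():
    out = quantized(torch.randn(1, 3, 32, 32))
print(out.shape, out.dtype)
```

In eager mode the quantization boundaries and the fusion list are specified by hand, which is why the `QuantStub`/`DeQuantStub` pair and the `fuse_modules` call appear explicitly in the sketch.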