Neural Network Quantization
Related:
- HuggingFace Docs
- A Survey of Quantization Methods for Efficient Neural Network Inference
- A recent (2024) work by Lin et al. (Song Han's group), AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration