Linear Quantization

Pasted image 20240802145100.png

Visualization
Pasted image 20240802145203.png

Then, for each layer in your network (linear, conv, etc), you represent the matrices involved like the previous formulation, do some arithmetic to see what you can precompute and zero-out and voilá