K Means based Quantization
Perform clustering on weights, and replace weights with cluster int
index matrix (to which cluster each weight entry belongs to) and a list of float
centroids.
Storing integers consumes less memory while you can keep fully precision on the float centroids (although you lose precision because it does not necessarily correspond to an actual value in the previous weight matrix).
Resources:
- https://scikit-learn.org/stable/auto_examples/cluster/plot_color_quantization.html
- TinyML and Efficient Deep Learning Computing - Lecture 5