There exist a plethora of articles on quantization, but they generally cover only the surface level theory or provide a simple overview. In this article, I’ll explain how quantization is actually implemented by deep learning frameworks.
Before getting into quantization, it’s good to understand the basic difference between two key concepts: float (floating-point) and int (fixed-point).