.. index:: pair: page; Quantize
.. _doxid-dev_guide_op_quantize:

Quantize
========

General
~~~~~~~

The Quantize operation converts an f32 tensor to a quantized (u8/s8) tensor. It supports both per-tensor and per-channel asymmetric linear quantization. The output data type is determined by the data type of the output tensor. The rounding mode is library-implementation defined.

For per-tensor quantization:

.. math::

   \dst_{i} = round(\src_{i} / scale + zp)

For per-channel quantization, taking channel axis = 1 as an example:

.. math::

   \dst_{\cdots,i,\cdots,\cdots} = round(\src_{\cdots,i,\cdots,\cdots} / scale_i + zp_i), i \in {[0, ic-1]}

where :math:`ic` is the number of channels.

Operation attributes
~~~~~~~~~~~~~~~~~~~~

============== ====================================================================== ========== ============================================================================== ====================
Attribute Name Description                                                            Value Type Supported Values                                                               Required or Optional
============== ====================================================================== ========== ============================================================================== ====================
``qtype``      Specifies which quantization type is used.                             string     ``per_tensor`` (default), ``per_channel``                                      Optional
``axis``       Specifies the dimension on which per-channel quantization is applied.  s64        An s64 value in the range of [-r, r-1] where r = rank(src), ``1`` by default   Optional
``scales``     Scales applied to the src data.                                        f32        An f32 list (contains only one element if ``qtype`` is ``per_tensor``)         Required
``zps``        Offset values that map to float zero.                                  s64        An s64 list (contains only one element if ``qtype`` is ``per_tensor``)         Required
============== ====================================================================== ========== ============================================================================== ====================

Execution arguments
~~~~~~~~~~~~~~~~~~~

The inputs and outputs must be provided in the index order below when constructing an operation.

Inputs
------

===== ============= ====================
Index Argument Name Required or Optional
===== ============= ====================
0     ``src``       Required
===== ============= ====================

Outputs
-------

===== ============= ====================
Index Argument Name Required or Optional
===== ============= ====================
0     ``dst``       Required
===== ============= ====================

Supported data types
~~~~~~~~~~~~~~~~~~~~

The Quantize operation supports the following data type combinations.

=== ======
Src Dst
=== ======
f32 s8, u8
=== ======

.. note::

   This operation is provided to support int8 quantization models.
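
The per-tensor and per-channel formulas in the General section can be illustrated with a small pure-Python reference sketch. This is not the library implementation: the function name ``quantize``, the ``RANGES`` table, the flat row-major ``src`` list with an explicit ``shape``, and the ``dtype`` parameter are all hypothetical names chosen for illustration. Two behaviors are assumptions rather than statements from this page: the sketch rounds half to even (Python's built-in ``round``), whereas the actual rounding mode is library-implementation defined, and it saturates results to the u8 or s8 range.

```python
from math import prod

# Representable ranges of the two output data types listed on this page.
RANGES = {"u8": (0, 255), "s8": (-128, 127)}


def quantize(src, shape, scales, zps, qtype="per_tensor", axis=1, dtype="u8"):
    """Illustrative sketch of dst = round(src / scale + zp).

    `src` is a flat, row-major list of floats with logical shape `shape`.
    """
    rank = len(shape)
    if len(src) != prod(shape):
        raise ValueError("src length does not match shape")
    if len(scales) != len(zps):
        raise ValueError("scales and zps must have the same length")

    if qtype == "per_tensor":
        if len(scales) != 1:
            raise ValueError("per_tensor expects exactly one scale and one zp")
        channels, stride = 1, 1  # every element maps to scales[0] / zps[0]
    elif qtype == "per_channel":
        if not -rank <= axis < rank:
            raise ValueError("axis must be in [-r, r-1]")
        axis %= rank  # normalize a negative axis
        channels = shape[axis]
        stride = prod(shape[axis + 1:])  # elements per step along `axis`
        if len(scales) != channels:
            raise ValueError("per_channel expects one scale and zp per channel")
    else:
        raise ValueError("unknown qtype: " + repr(qtype))

    lo, hi = RANGES[dtype]
    dst = []
    for idx, value in enumerate(src):
        c = (idx // stride) % channels  # channel index of this element
        # Assumption: round half to even; the real rounding mode is
        # library-implementation defined.
        q = round(value / scales[c] + zps[c])
        # Assumption: saturate to the representable range of the output type.
        dst.append(min(max(q, lo), hi))
    return dst
```

For example, ``quantize([0.0, 0.5, 1.0, -1.0], (4,), [0.5], [10])`` yields ``[10, 11, 12, 8]`` under these assumptions. With ``qtype="per_channel"``, each slice along ``axis`` uses its own entry from ``scales`` and ``zps``.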