DynamicDequantize
General
The DynamicDequantize operation converts a quantized (s4, u4, s8, or u8) tensor to a bf16, f16, or f32 tensor. It supports per-tensor, per-channel, and per-group asymmetric linear de-quantization. The rounding mode is defined by the library implementation. Unlike Dequantize, DynamicDequantize takes scales and zero-points as operator src tensors rather than as attributes.
For per-tensor de-quantization:
\[ dst_{i} = (src_{i} - zps) \cdot scale \]
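As a minimal illustration of the per-tensor case, the sketch below assumes the usual asymmetric linear form `dst = (src - zps) * scale`, with a single scale and a single zero-point shared by every element. The function name and the sample values are hypothetical and are not part of the library API.

```python
def dequantize_per_tensor(src, scale, zp=0):
    """Apply dst = (src - zp) * scale to every element of a flat list."""
    return [(x - zp) * scale for x in src]


# Hypothetical u8 data with one scale and one zero-point for the whole tensor.
dst = dequantize_per_tensor([126, 128, 132], scale=0.5, zp=128)
print(dst)  # [-1.0, 0.0, 2.0]
```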
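For per-channel de-quantization, taking channel axis = 1 as an example:
\[ dst_{\cdots,i,\cdots,\cdots} = (src_{\cdots,i,\cdots,\cdots} - zps_{i}) \cdot scale_{i}, \quad i \in [0, channelNum - 1] \]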
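The per-channel case can be sketched the same way. This example assumes a 2-D src stored as a list of rows, with axis = 1 as the channel axis, so each column has its own scale and zero-point. The helper name and the data are illustrative only.

```python
def dequantize_per_channel_axis1(src, scales, zps=None):
    """De-quantize a 2-D list (rows x channels) with one scale/zp per channel."""
    num_channels = len(src[0])
    if zps is None:
        # Omitted zero-points are treated as zero.
        zps = [0] * num_channels
    if len(scales) != num_channels or len(zps) != num_channels:
        raise ValueError("scales and zps need one entry per channel")
    return [
        [(row[c] - zps[c]) * scales[c] for c in range(num_channels)]
        for row in src
    ]


src = [[10, 20, 30],
       [12, 24, 36]]
dst = dequantize_per_channel_axis1(src, scales=[0.5, 0.25, 2.0], zps=[10, 20, 30])
print(dst)  # [[0.0, 0.0, 0.0], [1.0, 1.0, 12.0]]
```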
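For per-group de-quantization, take group shape = Gx1 as an example. It indicates that one scaling factor is adopted for every G elements of the src tensor along the grouped dimension. On a dimension where group quantization is adopted, let channelNum be the size of that src dimension and groupNum = channelNum / G:
\[ dst_{i} = (src_{i} - zps_{j}) \cdot scale_{j} \]
Where:
\[ i = j \cdot G + k, \quad j \in [0, groupNum - 1], \quad k \in [0, G - 1] \]
On other dimensions:
\[ dst_{i} = (src_{i} - zps_{i}) \cdot scale_{i} \]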
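To make the group indexing concrete, here is a small sketch for a 2-D src with group shape Gx1. It assumes that element `(i, j)` of src uses the scale and zero-point stored at `(i // G, j // 1)`, which is one reading of the description above. The function name and the data are hypothetical.

```python
def dequantize_per_group(src, scales, zps, group_shape):
    """De-quantize a 2-D list where each group_shape block shares one scale/zp."""
    g0, g1 = group_shape
    rows, cols = len(src), len(src[0])
    if rows % g0 or cols % g1:
        raise ValueError("src dimensions must be divisible by the group shape")
    return [
        [
            (src[i][j] - zps[i // g0][j // g1]) * scales[i // g0][j // g1]
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# src is 4x2 and the group shape is 2x1, so scales and zps are 2x2.
src = [[1, 2], [3, 4], [5, 6], [7, 8]]
scales = [[0.5, 1.0], [2.0, 0.25]]
zps = [[1, 0], [5, 4]]
dst = dequantize_per_group(src, scales, zps, group_shape=(2, 1))
print(dst)  # [[0.0, 2.0], [1.0, 4.0], [0.0, 0.5], [4.0, 1.0]]
```

With a group size of 1 on the second dimension, that dimension behaves like an ordinary per-element index, which matches the "other dimensions" case.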
Operation attributes
| Attribute Name | Description | Value Type | Supported Values | Required or Optional |
|---|---|---|---|---|
| qtype | Specifies which de-quantization type is used. | string | per-tensor, per-channel, per-group | Optional |
| axis | Specifies the dimension on which per-channel de-quantization is applied. | s64 | An s64 value in the range [-r, r-1], where r = rank(src) | Optional |
| group_shape | Specifies the group shape of an operation. | s64 | An s64 list giving the group size on each dimension where grouped quantization is adopted. | Optional |
Execution arguments
The inputs and outputs must be provided in the index order below when constructing an operation.
Inputs
| Index | Argument Name | Required or Optional |
|---|---|---|
| 0 | src | Required |
| 1 | scales | Required |
| 2 | zps | Optional |
Note
`scales` is a bf16/f16/f32 tensor applied in the de-quantization formula. For `qtype` = per-tensor, the `scales` tensor must contain exactly one element. For `qtype` = per-channel, its number of elements must equal the size of the `src` tensor along dimension `axis`. For `qtype` = per-group, the `scales` tensor must have the same number of dimensions as the `src` tensor. On each dimension where grouped quantization is applied, its size must be the number of groups, which equals `src_dim` / `group_size`; all other dimensions must match the `src` tensor.
Note
`zps` is a tensor of offset values that map to zero. For `qtype` = per-tensor, the `zps` tensor must contain exactly one element. For `qtype` = per-channel, its number of elements must equal the size of the `src` tensor along dimension `axis`. For `qtype` = per-group, the `zps` tensor must have the same number of dimensions as the `src` tensor. On each dimension where grouped quantization is applied, its size must be the number of groups, which equals `src_dim` / `group_size`; all other dimensions must match the `src` tensor. If omitted, the `zps` values are assumed to be zero.
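The shape rules for `scales` and `zps` can be summarized in a small helper. This is a sketch under stated assumptions: the qtype strings use the spelling from the notes above, the per-tensor and per-channel results are returned as 1-D shapes, and `group_shape` is assumed to carry one entry per src dimension (with 1 on non-grouped dimensions, as in the Gx1 example). The function itself is hypothetical and not part of the library.

```python
def expected_param_shape(src_shape, qtype, axis=None, group_shape=None):
    """Return the shape that scales (and zps, if given) should have."""
    rank = len(src_shape)
    if qtype == "per-tensor":
        return (1,)  # a single element
    if qtype == "per-channel":
        if axis is None or not -rank <= axis < rank:
            raise ValueError("axis must be in [-r, r-1]")
        return (src_shape[axis],)
    if qtype == "per-group":
        if group_shape is None or len(group_shape) != rank:
            raise ValueError("group_shape needs one entry per src dimension")
        if any(d % g for d, g in zip(src_shape, group_shape)):
            raise ValueError("each src dimension must be divisible by its group size")
        # Number of groups on grouped dimensions; unchanged where group size is 1.
        return tuple(d // g for d, g in zip(src_shape, group_shape))
    raise ValueError("unknown qtype: " + qtype)


print(expected_param_shape((4096, 11008), "per-group", group_shape=(128, 1)))
# (32, 11008)
```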
Outputs
| Index | Argument Name | Required or Optional |
|---|---|---|
| 0 | dst | Required |
Supported data types
The DynamicDequantize operation supports the following data type combinations.
| Src | Dst | Scales | Zps |
|---|---|---|---|
| s8 | f16, bf16, f32 | f16, bf16, f32 | s8, u8, s32 |
| u8 | f16, bf16, f32 | f16, bf16, f32 | s8, u8, s32 |
| s4 | f16, bf16, f32 | f16, bf16, f32 | s4, u4, s32 |
| u4 | f16, bf16, f32 | f16, bf16, f32 | s4, u4, s32 |
The data types of scales and dst are expected to be the same.
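For readers who want to validate a configuration before building a graph, the table can be encoded as a lookup. The sketch below is illustrative only: the helper name is hypothetical, data types are written as plain strings, and the expectation that scales and dst share a data type is treated as a hard requirement.

```python
FLOAT_TYPES = {"f16", "bf16", "f32"}

# Allowed zps data types for each src data type, taken from the table above.
ZPS_TYPES = {
    "s8": {"s8", "u8", "s32"},
    "u8": {"s8", "u8", "s32"},
    "s4": {"s4", "u4", "s32"},
    "u4": {"s4", "u4", "s32"},
}


def is_supported_combination(src, dst, scales, zps=None):
    """Check a (src, dst, scales, zps) data type combination against the table."""
    if src not in ZPS_TYPES:
        return False
    if dst not in FLOAT_TYPES or scales not in FLOAT_TYPES:
        return False
    if scales != dst:
        return False  # scales and dst are expected to match
    return zps is None or zps in ZPS_TYPES[src]


print(is_supported_combination("u4", "f16", "f16", "u4"))  # True
print(is_supported_combination("s8", "f32", "f16"))        # False
```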