Dequantize

General

Dequantize operation converts a quantized (u8/s8/f8_e4m3/f8_e5m2) tensor to an f32 tensor. It supports both per-tensor and per-channel asymmetric linear de-quantization. Rounding mode is library-implementation defined. Zero points (zps in the attribute table) are not supported for f8_e4m3 and f8_e5m2 dequantization.

For per-tensor de-quantization:

\[dst_{i} = (src_{i} - zps) \times scale\]

For per-channel de-quantization, taking channel axis = 1 as an example:

\[dst_{\cdots,i,\cdots} = (src_{\cdots,i,\cdots} - zps_i) \times scale_i,\quad i \in [0, ic-1]\]

where \(ic\) is the number of channels.
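The two formulas above can be sketched as a NumPy reference. This is a minimal illustration of the dequantization semantics only, not the library implementation; the function names are hypothetical:

```python
import numpy as np

def dequantize_per_tensor(src, scale, zp=0):
    """Per-tensor: dst_i = (src_i - zp) * scale."""
    return (src.astype(np.float32) - np.float32(zp)) * np.float32(scale)

def dequantize_per_channel(src, scales, zps, axis=1):
    """Per-channel along `axis`: dst[..., i, ...] = (src[..., i, ...] - zps[i]) * scales[i]."""
    shape = [1] * src.ndim
    shape[axis] = -1  # broadcast scales/zps over every dimension except the channel axis
    scales = np.asarray(scales, dtype=np.float32).reshape(shape)
    zps = np.asarray(zps, dtype=np.float32).reshape(shape)
    return (src.astype(np.float32) - zps) * scales

# u8 tensor of shape (2, 3); for per-channel, channel axis = 1 so ic = 3
src = np.array([[0, 128, 255], [10, 138, 245]], dtype=np.uint8)
print(dequantize_per_tensor(src, scale=0.5, zp=128))
```

Note that the subtraction is done after widening src to f32, so an s8/u8 value minus the zero point cannot overflow the integer type.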

Operation attributes

| Attribute Name | Description | Value Type | Supported Values | Required or Optional |
|---|---|---|---|---|
| qtype | Specifies which de-quantization type is used. | string | per_tensor (default), per_channel | Optional |
| axis | Specifies the dimension on which per-channel de-quantization is applied. | s64 | An s64 value in the range [-r, r-1], where r = rank(src); 1 by default | Optional |
| scales | Scalings applied to the src data. | f32 | An f32 list (containing only one element if qtype is per_tensor) | Required |
| zps | Offset values that map to float zero. | s64 | An s64 list (containing only one element if qtype is per_tensor) | Optional |
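Since axis accepts values in [-r, r-1], a negative axis counts backwards from the last dimension. A small sketch of that normalization, assuming NumPy-style negative-axis semantics (`normalize_axis` is a hypothetical helper, not part of the library API):

```python
def normalize_axis(axis, rank):
    """Map an axis in [-rank, rank-1] to its non-negative equivalent."""
    if not -rank <= axis <= rank - 1:
        raise ValueError(f"axis {axis} out of range for rank {rank}")
    return axis + rank if axis < 0 else axis

# For a rank-4 src (e.g. NCHW), axis=-3 addresses the same dimension as axis=1.
print(normalize_axis(-3, 4))  # -> 1
```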

Execution arguments

The inputs and outputs must be provided according to the index order below when constructing an operation.

Inputs

| Index | Argument Name | Required or Optional |
|---|---|---|
| 0 | src | Required |

Outputs

| Index | Argument Name | Required or Optional |
|---|---|---|
| 0 | dst | Required |

Supported data types

Dequantize operation supports the following data type combinations.

| Src | Dst |
|---|---|
| s8, u8, f8_e4m3, f8_e5m2 | f32 |

Note

This operation is intended to support int8 quantized models.