Primitive attributes. More...

#include <dnnl.hpp>

Public Member Functions
	primitive_attr ()
	Constructs default (empty) primitive attributes.

	primitive_attr (dnnl_primitive_attr_t attr)
	Creates primitive attributes from a C API dnnl_primitive_attr_t handle. More...

scratchpad_mode	get_scratchpad_mode () const
	Returns the scratchpad mode.

void	set_scratchpad_mode (scratchpad_mode mode)
	Sets scratchpad mode. More...

void	get_output_scales (int &mask, std::vector< float > &scales) const
	Returns output scaling factors correspondence mask and values. More...

void	set_output_scales (int mask, const std::vector< float > &scales)
	Sets output scaling factors correspondence mask and values. More...

void	get_scales (int arg, int &mask, std::vector< float > &scales) const
	Returns scaling factors correspondence mask and values for a given memory argument. More...

void	set_scales (int arg, int mask, const std::vector< float > &scales)
	Sets scaling factors for primitive operations for a given memory argument. More...

void	get_zero_points (int arg, int &mask, std::vector< int32_t > &zero_points) const
	Returns zero points correspondence mask and values. More...

void	set_zero_points (int arg, int mask, const std::vector< int32_t > &zero_points)
	Sets zero points for primitive operations for a given memory argument. More...

const post_ops	get_post_ops () const
	Returns post-ops previously set via set_post_ops(). More...

void	set_post_ops (const post_ops ops)
	Sets post-ops. More...

void	set_rnn_data_qparams (float scale, float shift)
	Sets quantization scale and shift parameters for RNN data tensors. More...

void	set_rnn_weights_qparams (int mask, const std::vector< float > &scales)
	Sets quantization scaling factors for RNN weights tensors. More...

Public Member Functions inherited from dnnl::handle< dnnl_primitive_attr_t >
bool	operator== (const handle< dnnl_primitive_attr_t, handle_traits< dnnl_primitive_attr_t > > &other) const
	Equality operator. More...

bool	operator!= (const handle &other) const
	Inequality operator. More...

	handle ()=default
	Constructs an empty handle object. More...

	handle (const handle< dnnl_primitive_attr_t, handle_traits< dnnl_primitive_attr_t > > &)=default
	Copy constructor.

	handle (handle< dnnl_primitive_attr_t, handle_traits< dnnl_primitive_attr_t > > &&)=default
	Move constructor.

	handle (dnnl_primitive_attr_t t, bool weak=false)
	Constructs a handle wrapper object from a C API handle. More...

handle< dnnl_primitive_attr_t, handle_traits< dnnl_primitive_attr_t > > &	operator= (const handle< dnnl_primitive_attr_t, handle_traits< dnnl_primitive_attr_t > > &)=default
	Assignment operator.

handle< dnnl_primitive_attr_t, handle_traits< dnnl_primitive_attr_t > > &	operator= (handle< dnnl_primitive_attr_t, handle_traits< dnnl_primitive_attr_t > > &&)=default
	Move assignment operator.

void	reset (dnnl_primitive_attr_t t, bool weak=false)
	Resets the handle wrapper objects to wrap a new C API handle. More...

dnnl_primitive_attr_t	get (bool allow_empty=false) const
	Returns the underlying C API handle. More...

	operator dnnl_primitive_attr_t () const
	Converts a handle to the underlying C API handle type. More...

	operator bool () const
	Checks whether the object is empty. More...

Detailed Description

Primitive attributes.

See also: Primitive Attributes

Examples: cnn_inference_int8.cpp, cpu_matmul_quantization.cpp, cpu_rnn_inference_int8.cpp, cpu_sgemm_and_matmul.cpp, inference_int8_matmul.cpp, and performance_profiling.cpp.

Constructor & Destructor Documentation

◆ primitive_attr()

dnnl::primitive_attr::primitive_attr ( dnnl_primitive_attr_t attr )

inline

Creates primitive attributes from a C API dnnl_primitive_attr_t handle.

The resulting handle is not weak and the C handle will be destroyed during the destruction of the C++ object.

Parameters

attr	The C API primitive attributes.

Member Function Documentation

◆ set_scratchpad_mode()

void dnnl::primitive_attr::set_scratchpad_mode ( scratchpad_mode mode )

inline

Sets scratchpad mode.

Parameters

mode	Specified scratchpad mode.

◆ get_output_scales()

void dnnl::primitive_attr::get_output_scales	(	int &	mask,
		std::vector< float > &	scales
	)		const

inline

Returns output scaling factors correspondence mask and values.

Parameters

mask	Scaling factors correspondence mask that defines the correspondence between the output tensor dimensions and the `scales` vector. The set i-th bit indicates that a dedicated output scaling factor is used for each index along that dimension. The mask value of 0 implies a common output scaling factor for the whole output tensor.
scales	Vector of output scaling factors.

◆ set_output_scales()

void dnnl::primitive_attr::set_output_scales	(	int	mask,
		const std::vector< float > &	scales
	)

inline

Sets output scaling factors correspondence mask and values.

Note

The order of dimensions does not depend on how elements are laid out in memory. For example:

for a 2D CNN activations tensor the order is always (n, c)
for a 4D CNN activations tensor the order is always (n, c, h, w)
for a 5D CNN weights tensor the order is always (g, oc, ic, kh, kw)

Example usage:

int mb = 32, oc = 32,
    oh = 14, ow = 14; // convolution output params
// unique output scales per output channel (oc values in total)
std::vector<float> scales = { ... };
int oc_dim = 1; // mb_dim = 0, channel_dim = 1, height_dim = 2, ...
// construct a convolution descriptor
dnnl::convolution_forward::desc conv_d;
dnnl::primitive_attr attr;
// set_output_scales() takes only the mask and the scales vector
attr.set_output_scales(1 << oc_dim, scales);
dnnl::convolution_forward::primitive_desc conv_pd(conv_d, attr, engine);

Parameters

mask	Defines the correspondence between the output tensor dimensions and the `scales` vector. The set i-th bit indicates that a dedicated scaling factor is used for each index along that dimension. Set the mask to 0 to use a common output scaling factor for the whole output tensor.
scales	Constant vector of output scaling factors. If the scaling factors are known at the time of this call, the following equality must hold:

\[scales.size() = \prod\limits_{d \in mask} output.dims[d].\]

Violations can only be detected when the attributes are used to create a primitive descriptor. If the scaling factors are not known at the time of the call, this vector must contain a single DNNL_RUNTIME_F32_VAL value and the output scaling factors must be passed at execution time as an argument with index DNNL_ARG_ATTR_OUTPUT_SCALES.
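
Because size violations surface only when the primitive descriptor is created, it can help to check the equality above before calling set_output_scales(). The following is a minimal sketch in plain C++; `expected_scales_count` is a hypothetical helper, not part of the oneDNN API, and it assumes `dims` holds the logical output dimensions in the order described in the note above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a oneDNN function): returns the product of
// dims[d] over every dimension d whose bit is set in `mask`.
// A mask of 0 selects no dimension, so the result is 1 (one common scale).
inline std::size_t expected_scales_count(
        int mask, const std::vector<std::int64_t> &dims) {
    std::size_t count = 1;
    for (std::size_t d = 0; d < dims.size(); ++d) {
        if (mask & (1 << static_cast<int>(d)))
            count *= static_cast<std::size_t>(dims[d]);
    }
    return count;
}
```

For the convolution in the example usage, with output dimensions (32, 32, 14, 14) and mask `1 << 1`, the helper returns 32, that is, one scaling factor per output channel.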

Examples: cnn_inference_int8.cpp, cpu_matmul_quantization.cpp, cpu_sgemm_and_matmul.cpp, and inference_int8_matmul.cpp.

◆ get_scales()

void dnnl::primitive_attr::get_scales	(	int	arg,
		int &	mask,
		std::vector< float > &	scales
	)		const

inline

Returns scaling factors correspondence mask and values for a given memory argument.

Parameters

arg	Parameter argument index as passed to the primitive::execute() call.
mask	Scaling factors correspondence mask that defines the correspondence between the output tensor dimensions and the `scales` vector. The set i-th bit indicates that a dedicated scaling factor is used for each index along that dimension. Set the mask to 0 to use a common scaling factor for the whole output tensor.
scales	Output vector of scaling factors.

◆ set_scales()

void dnnl::primitive_attr::set_scales	(	int	arg,
		int	mask,
		const std::vector< float > &	scales
	)

inline

Sets scaling factors for primitive operations for a given memory argument.

See also: dnnl_primitive_attr_set_scales; dnnl::primitive_attr::set_output_scales

Parameters

arg	Parameter argument index as passed to the primitive::execute() call.
mask	Scaling factors correspondence mask that defines the correspondence between the tensor dimensions and the `scales` vector. The set i-th bit indicates that a dedicated scaling factor is used for each index along that dimension. Set the mask to 0 to use a common scaling factor for the whole tensor.
scales	Constant vector of scaling factors. The following equality must hold: \[scales.size() = \prod\limits_{d \in mask} argument.dims[d].\]
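
To make the mask semantics concrete, the sketch below maps the logical index of a tensor element to the position of its scaling factor. `scale_index` is a hypothetical helper, not part of the oneDNN API, and it assumes the factors are stored densely in row-major order over the dimensions selected by the mask; this page does not state the storage order, so treat that as an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a oneDNN function): returns the position in
// `scales` of the factor that applies to the element at logical index `idx`.
// Assumption: factors are laid out row-major over the masked dimensions.
// Dimensions whose bit is not set in `mask` do not affect the result.
inline std::size_t scale_index(int mask,
        const std::vector<std::int64_t> &dims,
        const std::vector<std::int64_t> &idx) {
    std::size_t offset = 0;
    for (std::size_t d = 0; d < dims.size(); ++d) {
        if (mask & (1 << static_cast<int>(d)))
            offset = offset * static_cast<std::size_t>(dims[d])
                    + static_cast<std::size_t>(idx[d]);
    }
    return offset;
}
```

With mask `1 << 1` on a (n, c, h, w) tensor, every element of channel 7 maps to position 7 regardless of its n, h, and w coordinates; with mask 0 every element maps to position 0.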

◆ get_zero_points()

void dnnl::primitive_attr::get_zero_points	(	int	arg,
		int &	mask,
		std::vector< int32_t > &	zero_points
	)		const

inline

Returns zero points correspondence mask and values.

Parameters

arg	Parameter argument index as passed to the primitive::execute() call.
mask	Zero points correspondence mask that defines the correspondence between the output tensor dimensions and the `zero_points` vector. The set i-th bit indicates that a dedicated zero point is used for each index along that dimension. Set the mask to 0 to use a common zero point for the whole output tensor.
zero_points	Output vector of zero points.

◆ set_zero_points()

void dnnl::primitive_attr::set_zero_points	(	int	arg,
		int	mask,
		const std::vector< int32_t > &	zero_points
	)

inline

Sets zero points for primitive operations for a given memory argument.

See also: dnnl_primitive_attr_set_zero_points; dnnl::primitive_attr::set_output_scales

Parameters

arg	Parameter argument index as passed to the primitive::execute() call.
mask	Zero point correspondence mask that defines the correspondence between the tensor dimensions and the `zero_points` vector. The set i-th bit indicates that a dedicated zero point is used for each index along that dimension. Set the mask to 0 to use a common zero point for the whole tensor.
zero_points	Constant vector of zero points. If the zero points are known at the time of this call, the following equality must hold: \[zero_points.size() = \prod\limits_{d \in mask} argument.dims[d].\] If the zero points are not known at the time of the call, this vector must contain a single DNNL_RUNTIME_S32_VAL value (zero points are 32-bit integers) and the zero points must be passed at execution time as an argument with index DNNL_ARG_ATTR_ZERO_POINTS.

Examples: cpu_matmul_quantization.cpp.

◆ get_post_ops()

const post_ops dnnl::primitive_attr::get_post_ops ( ) const

inline

Returns post-ops previously set via set_post_ops().

Returns: Post-ops.

◆ set_post_ops()

void dnnl::primitive_attr::set_post_ops ( const post_ops ops )

inline

Sets post-ops.

Note: There is no way to check whether the post-ops would be supported by the target primitive. Any error will be reported by the respective primitive descriptor constructor.

Parameters

ops	Post-ops object to copy post-ops from.

Examples: cnn_inference_int8.cpp, cpu_sgemm_and_matmul.cpp, and performance_profiling.cpp.

◆ set_rnn_data_qparams()

void dnnl::primitive_attr::set_rnn_data_qparams	(	float	scale,
		float	shift
	)

inline

Sets quantization scale and shift parameters for RNN data tensors.

For performance reasons, the low-precision configuration of the RNN primitives expects input activations to have the unsigned 8-bit integer data type. The scale and shift parameters are used to quantize floating-point data to unsigned integer and must be passed to the RNN primitive using attributes.

The quantization formula is scale * (data + shift).

Note: Quantization scale and shift are common for src_layer, src_iter, dst_iter, and dst_layer.

Example usage:

// RNN parameters
int l = 2, t = 2, mb = 32, sic = 32, slc = 32, dic = 32, dlc = 32;
// Activations quantization parameters
float scale = ..., shift = ...;
primitive_attr attr;
// Set scale and shift for int8 quantization of activation
attr.set_rnn_data_qparams(scale, shift);
// Create and configure rnn op_desc
vanilla_rnn_forward::desc rnn_d(...);
// Create the primitive descriptor from the op_desc, attributes, and engine
vanilla_rnn_forward::primitive_desc rnn_pd(rnn_d, attr, engine);

Parameters

scale	The value to scale the data by.
shift	The value to shift the data by.
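
The quantization formula scale * (data + shift) can be illustrated in plain C++ as follows. This is only a sketch: `quantize_rnn_data` is a hypothetical helper, not part of the oneDNN API, and the rounding to nearest and saturation to the [0, 255] range are assumptions made here for illustration, since this page states only the formula and the unsigned 8-bit target type.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not a oneDNN function): quantizes one floating-point
// activation to unsigned 8-bit using the formula scale * (data + shift).
// Assumptions: round to nearest, then saturate to the u8 range [0, 255].
inline std::uint8_t quantize_rnn_data(float data, float scale, float shift) {
    float q = std::round(scale * (data + shift));
    q = std::min(255.0f, std::max(0.0f, q));
    return static_cast<std::uint8_t>(q);
}
```

For instance, with scale = 64 and shift = 1, an activation of 0.5 maps to 64 * 1.5 = 96, while an activation of -1 maps to 0. A positive shift is what allows negative activations to land in the unsigned range.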

Examples: cpu_rnn_inference_int8.cpp.

◆ set_rnn_weights_qparams()

void dnnl::primitive_attr::set_rnn_weights_qparams	(	int	mask,
		const std::vector< float > &	scales
	)

inline

Sets quantization scaling factors for RNN weights tensors.

The low-precision configuration of the RNN primitives expects input weights to use the signed 8-bit integer data type. The scaling factors are used to quantize floating-point data to signed integer and must be passed to RNN primitives using attributes.

Note: The dimension order is always native and does not depend on the actual layout used. For example, five-dimensional weights always have (l, d, i, g, o) logical dimension ordering.
Note: Quantization scales are common for weights_layer and weights_iteration.

Parameters

mask	Scaling factors correspondence mask that defines the correspondence between the weights tensor dimensions and the `scales` vector. The set i-th bit indicates that a dedicated scaling factor should be used for each index along that dimension. Set the mask to 0 to use a common scaling factor for the whole weights tensor.
scales	Constant vector of output scaling factors. The following equality must hold: \[scales.size() = \prod\limits_{d \in mask} weights.dims[d].\] Violations can only be detected when the attributes are used to create a primitive descriptor.
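
As an illustration of the mask for five-dimensional weights with the (l, d, i, g, o) ordering, the sketch below builds a mask that requests a dedicated scaling factor per gate and per output channel, and computes how many factors that implies. The enum and both helpers are hypothetical names introduced here, not part of the oneDNN API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Logical positions of the 5D RNN weights dimensions (l, d, i, g, o).
enum rnn_weights_dim { dim_l = 0, dim_d = 1, dim_i = 2, dim_g = 3, dim_o = 4 };

// Hypothetical helper: mask with the g and o bits set, requesting one
// scaling factor per (gate, output channel) pair.
inline int per_gate_per_output_mask() {
    return (1 << dim_g) | (1 << dim_o);
}

// Hypothetical helper: number of scaling factors the mask implies, that is,
// the product of weights_dims[d] over the dimensions selected by `mask`.
inline std::size_t rnn_weights_scales_count(
        int mask, const std::vector<std::int64_t> &weights_dims) {
    std::size_t count = 1;
    for (std::size_t d = 0; d < weights_dims.size(); ++d) {
        if (mask & (1 << static_cast<int>(d)))
            count *= static_cast<std::size_t>(weights_dims[d]);
    }
    return count;
}
```

For weights with dimensions (2, 1, 32, 4, 32), this mask implies 4 * 32 = 128 scaling factors, so a `scales` vector of that size would satisfy the equality above.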

Examples: cpu_rnn_inference_int8.cpp.

The documentation for this struct was generated from the following file:

include/dnnl.hpp
