API Reference

The eltwise primitive applies an operation to every element of the tensor (the variable names follow the standard Naming Conventions):

\[ \dst(\overline{s}) = Operation(\src(\overline{s})), \]

where \(\overline{s} = (s_n, \ldots, s_0)\).
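As a rough illustration of the definition above, the following sketch applies a scalar operation to every element of a flat buffer. The helper name `eltwise_forward_ref`, the example operation `square_fwd`, and the use of `std::vector<float>` are assumptions made for this example; none of them is part of the DNNL API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical reference helper (not a DNNL function): computes
// dst[i] = op(src[i]) for every element, regardless of the tensor's shape.
inline void eltwise_forward_ref(const std::vector<float> &src,
        std::vector<float> &dst, const std::function<float(float)> &op) {
    assert(src.size() == dst.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = op(src[i]);
}

// Example operation: square, d = s^2.
inline float square_fwd(float s) {
    return s * s;
}
```

For example, calling `eltwise_forward_ref(src, dst, square_fwd)` with `src = {-2, 0, 3}` fills `dst` with `{4, 0, 9}`.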

The following operations are supported:

Operation	DNNL algorithm kind	Forward formula	Backward formula (from src)	Backward formula (from dst)
abs	dnnl_eltwise_abs	\( d = \begin{cases} s & \text{if}\ s > 0 \\ -s & \text{if}\ s \leq 0 \end{cases} \)	\( ds = \begin{cases} dd & \text{if}\ s > 0 \\ -dd & \text{if}\ s < 0 \\ 0 & \text{if}\ s = 0 \end{cases} \)	–
bounded_relu	dnnl_eltwise_bounded_relu	\( d = \begin{cases} \alpha & \text{if}\ s > \alpha \geq 0 \\ s & \text{if}\ 0 < s \leq \alpha \\ 0 & \text{if}\ s \leq 0 \end{cases} \)	\( ds = \begin{cases} dd & \text{if}\ 0 < s \leq \alpha, \\ 0 & \text{otherwise}\ \end{cases} \)	–
clip	dnnl_eltwise_clip	\( d = \begin{cases} \beta & \text{if}\ s > \beta \geq \alpha \\ s & \text{if}\ \alpha < s \leq \beta \\ \alpha & \text{if}\ s \leq \alpha \end{cases} \)	\( ds = \begin{cases} dd & \text{if}\ \alpha < s \leq \beta \\ 0 & \text{otherwise}\ \end{cases} \)	–
elu	dnnl_eltwise_elu dnnl_eltwise_elu_use_dst_for_bwd	\( d = \begin{cases} s & \text{if}\ s > 0 \\ \alpha (e^s - 1) & \text{if}\ s \leq 0 \end{cases} \)	\( ds = \begin{cases} dd & \text{if}\ s > 0 \\ dd \cdot \alpha e^s & \text{if}\ s \leq 0 \end{cases} \)	\( ds = \begin{cases} dd & \text{if}\ d > 0 \\ dd \cdot (d + \alpha) & \text{if}\ d \leq 0 \end{cases}. See\ (2). \)
exp	dnnl_eltwise_exp dnnl_eltwise_exp_use_dst_for_bwd	\( d = e^s \)	\( ds = dd \cdot e^s \)	\( ds = dd \cdot d \)
gelu_erf	dnnl_eltwise_gelu_erf	\( d = 0.5 s \left(1 + \textrm{erf}\left(\frac{s}{\sqrt{2}}\right)\right) \)	\( ds = dd \cdot \left(0.5 + 0.5 \, \textrm{erf}\left({\frac{s}{\sqrt{2}}}\right) + \frac{s}{\sqrt{2\pi}}e^{-0.5s^{2}}\right) \)	–
gelu_tanh	dnnl_eltwise_gelu_tanh	\( d = 0.5 s \left(1 + \tanh\left[\sqrt{\frac{2}{\pi}} (s + 0.044715 s^3)\right]\right) \)	\( \text{See (1).} \)	–
linear	dnnl_eltwise_linear	\( d = \alpha s + \beta \)	\( ds = \alpha \cdot dd \)	–
log	dnnl_eltwise_log	\( d = \log_{e}{s} \)	\( ds = \frac{dd}{s} \)	–
logistic	dnnl_eltwise_logistic dnnl_eltwise_logistic_use_dst_for_bwd	\( d = \frac{1}{1+e^{-s}} \)	\( ds = \frac{dd}{1+e^{-s}} \cdot (1 - \frac{1}{1+e^{-s}}) \)	\( ds = dd \cdot d \cdot (1 - d) \)
pow	dnnl_eltwise_pow	\( d = \alpha s^{\beta} \)	\( ds = dd \cdot \alpha \beta s^{\beta - 1} \)	–
relu	dnnl_eltwise_relu dnnl_eltwise_relu_use_dst_for_bwd	\( d = \begin{cases} s & \text{if}\ s > 0 \\ \alpha s & \text{if}\ s \leq 0 \end{cases} \)	\( ds = \begin{cases} dd & \text{if}\ s > 0 \\ \alpha \cdot dd & \text{if}\ s \leq 0 \end{cases} \)	\( ds = \begin{cases} dd & \text{if}\ d > 0 \\ \alpha \cdot dd & \text{if}\ d \leq 0 \end{cases}. See\ (2). \)
soft_relu	dnnl_eltwise_soft_relu	\( d = \log_{e}(1+e^s) \)	\( ds = \frac{dd}{1 + e^{-s}} \)	–
sqrt	dnnl_eltwise_sqrt dnnl_eltwise_sqrt_use_dst_for_bwd	\( d = \sqrt{s} \)	\( ds = \frac{dd}{2\sqrt{s}} \)	\( ds = \frac{dd}{2d} \)
square	dnnl_eltwise_square	\( d = s^2 \)	\( ds = dd \cdot 2 s \)	–
swish	dnnl_eltwise_swish	\( d = \frac{s}{1+e^{-\alpha s}} \)	\( ds = \frac{dd}{1 + e^{-\alpha s}} \left(1 + \alpha s \left(1 - \frac{1}{1 + e^{-\alpha s}}\right)\right) \)	–
tanh	dnnl_eltwise_tanh dnnl_eltwise_tanh_use_dst_for_bwd	\( d = \tanh{s} \)	\( ds = dd \cdot (1 - \tanh^2{s}) \)	\( ds = dd \cdot (1 - d^2) \)

\( (1)\ ds = dd \cdot 0.5 \left(1 + \tanh\left[\sqrt{\frac{2}{\pi}} (s + 0.044715 s^3)\right]\right) \left(1 + \sqrt{\frac{2}{\pi}} (s + 0.134145 s^3) \left(1 - \tanh\left[\sqrt{\frac{2}{\pi}} (s + 0.044715 s^3)\right]\right)\right) \)

\( (2)\ \text{Operation is supported only for } \alpha \geq 0. \)
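Formula (1) is easy to mistype, so a numerical check can be useful. The sketch below writes the gelu_tanh forward formula and formula (1) as plain scalar functions and compares the latter against a central finite difference of the former. All function names, the step size, and the tolerance are assumptions of this example, not part of the DNNL API.

```cpp
#include <cassert>
#include <cmath>
#include <initializer_list>

// Inner argument of tanh: g(s) = sqrt(2/pi) * (s + 0.044715 s^3).
inline double gelu_tanh_arg(double s) {
    const double k = std::sqrt(2.0 / std::acos(-1.0)); // sqrt(2/pi)
    return k * (s + 0.044715 * s * s * s);
}

// Forward: d = 0.5 s (1 + tanh(g(s))).
inline double gelu_tanh_fwd(double s) {
    return 0.5 * s * (1.0 + std::tanh(gelu_tanh_arg(s)));
}

// Backward, formula (1):
// ds = dd * 0.5 (1 + tanh g) (1 + sqrt(2/pi) (s + 0.134145 s^3) (1 - tanh g)).
inline double gelu_tanh_bwd(double dd, double s) {
    const double k = std::sqrt(2.0 / std::acos(-1.0));
    const double t = std::tanh(gelu_tanh_arg(s));
    return dd * 0.5 * (1.0 + t)
            * (1.0 + k * (s + 0.134145 * s * s * s) * (1.0 - t));
}

// Central finite difference of the forward formula, used only as a check.
inline double gelu_tanh_fd(double s, double h = 1e-5) {
    return (gelu_tanh_fwd(s + h) - gelu_tanh_fwd(s - h)) / (2.0 * h);
}

// Self-check: formula (1) should agree with the finite difference.
inline void check_gelu_tanh_bwd() {
    for (double s : {-2.0, -0.5, 0.0, 0.7, 3.0})
        assert(std::fabs(gelu_tanh_bwd(1.0, s) - gelu_tanh_fd(s)) < 1e-6);
}
```

The constant 0.134145 is 3 × 0.044715, which comes from differentiating the cubic term inside the tanh argument.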

Difference Between Forward Training and Forward Inference

There is no difference between the dnnl_forward_training and dnnl_forward_inference propagation kinds.

Backward

The backward propagation computes \(\diffsrc(\overline{s})\), based on \(\diffdst(\overline{s})\) and \(\src(\overline{s})\). However, some operations support a computation using \(\dst(\overline{s})\) memory produced during forward propagation. Refer to the table above for a list of operations supporting destination as input memory and the corresponding formulas.
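To make the two backward variants concrete, the following sketch implements the elu formulas from the table as scalar functions and checks that the src-based and dst-based results agree. The function names are hypothetical and the code is a reference sketch of the formulas only, not of the library implementation.

```cpp
#include <cassert>
#include <cmath>
#include <initializer_list>

// elu forward: d = s if s > 0, alpha * (e^s - 1) otherwise.
inline double elu_fwd(double s, double alpha) {
    return s > 0.0 ? s : alpha * (std::exp(s) - 1.0);
}

// elu backward computed from src.
inline double elu_bwd_from_src(double dd, double s, double alpha) {
    return s > 0.0 ? dd : dd * alpha * std::exp(s);
}

// elu backward computed from dst; meaningful only for alpha >= 0, see (2).
inline double elu_bwd_from_dst(double dd, double d, double alpha) {
    assert(alpha >= 0.0);
    return d > 0.0 ? dd : dd * (d + alpha);
}

// Self-check: both variants should agree up to rounding.
inline void check_elu_bwd_variants(double alpha) {
    for (double s : {-3.0, -0.25, 0.0, 0.5, 4.0}) {
        const double d = elu_fwd(s, alpha);
        const double from_src = elu_bwd_from_src(1.5, s, alpha);
        const double from_dst = elu_bwd_from_dst(1.5, d, alpha);
        assert(std::fabs(from_src - from_dst) < 1e-12);
    }
}
```

One way to see why the dst-based variant needs \(\alpha \geq 0\): for \(s \leq 0\) the forward result is \(d = \alpha (e^s - 1)\), which is non-positive only when \(\alpha \geq 0\). With a negative \(\alpha\), the test \(d > 0\) would no longer recover the sign of \(s\).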

Execution Arguments

When the primitive is executed, its inputs and outputs must be mapped to execution argument indices as specified in the following table.

Primitive input/output	Execution argument index
\(\src\)	DNNL_ARG_SRC
\(\dst\)	DNNL_ARG_DST
\(\diffsrc\)	DNNL_ARG_DIFF_SRC
\(\diffdst\)	DNNL_ARG_DIFF_DST

Implementation Details

General Notes

All eltwise primitives have a common initialization function (e.g., dnnl::eltwise_forward::desc::desc()) that takes both parameters \(\alpha\) and \(\beta\). A parameter that the selected operation does not use is ignored.
The memory format and data type for \(\src\) and \(\dst\) are assumed to be the same, and in the API they are typically referred to as data (e.g., see data_desc in dnnl::eltwise_forward::desc::desc()). The same holds for \(\diffsrc\) and \(\diffdst\). The corresponding memory descriptors are referred to as diff_data_desc.
Both forward and backward propagation support in-place operations, meaning that \(\src\) can be used as input and output for forward propagation, and \(\diffdst\) can be used as input and output for backward propagation. In the case of an in-place operation, the original data is overwritten.
For some operations it might be beneficial to compute backward propagation based on \(\dst(\overline{s})\), rather than on \(\src(\overline{s})\), for improved performance.

Note: For operations supporting destination memory as input, \(\dst\) can be used instead of \(\src\) when backward propagation is computed. This enables several performance optimizations (see the tips below).
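The interaction between in-place execution and dst-based backward propagation can be sketched with relu. In the example below, the forward pass overwrites \(\src\) with \(\dst\), and the backward pass still works because it only needs \(\dst\). The function names and the use of `std::vector<float>` are assumptions of this sketch, which models the formulas rather than the library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place relu forward: the buffer holds src on entry and dst on return.
inline void relu_fwd_inplace(std::vector<float> &data, float alpha) {
    for (float &v : data)
        v = v > 0.0f ? v : alpha * v;
}

// In-place relu backward using dst: the diff buffer holds diff_dst on entry
// and diff_src on return. Meaningful only for alpha >= 0, see (2).
inline void relu_bwd_from_dst_inplace(
        std::vector<float> &diff, const std::vector<float> &dst, float alpha) {
    assert(alpha >= 0.0f);
    assert(diff.size() == dst.size());
    for (std::size_t i = 0; i < diff.size(); ++i)
        diff[i] = dst[i] > 0.0f ? diff[i] : alpha * diff[i];
}
```

With `alpha = 0.5`, a buffer `{-2, 0, 3}` becomes `{-1, 0, 3}` after the forward pass, and a gradient buffer `{1, 1, 1}` becomes `{0.5, 0.5, 1}` after the backward pass, without the original \(\src\) values ever being kept.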

Data Type Support

The eltwise primitive supports the following combinations of data types:

Propagation	Source / Destination	Intermediate data type
forward / backward	f32, bf16	f32
forward	f16	f16
forward	s32 / s8 / u8	f32

Warning: There might be hardware- and/or implementation-specific restrictions. Check the Implementation Limitations section below.

Here the intermediate data type means that the values coming in are first converted to the intermediate data type, then the operation is applied, and finally the result is converted to the output data type.
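The three-step conversion described above can be sketched for an s8 tensor with f32 as the intermediate type, using the linear operation as an example. The helper names are hypothetical, and the rounding-to-nearest and saturation behavior of the final conversion are assumptions of this sketch rather than statements from this page.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: convert an f32 value to s8. Rounding to nearest and
// saturating to [-128, 127] are assumptions of this sketch.
inline std::int8_t f32_to_s8_saturate(float v) {
    const float r = std::nearbyint(v);
    if (r > 127.0f) return 127;
    if (r < -128.0f) return -128;
    return static_cast<std::int8_t>(r);
}

// s8 -> f32 (intermediate) -> operation -> s8, here with the linear
// operation d = alpha * s + beta.
inline std::int8_t linear_fwd_s8(std::int8_t s, float alpha, float beta) {
    const float s_f32 = static_cast<float>(s); // convert input to f32
    const float d_f32 = alpha * s_f32 + beta;  // apply the operation in f32
    return f32_to_s8_saturate(d_f32);          // convert to the output type
}

// Self-check with values whose f32 results are exact.
inline void check_linear_fwd_s8() {
    assert(linear_fwd_s8(10, 0.5f, 1.0f) == 6);      // 0.5 * 10 + 1 = 6
    assert(linear_fwd_s8(100, 2.0f, 0.0f) == 127);   // 200 saturates high
    assert(linear_fwd_s8(-100, 2.0f, 0.0f) == -128); // -200 saturates low
}
```

Computing in f32 keeps the intermediate result from overflowing the narrow integer range before the final conversion.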

Data Representation

The eltwise primitive works with arbitrary data tensors. There is no special meaning associated with any logical dimensions.

Post-ops and Attributes

The eltwise primitive doesn't support any post-ops or attributes.

Implementation Limitations

Refer to Data Types for limitations related to data type support.

Performance Tips

For backward propagation, use the same memory format for \(\src\), \(\diffdst\), and \(\diffsrc\) (the formats of \(\diffdst\) and \(\diffsrc\) are always the same because of the API). Different formats are functionally supported but lead to highly suboptimal performance.
Use in-place operations whenever possible.
As mentioned above, for all operations supporting destination memory as input, the \(\dst\) tensor can be used instead of \(\src\). This enables the following potential optimizations for training:
- Such operations can be safely done in-place.
- Moreover, such operations can be fused as a post-op with the previous operation if that operation doesn't require its \(\dst\) to compute the backward propagation (e.g., if the convolution operation satisfies these conditions).