The eltwise primitive applies an operation to every element of the tensor (the variable names follow the standard Naming Conventions):
\[ \dst(\overline{s}) = Operation(\src(\overline{s})), \]
where \(\overline{s} = (s_n, \ldots, s_0)\).
The following operations are supported:
Operation | oneDNN algorithm kind | Forward formula | Backward formula (from src) | Backward formula (from dst) |
---|---|---|---|---|
abs | dnnl_eltwise_abs | \( d = \begin{cases} s & \text{if}\ s > 0 \\ -s & \text{if}\ s \leq 0 \end{cases} \) | \( ds = \begin{cases} dd & \text{if}\ s > 0 \\ -dd & \text{if}\ s < 0 \\ 0 & \text{if}\ s = 0 \end{cases} \) | – |
bounded_relu | dnnl_eltwise_bounded_relu | \( d = \begin{cases} \alpha & \text{if}\ s > \alpha \geq 0 \\ s & \text{if}\ 0 < s \leq \alpha \\ 0 & \text{if}\ s \leq 0 \end{cases} \) | \( ds = \begin{cases} dd & \text{if}\ 0 < s \leq \alpha \\ 0 & \text{otherwise}\ \end{cases} \) | – |
clip | dnnl_eltwise_clip | \( d = \begin{cases} \beta & \text{if}\ s > \beta \geq \alpha \\ s & \text{if}\ \alpha < s \leq \beta \\ \alpha & \text{if}\ s \leq \alpha \end{cases} \) | \( ds = \begin{cases} dd & \text{if}\ \alpha < s \leq \beta \\ 0 & \text{otherwise}\ \end{cases} \) | – |
elu | dnnl_eltwise_elu dnnl_eltwise_elu_use_dst_for_bwd | \( d = \begin{cases} s & \text{if}\ s > 0 \\ \alpha (e^s - 1) & \text{if}\ s \leq 0 \end{cases} \) | \( ds = \begin{cases} dd & \text{if}\ s > 0 \\ dd \cdot \alpha e^s & \text{if}\ s \leq 0 \end{cases} \) | \( ds = \begin{cases} dd & \text{if}\ d > 0 \\ dd \cdot (d + \alpha) & \text{if}\ d \leq 0 \end{cases}. See\ (2). \) |
exp | dnnl_eltwise_exp dnnl_eltwise_exp_use_dst_for_bwd | \( d = e^s \) | \( ds = dd \cdot e^s \) | \( ds = dd \cdot d \) |
gelu_erf | dnnl_eltwise_gelu_erf | \( d = 0.5 s (1 + \mathop{erf}[\frac{s}{\sqrt{2}}])\) | \( ds = dd \cdot \left(0.5 + 0.5 \, \mathop{erf}\left({\frac{s}{\sqrt{2}}}\right) + \frac{s}{\sqrt{2\pi}}e^{-0.5s^{2}}\right) \) | – |
gelu_tanh | dnnl_eltwise_gelu_tanh | \( d = 0.5 s (1 + \tanh[\sqrt{\frac{2}{\pi}} (s + 0.044715 s^3)])\) | \( See\ (1). \) | – |
linear | dnnl_eltwise_linear | \( d = \alpha s + \beta \) | \( ds = \alpha \cdot dd \) | – |
log | dnnl_eltwise_log | \( d = \log_{e}{s} \) | \( ds = \frac{dd}{s} \) | – |
logistic | dnnl_eltwise_logistic dnnl_eltwise_logistic_use_dst_for_bwd | \( d = \frac{1}{1+e^{-s}} \) | \( ds = \frac{dd}{1+e^{-s}} \cdot (1 - \frac{1}{1+e^{-s}}) \) | \( ds = dd \cdot d \cdot (1 - d) \) |
pow | dnnl_eltwise_pow | \( d = \alpha s^{\beta} \) | \( ds = dd \cdot \alpha \beta s^{\beta - 1} \) | – |
relu | dnnl_eltwise_relu dnnl_eltwise_relu_use_dst_for_bwd | \( d = \begin{cases} s & \text{if}\ s > 0 \\ \alpha s & \text{if}\ s \leq 0 \end{cases} \) | \( ds = \begin{cases} dd & \text{if}\ s > 0 \\ \alpha \cdot dd & \text{if}\ s \leq 0 \end{cases} \) | \( ds = \begin{cases} dd & \text{if}\ d > 0 \\ \alpha \cdot dd & \text{if}\ d \leq 0 \end{cases}. See\ (2). \) |
round | dnnl_eltwise_round | \( d = round(s) \) | – | – |
soft_relu | dnnl_eltwise_soft_relu | \( d = \log_{e}(1+e^s) \) | \( ds = \frac{dd}{1 + e^{-s}} \) | – |
sqrt | dnnl_eltwise_sqrt dnnl_eltwise_sqrt_use_dst_for_bwd | \( d = \sqrt{s} \) | \( ds = \frac{dd}{2\sqrt{s}} \) | \( ds = \frac{dd}{2d} \) |
square | dnnl_eltwise_square | \( d = s^2 \) | \( ds = dd \cdot 2 s \) | – |
swish | dnnl_eltwise_swish | \( d = \frac{s}{1+e^{-\alpha s}} \) | \( ds = \frac{dd}{1 + e^{-\alpha s}}(1 + \alpha s (1 - \frac{1}{1 + e^{-\alpha s}})) \) | – |
tanh | dnnl_eltwise_tanh dnnl_eltwise_tanh_use_dst_for_bwd | \( d = \tanh{s} \) | \( ds = dd \cdot (1 - \tanh^2{s}) \) | \( ds = dd \cdot (1 - d^2) \) |
\( (1)\ ds = dd \cdot 0.5 (1 + \tanh[\sqrt{\frac{2}{\pi}} (s + 0.044715 s^3)]) \cdot (1 + \sqrt{\frac{2}{\pi}} (s + 0.134145 s^3) \cdot (1 - \tanh[\sqrt{\frac{2}{\pi}} (s + 0.044715 s^3)]) ) \)
\( (2)\ \text{Operation is supported only for } \alpha \geq 0. \)
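Formula (1) can be sanity-checked against the gelu_tanh forward formula from the table. The following is a minimal standalone sketch, not oneDNN code: the function names `gelu_tanh_fwd`, `gelu_tanh_bwd`, and `gelu_tanh_numeric_grad` are hypothetical, and double precision is used only to keep the comparison tight.

```cpp
#include <cassert>
#include <cmath>

// Forward gelu_tanh from the table: d = 0.5 s (1 + tanh(g)),
// with g = sqrt(2/pi) * (s + 0.044715 s^3).
inline double gelu_tanh_fwd(double s) {
    const double k = std::sqrt(2.0 / std::acos(-1.0)); // sqrt(2/pi)
    const double g = k * (s + 0.044715 * s * s * s);
    return 0.5 * s * (1.0 + std::tanh(g));
}

// Backward gelu_tanh following formula (1).
inline double gelu_tanh_bwd(double dd, double s) {
    const double k = std::sqrt(2.0 / std::acos(-1.0)); // sqrt(2/pi)
    const double s3 = s * s * s;
    const double t = std::tanh(k * (s + 0.044715 * s3));
    return dd * 0.5 * (1.0 + t) * (1.0 + k * (s + 0.134145 * s3) * (1.0 - t));
}

// Central finite difference of the forward formula (illustration only).
inline double gelu_tanh_numeric_grad(double s, double h = 1e-5) {
    assert(h > 0.0);
    return (gelu_tanh_fwd(s + h) - gelu_tanh_fwd(s - h)) / (2.0 * h);
}
```

With \(dd = 1\), `gelu_tanh_bwd` should agree with `gelu_tanh_numeric_grad` up to the finite-difference error. The coefficient 0.134145 in (1) is \(3 \cdot 0.044715\), which comes from differentiating the cubic term.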
There is no difference between the dnnl_forward_training and dnnl_forward_inference propagation kinds.
The backward propagation computes \(\diffsrc(\overline{s})\), based on \(\diffdst(\overline{s})\) and \(\src(\overline{s})\). However, some operations support a computation using \(\dst(\overline{s})\) memory produced during forward propagation. Refer to the table above for a list of operations supporting destination as input memory and the corresponding formulas.
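To illustrate why a destination-based backward formula yields the same gradient as the source-based one, the elu formulas from the table can be written out as scalar functions. This is a standalone sketch with hypothetical function names (`elu_fwd`, `elu_bwd_from_src`, `elu_bwd_from_dst`); it does not call the oneDNN API.

```cpp
#include <cassert>
#include <cmath>

// Forward elu: d = s if s > 0, alpha * (e^s - 1) otherwise.
inline double elu_fwd(double s, double alpha) {
    return s > 0.0 ? s : alpha * std::expm1(s);
}

// Backward elu computed from src (dnnl_eltwise_elu in the table).
inline double elu_bwd_from_src(double dd, double s, double alpha) {
    return s > 0.0 ? dd : dd * alpha * std::exp(s);
}

// Backward elu computed from dst (dnnl_eltwise_elu_use_dst_for_bwd in the table).
// Footnote (2) restricts this variant to alpha >= 0.
inline double elu_bwd_from_dst(double dd, double d, double alpha) {
    assert(alpha >= 0.0);
    return d > 0.0 ? dd : dd * (d + alpha);
}
```

For \(s \leq 0\), \(d + \alpha = \alpha (e^s - 1) + \alpha = \alpha e^s\), so both functions return the same value. The restriction \(\alpha \geq 0\) matters because a negative \(\alpha\) would map negative \(s\) to positive \(d\), and the sign of \(d\) could no longer be used to select the branch.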
The eltwise primitive with algorithm round does not support backward propagation.
When executed, the inputs and outputs should be mapped to an execution argument index as specified by the following table.
Primitive input/output | Execution argument index |
---|---|
\(\src\) | DNNL_ARG_SRC |
\(\dst\) | DNNL_ARG_DST |
\(\diffsrc\) | DNNL_ARG_DIFF_SRC |
\(\diffdst\) | DNNL_ARG_DIFF_DST |
\(\text{binary post-op}\) | DNNL_ARG_ATTR_MULTIPLE_POST_OP(binary_post_op_position) \| DNNL_ARG_SRC_1 |
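The memory format and data type for \(\src\) and \(\dst\) are assumed to be the same, and in the API they are typically referred to as data (e.g., see data_desc in dnnl::eltwise_forward::desc::desc()). The same holds for \(\diffsrc\) and \(\diffdst\). The corresponding memory descriptors are referred to as diff_data_desc.
The eltwise primitive supports the following combinations of data types: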
Propagation | Source / Destination | Intermediate data type |
---|---|---|
forward / backward | f32, bf16 | f32 |
forward | f16 | f16 |
forward | s32 / s8 / u8 | f32 |
Here the intermediate data type means that the values coming in are first converted to the intermediate data type, then the operation is applied, and finally the result is converted to the output data type.
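The three steps can be sketched for the linear operation on s8 data with an f32 intermediate type. This is a standalone illustration, not oneDNN code. The helper names `f32_to_s8` and `linear_s8` are hypothetical, and the down-conversion behavior shown here (round to nearest under the current rounding mode, then saturate to the s8 range) is an assumption of the sketch rather than something stated above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: convert f32 to s8.
// Assumption: round to nearest (current rounding mode), then saturate to [-128, 127].
inline std::int8_t f32_to_s8(float v) {
    assert(!std::isnan(v)); // NaN has no s8 representation in this sketch
    const float r = std::nearbyint(v);
    return static_cast<std::int8_t>(std::min(127.0f, std::max(-128.0f, r)));
}

// linear eltwise (d = alpha * s + beta) on s8 data through an f32 intermediate.
inline std::int8_t linear_s8(std::int8_t s, float alpha, float beta) {
    const float x = static_cast<float>(s); // 1. convert input to the intermediate type
    const float y = alpha * x + beta;      // 2. apply the operation in f32
    return f32_to_s8(y);                   // 3. convert the result to the output type
}
```

Under these assumptions, `linear_s8(7, 0.5f, 0.25f)` computes 3.75 in f32 and stores 4, while `linear_s8(100, 2.0f, 0.0f)` computes 200 in f32 and stores the saturated value 127.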
The eltwise primitive works with arbitrary data tensors. There is no special meaning associated with any logical dimensions.
Propagation | Type | Operation | Description | Restrictions |
---|---|---|---|---|
Forward | Post-op | Binary | Applies a Binary operation to the result | General binary post-op restrictions |
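As a rough picture of what a binary post-op does, the sketch below applies relu and then adds a second tensor element by element. It is a standalone illustration that does not use the oneDNN API. The function name `relu_with_binary_add` is hypothetical, the choice of add as the binary operation is only an example, and `src1` stands in for the tensor that would be supplied through the binary post-op execution argument. Broadcasting is not modeled, so both tensors are assumed to have the same number of elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// relu forward followed by an element-wise add with a second tensor.
// Assumption: src and src1 have identical sizes (no broadcasting).
inline std::vector<float> relu_with_binary_add(const std::vector<float>& src,
                                               const std::vector<float>& src1,
                                               float alpha) {
    assert(src.size() == src1.size());
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        // Eltwise relu from the operations table.
        const float r = src[i] > 0.0f ? src[i] : alpha * src[i];
        // Binary operation applied to the eltwise result.
        dst[i] = r + src1[i];
    }
    return dst;
}
```

Because the tensors are treated as flat buffers, the sketch also reflects that the logical dimensions carry no special meaning for the eltwise computation itself.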
Engine | Name | Comments |
---|---|---|
CPU/GPU | Element-Wise Primitive Example | This C++ API example demonstrates how to create and execute an Element-wise primitive in forward training propagation mode. |