Eltwise

General

Forward

The eltwise primitive applies an operation to every element of the tensor (the variable names follow the standard Naming Conventions):

\[\dst_{i_1, \ldots, i_k} = Operation\left(\src_{i_1, \ldots, i_k}\right).\]

For notational convenience, in the formulas below we denote an individual element of the \(\src\), \(\dst\), \(\diffsrc\), and \(\diffdst\) tensors by s, d, ds, and dd, respectively.

The following operations are supported:

| Operation | oneDNN algorithm kind | Forward formula | Backward formula (from src) | Backward formula (from dst) |
| --- | --- | --- | --- | --- |
| abs | dnnl_eltwise_abs | \(d = \begin{cases} s & \text{if}\ s > 0 \\ -s & \text{if}\ s \leq 0 \end{cases}\) | \(ds = \begin{cases} dd & \text{if}\ s > 0 \\ -dd & \text{if}\ s < 0 \\ 0 & \text{if}\ s = 0 \end{cases}\) | |
| clip | dnnl_eltwise_clip | \(d = \begin{cases} \beta & \text{if}\ s > \beta \geq \alpha \\ s & \text{if}\ \alpha < s \leq \beta \\ \alpha & \text{if}\ s \leq \alpha \end{cases}\) | \(ds = \begin{cases} dd & \text{if}\ \alpha < s \leq \beta \\ 0 & \text{otherwise} \end{cases}\) | |
| clip_v2 | dnnl_eltwise_clip_v2, dnnl_eltwise_clip_v2_use_dst_for_bwd | \(d = \begin{cases} \beta & \text{if}\ s \geq \beta \geq \alpha \\ s & \text{if}\ \alpha < s < \beta \\ \alpha & \text{if}\ s \leq \alpha \end{cases}\) | \(ds = \begin{cases} dd & \text{if}\ \alpha < s < \beta \\ 0 & \text{otherwise} \end{cases}\) | \(ds = \begin{cases} dd & \text{if}\ \alpha < d < \beta \\ 0 & \text{otherwise} \end{cases}\) |
| elu | dnnl_eltwise_elu, dnnl_eltwise_elu_use_dst_for_bwd | \(d = \begin{cases} s & \text{if}\ s > 0 \\ \alpha (e^s - 1) & \text{if}\ s \leq 0 \end{cases}\) | \(ds = \begin{cases} dd & \text{if}\ s > 0 \\ dd \cdot \alpha e^s & \text{if}\ s \leq 0 \end{cases}\) | \(ds = \begin{cases} dd & \text{if}\ d > 0 \\ dd \cdot (d + \alpha) & \text{if}\ d \leq 0 \end{cases}\); see (2) |
| exp | dnnl_eltwise_exp, dnnl_eltwise_exp_use_dst_for_bwd | \(d = e^s\) | \(ds = dd \cdot e^s\) | \(ds = dd \cdot d\) |
| gelu_erf | dnnl_eltwise_gelu_erf | \(d = 0.5 s (1 + \operatorname{erf}[\frac{s}{\sqrt{2}}])\) | \(ds = dd \cdot \left(0.5 + 0.5 \, \operatorname{erf}\left({\frac{s}{\sqrt{2}}}\right) + \frac{s}{\sqrt{2\pi}}e^{-0.5s^{2}}\right)\) | |
| gelu_tanh | dnnl_eltwise_gelu_tanh | \(d = 0.5 s (1 + \tanh[\sqrt{\frac{2}{\pi}} (s + 0.044715 s^3)])\) | see (1) | |
| hardsigmoid | dnnl_eltwise_hardsigmoid | \(d = \text{max}(0, \text{min}(1, \alpha s + \beta))\) | \(ds = \begin{cases} dd \cdot \alpha & \text{if}\ 0 < \alpha s + \beta < 1 \\ 0 & \text{otherwise} \end{cases}\) | |
| hardswish | dnnl_eltwise_hardswish | \(d = s \cdot \text{max}(0, \text{min}(1, \alpha s + \beta))\) | \(ds = \begin{cases} dd & \text{if}\ \alpha s + \beta > 1 \\ dd \cdot (2 \alpha s + \beta) & \text{if}\ 0 < \alpha s + \beta < 1 \\ 0 & \text{otherwise} \end{cases}\) | |
| linear | dnnl_eltwise_linear | \(d = \alpha s + \beta\) | \(ds = \alpha \cdot dd\) | |
| log | dnnl_eltwise_log | \(d = \log_{e}{s}\) | \(ds = \frac{dd}{s}\) | |
| logistic | dnnl_eltwise_logistic, dnnl_eltwise_logistic_use_dst_for_bwd | \(d = \frac{1}{1+e^{-s}}\) | \(ds = \frac{dd}{1+e^{-s}} \cdot (1 - \frac{1}{1+e^{-s}})\) | \(ds = dd \cdot d \cdot (1 - d)\) |
| mish | dnnl_eltwise_mish | \(d = s \cdot \tanh{(\log_{e}(1+e^s))}\) | \(ds = dd \cdot \frac{e^{s} \cdot \omega}{\delta^{2}}\); see (3) | |
| pow | dnnl_eltwise_pow | \(d = \alpha s^{\beta}\) | \(ds = dd \cdot \alpha \beta s^{\beta - 1}\) | |
| relu | dnnl_eltwise_relu, dnnl_eltwise_relu_use_dst_for_bwd | \(d = \begin{cases} s & \text{if}\ s > 0 \\ \alpha s & \text{if}\ s \leq 0 \end{cases}\) | \(ds = \begin{cases} dd & \text{if}\ s > 0 \\ \alpha \cdot dd & \text{if}\ s \leq 0 \end{cases}\) | \(ds = \begin{cases} dd & \text{if}\ d > 0 \\ \alpha \cdot dd & \text{if}\ d \leq 0 \end{cases}\); see (2) |
| round | dnnl_eltwise_round | \(d = round(s)\) | | |
| soft_relu | dnnl_eltwise_soft_relu | \(d = \frac{1}{\alpha} \log_{e}(1+e^{\alpha s})\) | \(ds = \frac{dd}{1 + e^{-\alpha s}}\) | |
| sqrt | dnnl_eltwise_sqrt, dnnl_eltwise_sqrt_use_dst_for_bwd | \(d = \sqrt{s}\) | \(ds = \frac{dd}{2\sqrt{s}}\) | \(ds = \frac{dd}{2d}\) |
| square | dnnl_eltwise_square | \(d = s^2\) | \(ds = dd \cdot 2 s\) | |
| swish | dnnl_eltwise_swish | \(d = \frac{s}{1+e^{-\alpha s}}\) | \(ds = \frac{dd}{1 + e^{-\alpha s}}(1 + \alpha s (1 - \frac{1}{1 + e^{-\alpha s}}))\) | |
| tanh | dnnl_eltwise_tanh, dnnl_eltwise_tanh_use_dst_for_bwd | \(d = \tanh{s}\) | \(ds = dd \cdot (1 - \tanh^2{s})\) | \(ds = dd \cdot (1 - d^2)\) |

\((1)\ ds = dd \cdot 0.5 (1 + \tanh[\sqrt{\frac{2}{\pi}} (s + 0.044715 s^3)]) \cdot (1 + \sqrt{\frac{2}{\pi}} (s + 0.134145 s^3) \cdot (1 - \tanh[\sqrt{\frac{2}{\pi}} (s + 0.044715 s^3)]))\)

\((2)\ \text{The operation is supported only for } \alpha \geq 0.\)

\((3)\ \text{where } \omega = e^{3s} + 4 \cdot e^{2s} + e^{s} \cdot (4 \cdot s + 6) + 4 \cdot (s + 1) \text{ and } \delta = e^{2s} + 2 \cdot e^{s} + 2.\)

Note that the following identities hold:

  • \(bounded\_relu(s, alpha) = clip(s, 0, alpha)\)

  • \(logsigmoid(s) = soft\_relu(s, -1)\)

  • \(hardswish(s, alpha, beta) = s \cdot hardsigmoid(s, alpha, beta)\)

Difference Between Forward Training and Forward Inference

There is no difference between the dnnl_forward_training and dnnl_forward_inference propagation kinds.

Backward

The backward propagation computes \(\diffsrc\) based on the \(\diffdst\) and \(\src\) tensors. However, some operations also support computing it from the \(\dst\) memory produced during forward propagation. Refer to the table above for the operations that support \(\dst\) as input memory and for the corresponding formulas.

Exceptions

The eltwise primitive with algorithm round does not support backward propagation.

Execution Arguments

When executed, the inputs and outputs should be mapped to an execution argument index as specified by the following table.

| Primitive input/output | Execution argument index |
| --- | --- |
| \(\src\) | DNNL_ARG_SRC |
| \(\dst\) | DNNL_ARG_DST |
| \(\diffsrc\) | DNNL_ARG_DIFF_SRC |
| \(\diffdst\) | DNNL_ARG_DIFF_DST |
| \(\text{binary post-op}\) | DNNL_ARG_ATTR_MULTIPLE_POST_OP(binary_post_op_position) \| DNNL_ARG_SRC_1 |

Implementation Details

General Notes

  1. All eltwise primitives have three primitive descriptor creation functions (e.g., dnnl::eltwise_forward::primitive_desc()), which may take both \(\alpha\) and \(\beta\), only \(\alpha\), or neither of them.

  2. Both forward and backward propagation support in-place operations, meaning that \(\src\) can be used as input and output for forward propagation, and \(\diffdst\) can be used as input and output for backward propagation. In case of an in-place operation, the original data will be overwritten. Note, however, that some algorithms for backward propagation require original \(\src\), hence the corresponding forward propagation should not be performed in-place for those algorithms. Algorithms that use \(\dst\) for backward propagation can be safely done in-place.

  3. For some operations it might be beneficial to compute backward propagation based on \(\dst(\overline{s})\), rather than on \(\src(\overline{s})\), for improved performance.

  4. For logsigmoid, the original formula \(d = \log_{e}(\frac{1}{1+e^{-s}})\) was replaced by \(d = -soft\_relu(-s)\) for numerical stability.

Note

For operations supporting destination memory as input, \(\dst\) can be used instead of \(\src\) when backward propagation is computed. This enables several performance optimizations (see the tips below).

Data Type Support

The eltwise primitive supports the following combinations of data types:

| Propagation | Source / Destination | Intermediate data type |
| --- | --- | --- |
| forward / backward | f32, bf16, f16 | f32 |
| forward | s32 / s8 / u8 | f32 |
| forward / backward | f64 | f64 |

Warning

There might be hardware- and/or implementation-specific restrictions. Check the Implementation Limitations section below.

Here the intermediate data type means that the values coming in are first converted to the intermediate data type, then the operation is applied, and finally the result is converted to the output data type.

Data Representation

The eltwise primitive works with arbitrary data tensors. There is no special meaning associated with any logical dimensions.

Post-Ops and Attributes

| Propagation | Type | Operation | Description | Restrictions |
| --- | --- | --- | --- | --- |
| Forward | Post-op | Binary | Applies a Binary operation to the result | General binary post-op restrictions |

Implementation Limitations

  1. Refer to Data Types for limitations related to data types support.

  2. GPU

    • Only tensors of 6 or fewer dimensions are supported.

Performance Tips

  1. For backward propagation, use the same memory format for \(\src\), \(\diffdst\), and \(\diffsrc\) (the formats of \(\diffdst\) and \(\diffsrc\) are always the same because of the API). Different formats are functionally supported but lead to highly suboptimal performance.

  2. Use in-place operations whenever possible (see caveats in General Notes).

  3. As mentioned above for all operations supporting destination memory as input, one can use the \(\dst\) tensor instead of \(\src\). This enables the following potential optimizations for training:

    • Such operations can be safely done in-place.

    • Moreover, such operations can be fused as a post-op with the previous operation if that operation does not require its \(\dst\) to compute the backward propagation (e.g., if the convolution operation satisfies these conditions).

Example

Eltwise Primitive Example

This C++ API example demonstrates how to create and execute an Element-wise primitive in forward training propagation mode.