.. index:: pair: page; Eltwise
.. _doxid-dev_guide_eltwise:

Eltwise
=======

:ref:`API Reference`

General
~~~~~~~

Forward
-------

The eltwise primitive applies an operation to every element of the tensor
(the variable names follow the standard :ref:`Naming Conventions`):

.. math::

    \dst_{i_1, \ldots, i_k} = Operation\left(\src_{i_1, \ldots, i_k}\right).

For notational convenience, the formulas below denote individual elements of
the :math:`\src`, :math:`\dst`, :math:`\diffsrc`, and :math:`\diffdst`
tensors by :math:`s`, :math:`d`, :math:`ds`, and :math:`dd` respectively.

The following operations are supported:

.. list-table::
   :header-rows: 1

   * - Operation
     - oneDNN algorithm kind
     - Forward formula
     - Backward formula (from src)
     - Backward formula (from dst)
   * - abs
     - :ref:`dnnl_eltwise_abs`
     - :math:`d = \begin{cases} s & \text{if}\ s > 0 \\ -s & \text{if}\ s \leq 0 \end{cases}`
     - :math:`ds = \begin{cases} dd & \text{if}\ s > 0 \\ -dd & \text{if}\ s < 0 \\ 0 & \text{if}\ s = 0 \end{cases}`
     -
   * - clip
     - :ref:`dnnl_eltwise_clip`
     - :math:`d = \begin{cases} \beta & \text{if}\ s > \beta \geq \alpha \\ s & \text{if}\ \alpha < s \leq \beta \\ \alpha & \text{if}\ s \leq \alpha \end{cases}`
     - :math:`ds = \begin{cases} dd & \text{if}\ \alpha < s \leq \beta \\ 0 & \text{otherwise} \end{cases}`
     -
   * - clip_v2
     - :ref:`dnnl_eltwise_clip_v2`, :ref:`dnnl_eltwise_clip_v2_use_dst_for_bwd`
     - :math:`d = \begin{cases} \beta & \text{if}\ s \geq \beta \geq \alpha \\ s & \text{if}\ \alpha < s < \beta \\ \alpha & \text{if}\ s \leq \alpha \end{cases}`
     - :math:`ds = \begin{cases} dd & \text{if}\ \alpha < s < \beta \\ 0 & \text{otherwise} \end{cases}`
     - :math:`ds = \begin{cases} dd & \text{if}\ \alpha < d < \beta \\ 0 & \text{otherwise} \end{cases}`
   * - elu
     - :ref:`dnnl_eltwise_elu`, :ref:`dnnl_eltwise_elu_use_dst_for_bwd`
     - :math:`d = \begin{cases} s & \text{if}\ s > 0 \\ \alpha (e^s - 1) & \text{if}\ s \leq 0 \end{cases}`
     - :math:`ds = \begin{cases} dd & \text{if}\ s > 0 \\ dd \cdot \alpha e^s & \text{if}\ s \leq 0 \end{cases}`
     - :math:`ds = \begin{cases} dd & \text{if}\ d > 0 \\ dd \cdot (d + \alpha) & \text{if}\ d \leq 0 \end{cases}` See (2).
   * - exp
     - :ref:`dnnl_eltwise_exp`, :ref:`dnnl_eltwise_exp_use_dst_for_bwd`
     - :math:`d = e^s`
     - :math:`ds = dd \cdot e^s`
     - :math:`ds = dd \cdot d`
   * - gelu_erf
     - :ref:`dnnl_eltwise_gelu_erf`
     - :math:`d = 0.5 s (1 + \operatorname{erf}[\frac{s}{\sqrt{2}}])`
     - :math:`ds = dd \cdot \left(0.5 + 0.5 \, \operatorname{erf}\left(\frac{s}{\sqrt{2}}\right) + \frac{s}{\sqrt{2\pi}} e^{-0.5 s^{2}}\right)`
     -
   * - gelu_tanh
     - :ref:`dnnl_eltwise_gelu_tanh`
     - :math:`d = 0.5 s (1 + \tanh[\sqrt{\frac{2}{\pi}} (s + 0.044715 s^3)])`
     - See (1).
     -
   * - hardsigmoid
     - :ref:`dnnl_eltwise_hardsigmoid`
     - :math:`d = \max(0, \min(1, \alpha s + \beta))`
     - :math:`ds = \begin{cases} dd \cdot \alpha & \text{if}\ 0 < \alpha s + \beta < 1 \\ 0 & \text{otherwise} \end{cases}`
     -
   * - hardswish
     - :ref:`dnnl_eltwise_hardswish`
     - :math:`d = s \cdot \max(0, \min(1, \alpha s + \beta))`
     - :math:`ds = \begin{cases} dd & \text{if}\ \alpha s + \beta > 1 \\ dd \cdot (2 \alpha s + \beta) & \text{if}\ 0 < \alpha s + \beta < 1 \\ 0 & \text{otherwise} \end{cases}`
     -
   * - linear
     - :ref:`dnnl_eltwise_linear`
     - :math:`d = \alpha s + \beta`
     - :math:`ds = \alpha \cdot dd`
     -
   * - log
     - :ref:`dnnl_eltwise_log`
     - :math:`d = \log_{e}{s}`
     - :math:`ds = \frac{dd}{s}`
     -
   * - logistic
     - :ref:`dnnl_eltwise_logistic`, :ref:`dnnl_eltwise_logistic_use_dst_for_bwd`
     - :math:`d = \frac{1}{1+e^{-s}}`
     - :math:`ds = \frac{dd}{1+e^{-s}} \cdot \left(1 - \frac{1}{1+e^{-s}}\right)`
     - :math:`ds = dd \cdot d \cdot (1 - d)`
   * - mish
     - :ref:`dnnl_eltwise_mish`
     - :math:`d = s \cdot \tanh{(\log_{e}(1+e^s))}`
     - :math:`ds = dd \cdot \frac{e^{s} \cdot \omega}{\delta^{2}}` See (3).
     -
   * - pow
     - :ref:`dnnl_eltwise_pow`
     - :math:`d = \alpha s^{\beta}`
     - :math:`ds = dd \cdot \alpha \beta s^{\beta - 1}`
     -
   * - relu
     - :ref:`dnnl_eltwise_relu`, :ref:`dnnl_eltwise_relu_use_dst_for_bwd`
     - :math:`d = \begin{cases} s & \text{if}\ s > 0 \\ \alpha s & \text{if}\ s \leq 0 \end{cases}`
     - :math:`ds = \begin{cases} dd & \text{if}\ s > 0 \\ \alpha \cdot dd & \text{if}\ s \leq 0 \end{cases}`
     - :math:`ds = \begin{cases} dd & \text{if}\ d > 0 \\ \alpha \cdot dd & \text{if}\ d \leq 0 \end{cases}` See (2).
   * - round
     - :ref:`dnnl_eltwise_round`
     - :math:`d = round(s)`
     -
     -
   * - soft_relu
     - :ref:`dnnl_eltwise_soft_relu`
     - :math:`d = \frac{1}{\alpha} \log_{e}(1+e^{\alpha s})`
     - :math:`ds = \frac{dd}{1 + e^{-\alpha s}}`
     -
   * - sqrt
     - :ref:`dnnl_eltwise_sqrt`, :ref:`dnnl_eltwise_sqrt_use_dst_for_bwd`
     - :math:`d = \sqrt{s}`
     - :math:`ds = \frac{dd}{2\sqrt{s}}`
     - :math:`ds = \frac{dd}{2d}`
   * - square
     - :ref:`dnnl_eltwise_square`
     - :math:`d = s^2`
     - :math:`ds = dd \cdot 2 s`
     -
   * - swish
     - :ref:`dnnl_eltwise_swish`
     - :math:`d = \frac{s}{1+e^{-\alpha s}}`
     - :math:`ds = \frac{dd}{1 + e^{-\alpha s}} \left(1 + \alpha s \left(1 - \frac{1}{1 + e^{-\alpha s}}\right)\right)`
     -
   * - tanh
     - :ref:`dnnl_eltwise_tanh`, :ref:`dnnl_eltwise_tanh_use_dst_for_bwd`
     - :math:`d = \tanh{s}`
     - :math:`ds = dd \cdot (1 - \tanh^2{s})`
     - :math:`ds = dd \cdot (1 - d^2)`

.. math::

    (1)\ ds = dd \cdot 0.5 (1 + \tanh[\sqrt{\frac{2}{\pi}} (s + 0.044715 s^3)]) \cdot (1 + \sqrt{\frac{2}{\pi}} (s + 0.134145 s^3) \cdot (1 - \tanh[\sqrt{\frac{2}{\pi}} (s + 0.044715 s^3)]))

.. math::

    (2)\ \text{Operation is supported only for } \alpha \geq 0.

.. math::

    (3)\ \text{where } \omega = e^{3s} + 4 \cdot e^{2s} + e^{s} \cdot (4 \cdot s + 6) + 4 \cdot (s + 1) \text{ and } \delta = e^{2s} + 2 \cdot e^{s} + 2.
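The ``_use_dst_for_bwd`` algorithm kinds exist because some backward formulas can be written in terms of :math:`d` alone. As an illustrative sanity check (plain Python, not oneDNN code), the following snippet verifies numerically that the src-based and dst-based backward formulas for logistic and tanh agree:

```python
import math

def check(s, dd=0.7):
    # logistic: forward d = 1 / (1 + exp(-s))
    d = 1.0 / (1.0 + math.exp(-s))
    ds_from_src = dd / (1.0 + math.exp(-s)) * (1.0 - 1.0 / (1.0 + math.exp(-s)))
    ds_from_dst = dd * d * (1.0 - d)                 # needs only d, not s
    assert math.isclose(ds_from_src, ds_from_dst)

    # tanh: forward d = tanh(s)
    d = math.tanh(s)
    ds_from_src = dd * (1.0 - math.tanh(s) ** 2)
    ds_from_dst = dd * (1.0 - d * d)                 # needs only d, not s
    assert math.isclose(ds_from_src, ds_from_dst)

for s in (-2.0, -0.5, 0.0, 0.5, 2.0):
    check(s)
```

For such operations a framework may overwrite or free :math:`\src` after the forward pass, since :math:`\dst` carries all the information backward propagation needs.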
Note that the following equations hold:

* :math:`bounded\_relu(s, alpha) = clip(s, 0, alpha)`
* :math:`logsigmoid(s) = soft\_relu(s, -1)`
* :math:`hardswish(s, alpha, beta) = s \cdot hardsigmoid(s, alpha, beta)`

Difference Between Forward Training and Forward Inference
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

There is no difference between the :ref:`dnnl_forward_training` and :ref:`dnnl_forward_inference` propagation kinds.

Backward
--------

The backward propagation computes :math:`\diffsrc` based on the :math:`\diffdst` and :math:`\src` tensors. However, some operations support computing it from the :math:`\dst` memory produced during forward propagation instead. Refer to the table above for the list of operations supporting destination as input memory and for the corresponding formulas.

Exceptions
++++++++++

The eltwise primitive with algorithm round does not support backward propagation.

Execution Arguments
~~~~~~~~~~~~~~~~~~~

When executed, the inputs and outputs should be mapped to an execution argument index as specified by the following table.

============================= ================================================================================
Primitive input/output        Execution argument index
============================= ================================================================================
:math:`\src`                  DNNL_ARG_SRC
:math:`\dst`                  DNNL_ARG_DST
:math:`\diffsrc`              DNNL_ARG_DIFF_SRC
:math:`\diffdst`              DNNL_ARG_DIFF_DST
:math:`\text{binary post-op}` :ref:`DNNL_ARG_ATTR_MULTIPLE_POST_OP(binary_post_op_position)` | DNNL_ARG_SRC_1
============================= ================================================================================

Implementation Details
~~~~~~~~~~~~~~~~~~~~~~

General Notes
-------------
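The identities above can be checked numerically. The snippet below is an illustrative plain-Python sketch; the function names are local helpers defined from the formulas in the table, not oneDNN APIs:

```python
import math

def clip(s, alpha, beta):
    # d = beta if s > beta, s if alpha < s <= beta, alpha if s <= alpha
    return min(max(s, alpha), beta)

def bounded_relu(s, alpha):
    return min(max(s, 0.0), alpha)

def soft_relu(s, alpha):
    # d = (1/alpha) * log(1 + exp(alpha * s))
    return (1.0 / alpha) * math.log1p(math.exp(alpha * s))

def logsigmoid(s):
    return math.log(1.0 / (1.0 + math.exp(-s)))

def hardsigmoid(s, alpha, beta):
    return max(0.0, min(1.0, alpha * s + beta))

def hardswish(s, alpha, beta):
    return s * max(0.0, min(1.0, alpha * s + beta))

for s in (-3.0, -0.4, 0.0, 1.2, 5.0):
    assert math.isclose(bounded_relu(s, 2.5), clip(s, 0.0, 2.5))
    assert math.isclose(logsigmoid(s), soft_relu(s, -1.0))
    assert math.isclose(hardswish(s, 1 / 6, 0.5), s * hardsigmoid(s, 1 / 6, 0.5))
```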
#. All eltwise primitives have three primitive descriptor creation functions (e.g., :ref:`dnnl::eltwise_forward::primitive_desc()`) which may take both :math:`\alpha` and :math:`\beta`, just :math:`\alpha`, or neither of them.

#. Both forward and backward propagation support in-place operations, meaning that :math:`\src` can be used as both input and output for forward propagation, and :math:`\diffdst` can be used as both input and output for backward propagation. In the case of an in-place operation, the original data is overwritten. Note, however, that some algorithms for backward propagation require the original :math:`\src`; hence, the corresponding forward propagation should not be performed in-place for those algorithms. Algorithms that use :math:`\dst` for backward propagation can safely be done in-place.

#. For some operations it might be beneficial to compute backward propagation based on :math:`\dst(\overline{s})` rather than on :math:`\src(\overline{s})` for improved performance.

#. For logsigmoid, the original formula :math:`d = \log_{e}(\frac{1}{1+e^{-s}})` was replaced by :math:`d = -soft\_relu(-s)` for numerical stability.

.. note:: For operations supporting destination memory as input, :math:`\dst` can be used instead of :math:`\src` when backward propagation is computed. This enables several performance optimizations (see the tips below).

Data Type Support
-----------------

The eltwise primitive supports the following combinations of data types:

=================== ===================== ======================
Propagation         Source / Destination  Intermediate data type
=================== ===================== ======================
forward / backward  f32, bf16, f16        f32
forward             s32 / s8 / u8         f32
=================== ===================== ======================

.. warning:: There might be hardware- and/or implementation-specific restrictions. Check the :ref:`Implementation Limitations` section below.
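The logsigmoid reformulation mentioned in the notes above can be demonstrated with a small plain-Python sketch. This is illustrative only; ``softplus`` is a local, numerically stable helper corresponding to :math:`soft\_relu` with :math:`\alpha = 1`, not a oneDNN function:

```python
import math

def softplus(x):
    # numerically stable log(1 + exp(x)): never calls exp() on a large positive argument
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

def logsigmoid_naive(s):
    # original formula: d = log(1 / (1 + exp(-s)))
    return math.log(1.0 / (1.0 + math.exp(-s)))

def logsigmoid_stable(s):
    # reformulated: d = -soft_relu(-s), with alpha = 1
    return -softplus(-s)

# Both forms agree wherever the naive one is representable ...
for s in (-20.0, -1.0, 0.0, 3.0):
    assert math.isclose(logsigmoid_naive(s), logsigmoid_stable(s))

# ... but only the reformulation survives large negative inputs.
assert math.isclose(logsigmoid_stable(-1000.0), -1000.0)
try:
    logsigmoid_naive(-1000.0)  # exp(1000.0) overflows
    raise AssertionError("expected an overflow")
except OverflowError:
    pass
```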
Here the intermediate data type means that the values coming in are first converted to the intermediate data type, then the operation is applied, and finally the result is converted to the output data type.

Data Representation
-------------------

The eltwise primitive works with arbitrary data tensors. There is no special meaning associated with any of the logical dimensions.

Post-Ops and Attributes
-----------------------

=========== ======= ============= =============================================== ===================================
Propagation Type    Operation     Description                                     Restrictions
=========== ======= ============= =============================================== ===================================
Forward     Post-op :ref:`Binary` Applies a :ref:`Binary` operation to the result General binary post-op restrictions
=========== ======= ============= =============================================== ===================================

:target:`doxid-dev_guide_eltwise_1dg_eltwise_impl_limits`

Implementation Limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~

#. Refer to :ref:`Data Types` for limitations related to data type support.

#. GPU

   * Only tensors of 6 or fewer dimensions are supported.

Performance Tips
~~~~~~~~~~~~~~~~

#. For backward propagation, use the same memory format for :math:`\src`, :math:`\diffdst`, and :math:`\diffsrc` (the formats of :math:`\diffdst` and :math:`\diffsrc` are always the same because of the API). Different formats are functionally supported but lead to highly suboptimal performance.

#. Use in-place operations whenever possible (see the caveats in General Notes).
#. As mentioned above, for all operations supporting destination memory as input, one can use the :math:`\dst` tensor instead of :math:`\src`. This enables the following potential optimizations for training:

   * Such operations can be safely done in-place.

   * Moreover, such operations can be fused as a :ref:`post-op` with the previous operation if that operation does not require its :math:`\dst` to compute the backward propagation (for example, if the convolution operation satisfies these conditions).

Example
~~~~~~~

:ref:`Eltwise Primitive Example`

This C++ API example demonstrates how to create and execute an :ref:`Element-wise` primitive in forward training propagation mode.
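To make the dst-based performance tip concrete, here is a minimal plain-Python sketch (illustrative only, not oneDNN code) of an in-place backward pass for relu with :math:`\alpha = 0`: since :math:`ds = dd` when :math:`d > 0` and :math:`0` otherwise, the incoming gradient buffer can be overwritten element by element and becomes :math:`\diffsrc`:

```python
# relu forward with alpha = 0, then a backward pass that needs only dst:
# ds = dd if d > 0 else 0, so the diff_dst buffer is updated in-place.
src = [-2.0, -0.5, 0.0, 1.0, 3.0]
dst = [s if s > 0 else 0.0 for s in src]   # forward: d = max(s, 0)
grad = [0.1, 0.2, 0.3, 0.4, 0.5]           # incoming diff_dst
for i, d in enumerate(dst):
    grad[i] = grad[i] if d > 0 else 0.0    # in-place: grad now holds diff_src
assert grad == [0.0, 0.0, 0.0, 0.4, 0.5]
```

Because only :math:`\dst` is read here, the forward pass itself could also have been done in-place over ``src``.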