.. index:: pair: page; Inner Product
.. _doxid-dev_guide_inner_product:

Inner Product
=============

:ref:`API Reference`

General
~~~~~~~

The inner product primitive (sometimes called fully connected) treats each activation in the minibatch as a vector and computes its product with a 2D weights tensor, producing a 2D tensor as output.

Forward
-------

More precisely, let :math:`\src`, :math:`\weights`, :math:`\bias` and :math:`\dst` be :math:`N \times IC`, :math:`OC \times IC`, :math:`OC`, and :math:`N \times OC` tensors, respectively (variable names follow the standard :ref:`Naming Conventions`). Then:

.. math::

   \dst(n, oc) = \bias(oc) + \sum_{ic=0}^{IC-1} \src(n, ic) \cdot \weights(oc, ic)

In cases where the :math:`\src` and :math:`\weights` tensors have spatial dimensions, they are flattened to 2D. For example, if they are 4D :math:`N \times IC' \times IH \times IW` and :math:`OC \times IC' \times KH \times KW` tensors, then the formula above is applied with :math:`IC = IC' \cdot IH \cdot IW`. In such cases, the :math:`\src` and :math:`\weights` tensors must have equal spatial dimensions (e.g. :math:`KH = IH` and :math:`KW = IW` for 4D tensors).

Difference Between Forward Training and Forward Inference
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

There is no difference between the :ref:`dnnl::prop_kind::forward_training` and :ref:`dnnl::prop_kind::forward_inference` propagation kinds.

Backward
--------

The backward propagation computes :math:`\diffsrc` based on :math:`\diffdst` and :math:`\weights`.

The weights update computes :math:`\diffweights` and :math:`\diffbias` based on :math:`\diffdst` and :math:`\src`.

.. note::

   The optimized memory formats for :math:`\src` and :math:`\weights` might differ between forward propagation, backward propagation, and weights update.

Execution Arguments
~~~~~~~~~~~~~~~~~~~

When executed, the inputs and outputs should be mapped to an execution argument index as specified by the following table.


============================== ==========================================================================================
Primitive input/output         Execution argument index
============================== ==========================================================================================
:math:`\src`                   DNNL_ARG_SRC
:math:`\weights`               DNNL_ARG_WEIGHTS
:math:`\bias`                  DNNL_ARG_BIAS
:math:`\dst`                   DNNL_ARG_DST
:math:`\diffsrc`               DNNL_ARG_DIFF_SRC
:math:`\diffweights`           DNNL_ARG_DIFF_WEIGHTS
:math:`\diffbias`              DNNL_ARG_DIFF_BIAS
:math:`\diffdst`               DNNL_ARG_DIFF_DST
:math:`\text{binary post-op}`  :ref:`DNNL_ARG_ATTR_MULTIPLE_POST_OP(binary_post_op_position)` | DNNL_ARG_SRC_1
============================== ==========================================================================================

Implementation Details
~~~~~~~~~~~~~~~~~~~~~~

General Notes
-------------

N/A.

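
To relate the forward formula from the General section to concrete memory, the following plain C++ sketch may help. It is a naive reference loop, not the library's implementation: the function name ``inner_product_forward_ref`` is hypothetical, and the code assumes dense row-major ``float`` buffers, so that a 4D source stored in ``nchw`` order is already flattened with IC = IC' * IH * IW.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive reference for the forward formula (hypothetical helper, not a
// oneDNN API).
// src:     N x IC, dense row-major. A 4D N x IC' x IH x IW tensor stored in
//          nchw order already has this layout with IC = IC' * IH * IW.
// weights: OC x IC, dense row-major (oihw order in the 4D case).
// bias:    OC values.
// Returns dst as an N x OC dense row-major buffer.
std::vector<float> inner_product_forward_ref(const std::vector<float> &src,
        const std::vector<float> &weights, const std::vector<float> &bias,
        std::size_t N, std::size_t IC, std::size_t OC) {
    assert(src.size() == N * IC);
    assert(weights.size() == OC * IC);
    assert(bias.size() == OC);

    std::vector<float> dst(N * OC);
    for (std::size_t n = 0; n < N; ++n) {
        for (std::size_t oc = 0; oc < OC; ++oc) {
            // dst(n, oc) = bias(oc) + sum_ic src(n, ic) * weights(oc, ic)
            float acc = bias[oc];
            for (std::size_t ic = 0; ic < IC; ++ic)
                acc += src[n * IC + ic] * weights[oc * IC + ic];
            dst[n * OC + oc] = acc;
        }
    }
    return dst;
}
```

A 1 x 1 x 2 x 2 source, for instance, would be passed with ``N = 1`` and ``IC = 4``. No reshaping is needed because the flattening only reinterprets the same contiguous buffer.
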

Data Types
----------

The inner product primitive supports the following combinations of data types for the source, destination, weights, and bias tensors:

=================== ========== ========== ======================= =======================
Propagation         Source     Weights    Destination             Bias
=================== ========== ========== ======================= =======================
forward / backward  f32        f32        f32                     f32
forward             f16        f16        f32, f16, u8, s8        f16, f32
forward             u8, s8     s8         u8, s8, s32, bf16, f32  u8, s8, s32, bf16, f32
forward             bf16       bf16       f32, bf16               f32, bf16
backward            f32, bf16  bf16       bf16
backward            f32, f16   f16        f16
weights update      bf16       f32, bf16  bf16                    f32, bf16
weights update      f16        f32, f16   f16                     f32, f16
=================== ========== ========== ======================= =======================

Data Representation
-------------------

Like other CNN primitives, the inner product primitive expects the following tensors:

======== ============================================== =================== ===================================================
Spatial  Source                                         Destination         Weights
======== ============================================== =================== ===================================================
1D       :math:`N \times C \times W`                    :math:`N \times C`  :math:`OC \times IC \times KW`
2D       :math:`N \times C \times H \times W`           :math:`N \times C`  :math:`OC \times IC \times KH \times KW`
3D       :math:`N \times C \times D \times H \times W`  :math:`N \times C`  :math:`OC \times IC \times KD \times KH \times KW`
======== ============================================== =================== ===================================================

The memory format of the data and weights memory objects is critical for inner product primitive performance. In the oneDNN programming model, the inner product primitive is one of the few primitives that support the placeholder format :ref:`dnnl::memory::format_tag::any` (shortened to ``any`` from now on) and can define data and weight memory object formats based on the primitive parameters.
When using ``any``, it is necessary to first create an inner product primitive descriptor and then query it for the actual data and weight memory object formats.

The table below shows the combinations of plain memory formats for which the inner product primitive is optimized. For the destination tensor (which is always :math:`N \times C`), the memory format is always :ref:`dnnl::memory::format_tag::nc` (:ref:`dnnl::memory::format_tag::ab`).

======== ================================ ================================================================================
Spatial  Source / Weights logical tensor  Implementation optimized for memory formats
======== ================================ ================================================================================
0D       NC / OI                          :ref:`dnnl_nc` (:ref:`dnnl_ab`) / :ref:`dnnl_oi` (:ref:`dnnl_ab`)
0D       NC / OI                          :ref:`dnnl_nc` (:ref:`dnnl_ab`) / :ref:`dnnl_io` (:ref:`dnnl_ba`)
1D       NCW / OIW                        :ref:`dnnl_ncw` (:ref:`dnnl_abc`) / :ref:`dnnl_oiw` (:ref:`dnnl_abc`)
1D       NCW / OIW                        :ref:`dnnl_nwc` (:ref:`dnnl_acb`) / :ref:`dnnl_wio` (:ref:`dnnl_cba`)
2D       NCHW / OIHW                      :ref:`dnnl_nchw` (:ref:`dnnl_abcd`) / :ref:`dnnl_oihw` (:ref:`dnnl_abcd`)
2D       NCHW / OIHW                      :ref:`dnnl_nhwc` (:ref:`dnnl_acdb`) / :ref:`dnnl_hwio` (:ref:`dnnl_cdba`)
3D       NCDHW / OIDHW                    :ref:`dnnl_ncdhw` (:ref:`dnnl_abcde`) / :ref:`dnnl_oidhw` (:ref:`dnnl_abcde`)
3D       NCDHW / OIDHW                    :ref:`dnnl_ndhwc` (:ref:`dnnl_acdeb`) / :ref:`dnnl_dhwio` (:ref:`dnnl_cdeba`)
======== ================================ ================================================================================

Post-Ops and Attributes
-----------------------

Post-ops and attributes enable you to modify the behavior of the inner product primitive by chaining certain operations after the inner product operation.
The following attributes and post-ops are supported by inner product primitives:

============ ========== ================ ============================================================================== ====================================
Propagation  Type       Operation        Description                                                                    Restrictions
============ ========== ================ ============================================================================== ====================================
forward      attribute  Output scale     Scales the result of inner product by given scale factor(s)                    int8 inner products only
forward      post-op    :ref:`Eltwise`   Applies an :ref:`Eltwise` operation to the result
forward      post-op    :ref:`Sum`       Adds the operation result to the destination tensor instead of overwriting it
forward      post-op    :ref:`Binary`    Applies a :ref:`Binary` operation to the result                                General binary post-op restrictions
============ ========== ================ ============================================================================== ====================================

To facilitate dynamic quantization, the primitive supports run-time output scales. This means that a user can configure attributes with output scales set to the :ref:`DNNL_RUNTIME_F32_VAL` wildcard value instead of the actual scales if the scales are not known at the primitive descriptor creation stage. In this case, the user must provide the scales as an additional input memory object with argument DNNL_ARG_ATTR_OUTPUT_SCALES during the execution stage.

Implementation Limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~

#. Check :ref:`Data Types`.

#. The CPU engine does not support u8 or s8 data type for dst with f16 src and weights.


Performance Tips
~~~~~~~~~~~~~~~~

* Use :ref:`dnnl::memory::format_tag::any` for source, weights, and destination memory format tags when creating an inner product primitive to allow the library to choose the most appropriate memory format.

Example
~~~~~~~

:ref:`Inner Product Primitive Example`

This C++ API example demonstrates how to create and execute an :ref:`Inner Product` primitive.

Key optimizations included in this example:

* Primitive attributes with fused post-ops;

* Creation of optimized memory format from the primitive descriptor.
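
As a complement to the library example, the backward propagation and weights update described in the Backward section can be sketched in plain C++ by differentiating the forward formula. This is a naive reference under stated assumptions, not the library's implementation: the names ``inner_product_backward_data_ref``, ``inner_product_backward_weights_ref``, and ``WeightsGrads`` are hypothetical, and all buffers are assumed to be dense row-major ``float`` tensors flattened to 2D.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Backward propagation (hypothetical helper, not a oneDNN API):
// diff_src(n, ic) = sum_oc diff_dst(n, oc) * weights(oc, ic)
std::vector<float> inner_product_backward_data_ref(
        const std::vector<float> &diff_dst, const std::vector<float> &weights,
        std::size_t N, std::size_t IC, std::size_t OC) {
    assert(diff_dst.size() == N * OC);
    assert(weights.size() == OC * IC);

    std::vector<float> diff_src(N * IC, 0.0f);
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t oc = 0; oc < OC; ++oc)
            for (std::size_t ic = 0; ic < IC; ++ic)
                diff_src[n * IC + ic]
                        += diff_dst[n * OC + oc] * weights[oc * IC + ic];
    return diff_src;
}

// Hypothetical container for the two outputs of the weights update.
struct WeightsGrads {
    std::vector<float> diff_weights; // OC x IC
    std::vector<float> diff_bias; // OC
};

// Weights update (hypothetical helper, not a oneDNN API):
// diff_weights(oc, ic) = sum_n diff_dst(n, oc) * src(n, ic)
// diff_bias(oc)        = sum_n diff_dst(n, oc)
WeightsGrads inner_product_backward_weights_ref(
        const std::vector<float> &diff_dst, const std::vector<float> &src,
        std::size_t N, std::size_t IC, std::size_t OC) {
    assert(diff_dst.size() == N * OC);
    assert(src.size() == N * IC);

    WeightsGrads g {std::vector<float>(OC * IC, 0.0f),
            std::vector<float>(OC, 0.0f)};
    for (std::size_t n = 0; n < N; ++n) {
        for (std::size_t oc = 0; oc < OC; ++oc) {
            const float d = diff_dst[n * OC + oc];
            g.diff_bias[oc] += d;
            for (std::size_t ic = 0; ic < IC; ++ic)
                g.diff_weights[oc * IC + ic] += d * src[n * IC + ic];
        }
    }
    return g;
}
```

Keeping the two passes as separate functions mirrors the split between backward propagation and weights update: the first reads the weights, the second reads the source, and both consume the same destination gradient.
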