The convolution primitive computes forward, backward, or weight update for a batched convolution operation on 1D, 2D, or 3D spatial data with bias.
The convolution operation is defined by the following formulas. We show formulas only for 2D spatial data; they are straightforward to generalize to higher and lower dimensions. Variable names follow the standard Naming Conventions.
Let \(\src\), \(\weights\) and \(\dst\) be \(N \times IC \times IH \times IW\), \(OC \times IC \times KH \times KW\), and \(N \times OC \times OH \times OW\) tensors respectively. Let \(\bias\) be a 1D tensor with \(OC\) elements.
Furthermore, let the remaining convolution parameters be:
Parameter  Depth  Height  Width  Comment 

Padding: Front, top, and left  \(PD_L\)  \(PH_L\)  \(PW_L\)  In the API we use padding_l to indicate the corresponding vector of paddings (_l in the name stands for left) 
Padding: Back, bottom, and right  \(PD_R\)  \(PH_R\)  \(PW_R\)  In the API we use padding_r to indicate the corresponding vector of paddings (_r in the name stands for right) 
Stride  \(SD\)  \(SH\)  \(SW\)  Convolution without strides is defined by setting the stride parameters to 1 
Dilation  \(DD\)  \(DH\)  \(DW\)  Non-dilated convolution is defined by setting the dilation parameters to 0 
The following formulas show how oneDNN computes convolutions. They are broken down into several types to simplify the exposition, but in reality the convolution types can be combined.
To further simplify the formulas, we assume that \(\src(n, ic, ih, iw) = 0\) if \(ih < 0\), or \(ih \geq IH\), or \(iw < 0\), or \(iw \geq IW\).
\[\dst(n, oc, oh, ow) = \bias(oc) \\ + \sum_{ic=0}^{IC-1}\sum_{kh=0}^{KH-1}\sum_{kw=0}^{KW-1} \src(n, ic, oh \cdot SH + kh - PH_L, ow \cdot SW + kw - PW_L) \cdot \weights(oc, ic, kh, kw).\]
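Here:
\[ OH = \left\lfloor \frac{IH - KH + PH_L + PH_R}{SH} \right\rfloor + 1, \quad OW = \left\lfloor \frac{IW - KW + PW_L + PW_R}{SW} \right\rfloor + 1. \]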
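A naive reference implementation makes the regular-convolution formula concrete. This C++ sketch is built on a few assumptions:

- Source and destination use plain NCHW layout, and weights use OIHW, all stored in flat std::vector<float> buffers.
- The struct Conv2dDesc and the function conv2d_fwd are hypothetical names.
- Output sizes follow the floor rule \(OH = \lfloor (IH - KH + PH_L + PH_R) / SH \rfloor + 1\), and likewise for \(OW\).

It is a readability-first loop nest, not how oneDNN's optimized kernels compute the result.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical parameter bundle for a 2D convolution; names follow the formulas.
struct Conv2dDesc {
    int N, IC, IH, IW;           // source:  N x IC x IH x IW
    int OC, KH, KW;              // weights: OC x IC x KH x KW
    int SH, SW;                  // strides (1 = no striding)
    int PH_L, PH_R, PW_L, PW_R;  // paddings
    // Integer division equals the floor for valid (non-negative) numerators.
    int OH() const { return (IH - KH + PH_L + PH_R) / SH + 1; }
    int OW() const { return (IW - KW + PW_L + PW_R) / SW + 1; }
};

// Naive forward pass: dst = bias + sum(src * weights). Source points that fall
// into the padding are treated as 0, as assumed in the text.
std::vector<float> conv2d_fwd(const Conv2dDesc& d, const std::vector<float>& src,
                              const std::vector<float>& wei,
                              const std::vector<float>& bias) {
    const int OH = d.OH(), OW = d.OW();
    std::vector<float> dst(static_cast<std::size_t>(d.N) * d.OC * OH * OW);
    for (int n = 0; n < d.N; ++n)
    for (int oc = 0; oc < d.OC; ++oc)
    for (int oh = 0; oh < OH; ++oh)
    for (int ow = 0; ow < OW; ++ow) {
        float acc = bias[oc];
        for (int ic = 0; ic < d.IC; ++ic)
        for (int kh = 0; kh < d.KH; ++kh)
        for (int kw = 0; kw < d.KW; ++kw) {
            const int ih = oh * d.SH + kh - d.PH_L;
            const int iw = ow * d.SW + kw - d.PW_L;
            if (ih < 0 || ih >= d.IH || iw < 0 || iw >= d.IW)
                continue;  // padded positions contribute nothing
            acc += src[((n * d.IC + ic) * d.IH + ih) * d.IW + iw] *
                   wei[((oc * d.IC + ic) * d.KH + kh) * d.KW + kw];
        }
        dst[((n * d.OC + oc) * OH + oh) * OW + ow] = acc;
    }
    return dst;
}
```

For instance, take a 1×1×3×3 source holding 1 to 9, a 2×2 all-ones kernel, unit strides, no padding, and bias 1. These produce the 2×2 output 13, 17, 25, 29.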
For a convolution with \(G\) groups, oneDNN adds a separate groups dimension in the API to memory objects representing \(\weights\) tensors. It represents the weights of 2D convolutions as \(G \times OC_G \times IC_G \times KH \times KW\) 5D tensors.
\[ \dst(n, g \cdot OC_G + oc_g, oh, ow) = \bias(g \cdot OC_G + oc_g) \\ + \sum_{ic_g=0}^{IC_G-1}\sum_{kh=0}^{KH-1}\sum_{kw=0}^{KW-1} \src(n, g \cdot IC_G + ic_g, oh \cdot SH + kh - PH_L, ow \cdot SW + kw - PW_L) \cdot \weights(g, oc_g, ic_g, kh, kw), \]
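where \(IC_G = \frac{IC}{G}\), \(OC_G = \frac{OC}{G}\), \(ic_g \in [0, IC_G)\), and \(oc_g \in [0, OC_G)\).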
The case when \(OC_G = IC_G = 1\) is also known as a depthwise convolution.
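A 1D version of the grouped formula shows that each output channel only reads the \(IC_G\) input channels of its own group. This C++ sketch simplifies to 1D, stride 1, no padding, and no dilation. Weights are stored as \(G \times OC_G \times IC_G \times KW\). The function name grouped_conv1d is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// 1D grouped convolution (stride 1, no padding). src: N x (G*IC_G) x IW,
// wei: G x OC_G x IC_G x KW, bias: G*OC_G, dst: N x (G*OC_G) x OW.
std::vector<float> grouped_conv1d(int N, int G, int IC_G, int OC_G, int IW, int KW,
                                  const std::vector<float>& src,
                                  const std::vector<float>& wei,
                                  const std::vector<float>& bias) {
    const int IC = G * IC_G, OC = G * OC_G, OW = IW - KW + 1;
    std::vector<float> dst(static_cast<std::size_t>(N) * OC * OW);
    for (int n = 0; n < N; ++n)
    for (int g = 0; g < G; ++g)
    for (int oc_g = 0; oc_g < OC_G; ++oc_g)
    for (int ow = 0; ow < OW; ++ow) {
        const int oc = g * OC_G + oc_g;
        float acc = bias[oc];
        // Only the input channels of group g contribute to this output channel.
        for (int ic_g = 0; ic_g < IC_G; ++ic_g)
        for (int kw = 0; kw < KW; ++kw) {
            const int ic = g * IC_G + ic_g;
            acc += src[(n * IC + ic) * IW + ow + kw] *
                   wei[((g * OC_G + oc_g) * IC_G + ic_g) * KW + kw];
        }
        dst[(n * OC + oc) * OW + ow] = acc;
    }
    return dst;
}
```

Consider the depthwise case \(G = 2\), \(IC_G = OC_G = 1\). Channel {1, 2, 3} filtered by {1, 1} gives {3, 5}, and channel {10, 20, 30} filtered by {1, -1} gives {-10, -10}. The two channels never mix.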
Convolution with dilation is computed as follows:
\[ \dst(n, oc, oh, ow) = \bias(oc) \\ + \sum_{ic=0}^{IC-1}\sum_{kh=0}^{KH-1}\sum_{kw=0}^{KW-1} \src(n, ic, oh \cdot SH + kh \cdot (DH + 1) - PH_L, ow \cdot SW + kw \cdot (DW + 1) - PW_L) \cdot \weights(oc, ic, kh, kw). \]
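Here:
\[ OH = \left\lfloor \frac{IH - DKH + PH_L + PH_R}{SH} \right\rfloor + 1, \quad OW = \left\lfloor \frac{IW - DKW + PW_L + PW_R}{SW} \right\rfloor + 1, \]
where \(DKH = 1 + (KH - 1) \cdot (DH + 1)\) and \(DKW = 1 + (KW - 1) \cdot (DW + 1)\).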
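The dilated shape arithmetic can be checked with a few lines of code. This C++ sketch works along one spatial dimension. It assumes the effective kernel extent \(1 + (K - 1)(D + 1)\), consistent with oneDNN's convention that a dilation of 0 means non-dilated. All function names are hypothetical.

```cpp
// Effective (dilated) kernel extent; dilation 0 means non-dilated.
constexpr int dilated_extent(int k, int dil) { return 1 + (k - 1) * (dil + 1); }

// Output size for input size i, kernel k, stride s, dilation dil, and
// left/right paddings p_l, p_r. Integer division is the floor for valid shapes.
constexpr int conv_out_size(int i, int k, int s, int dil, int p_l, int p_r) {
    return (i - dilated_extent(k, dil) + p_l + p_r) / s + 1;
}

// Source coordinate read by output position o and kernel tap kk.
constexpr int src_coord(int o, int kk, int s, int dil, int p_l) {
    return o * s + kk * (dil + 1) - p_l;
}
```

With a 3-tap kernel and dilation 1, the kernel spans 5 input points. An unpadded input of 7 points therefore yields 3 outputs instead of 5.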
Deconvolutions (also called fractionally strided convolutions or transposed convolutions) work by swapping the forward and backward passes of a convolution. One way to put it is to note that the weights define a convolution, but whether it is a direct convolution or a transposed convolution is determined by how the forward and backward passes are computed.
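A transposed convolution is the adjoint of the direct convolution with the same weights. Its forward pass therefore performs the same scatter as a convolution's backward-data pass. The 1D sketch below is for illustration only: single channel, stride S, no padding, with hypothetical function names. It checks the defining identity \(\langle conv(x), y \rangle = \langle x, deconv(y) \rangle\).

```cpp
#include <cstddef>
#include <vector>

// Direct 1D convolution, stride S, no padding: (I - K) / S + 1 outputs.
std::vector<float> conv1d(const std::vector<float>& x, const std::vector<float>& w, int S) {
    const int I = static_cast<int>(x.size()), K = static_cast<int>(w.size());
    std::vector<float> y((I - K) / S + 1, 0.f);
    for (int o = 0; o < static_cast<int>(y.size()); ++o)
        for (int k = 0; k < K; ++k) y[o] += x[o * S + k] * w[k];
    return y;
}

// Transposed convolution with the same weights: every y[o] is scattered to
// positions o * S + k. This is also how backward-data maps diff_dst to diff_src.
std::vector<float> deconv1d(const std::vector<float>& y, const std::vector<float>& w, int S) {
    const int O = static_cast<int>(y.size()), K = static_cast<int>(w.size());
    std::vector<float> x((O - 1) * S + K, 0.f);
    for (int o = 0; o < O; ++o)
        for (int k = 0; k < K; ++k) x[o * S + k] += y[o] * w[k];
    return x;
}

// Inner product used to check the adjoint relationship.
float dot(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0.f;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}
```

With x = {1, 2, 3, 4, 5}, w = {1, -1, 2}, and S = 2, conv1d gives {5, 9}. For y = {1, 2}, both sides of the identity equal 23.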
There is no difference between the dnnl_forward_training and dnnl_forward_inference propagation kinds.
The backward propagation computes \(\diffsrc\) based on \(\diffdst\) and \(\weights\).
The weights update computes \(\diffweights\) and \(\diffbias\) based on \(\diffdst\) and \(\src\).
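In the weights update, each \(\diffweights\) element accumulates products of \(\diffdst\) with the source values that the corresponding weight touched in the forward pass. \(\diffbias\) reduces \(\diffdst\) over the minibatch and spatial dimensions. The minimal C++ sketch below assumes:

- 1D data, stride 1, and no padding.
- Plain \(N \times IC \times IW\) source and \(N \times OC \times OW\) diff_dst layouts.
- Hypothetical names (WeightsGrad, conv1d_bwd_weights).

```cpp
#include <cstddef>
#include <vector>

struct WeightsGrad {
    std::vector<float> diff_weights;  // OC x IC x KW
    std::vector<float> diff_bias;     // OC
};

// Weights update for a 1D convolution with OW = IW - KW + 1.
WeightsGrad conv1d_bwd_weights(int N, int IC, int OC, int IW, int KW,
                               const std::vector<float>& src,
                               const std::vector<float>& diff_dst) {
    const int OW = IW - KW + 1;
    WeightsGrad g{std::vector<float>(static_cast<std::size_t>(OC) * IC * KW, 0.f),
                  std::vector<float>(static_cast<std::size_t>(OC), 0.f)};
    for (int n = 0; n < N; ++n)
    for (int oc = 0; oc < OC; ++oc)
    for (int ow = 0; ow < OW; ++ow) {
        const float dd = diff_dst[(n * OC + oc) * OW + ow];
        g.diff_bias[oc] += dd;  // reduce over minibatch and spatial positions
        for (int ic = 0; ic < IC; ++ic)
            for (int kw = 0; kw < KW; ++kw)
                g.diff_weights[(oc * IC + ic) * KW + kw] +=
                    dd * src[(n * IC + ic) * IW + ow + kw];
    }
    return g;
}
```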
When executed, the inputs and outputs should be mapped to an execution argument index as specified by the following table.
Primitive input/output  Execution argument index 

\(\src\)  DNNL_ARG_SRC 
\(\weights\)  DNNL_ARG_WEIGHTS 
\(\bias\)  DNNL_ARG_BIAS 
\(\dst\)  DNNL_ARG_DST 
\(\diffsrc\)  DNNL_ARG_DIFF_SRC 
\(\diffweights\)  DNNL_ARG_DIFF_WEIGHTS 
\(\diffbias\)  DNNL_ARG_DIFF_BIAS 
\(\diffdst\)  DNNL_ARG_DIFF_DST 
\(\text{depthwise}\)  DNNL_ARG_ATTR_POST_OP_DW 
\(\text{binary postop}\)  DNNL_ARG_ATTR_MULTIPLE_POST_OP(binary_post_op_position) | DNNL_ARG_SRC_1 
General implementation notes: N/A.
The convolution primitive supports the following combinations of data types for source, weights, destination, and bias memory objects:
Propagation  Source  Weights  Destination  Bias 

forward / backward  f32  f32  f32  f32 
forward  f16  f16  f16  f16 
forward  u8, s8  s8  u8, s8, s32, f32  u8, s8, s32, f32 
forward  bf16  bf16  f32, bf16  f32, bf16 
backward  f32, bf16  bf16  bf16  
weights update  bf16  f32, bf16  bf16  f32, bf16 
Like other CNN primitives, the convolution primitive expects the following tensors:
Spatial  Source / Destination  Weights 

1D  \(N \times C \times W\)  \([G \times ] OC \times IC \times KW\) 
2D  \(N \times C \times H \times W\)  \([G \times ] OC \times IC \times KH \times KW\) 
3D  \(N \times C \times D \times H \times W\)  \([G \times ] OC \times IC \times KD \times KH \times KW\) 
The physical format of data and weights memory objects is critical for convolution primitive performance. In the oneDNN programming model, convolution is one of the few primitives that support the placeholder memory format tag dnnl::memory::format_tag::any (shortened to any from now on). With any, the primitive can choose the data and weights memory formats based on its parameters. When using any, first create a convolution primitive descriptor and then query it for the actual data and weights memory formats.
While convolution primitives can be created with memory formats specified explicitly, the performance is likely to be suboptimal.
The table below shows the plain memory formats for which the convolution primitive is optimized.
Spatial  Convolution Type  Data / Weights logical tensor  Implementation optimized for memory formats 

1D, 2D, 3D  any  optimized  
1D  f32, bf16  NCW / OIW, GOIW  dnnl_ncw (dnnl_abc) / dnnl_oiw (dnnl_abc), dnnl_goiw (dnnl_abcd) 
1D  "  "  dnnl_nwc (dnnl_acb) / dnnl_wio (dnnl_cba), dnnl_wigo (dnnl_dcab) 
1D  int8  NCW / OIW  dnnl_nwc (dnnl_acb) / dnnl_wio (dnnl_cba) 
2D  f32, bf16  NCHW / OIHW, GOIHW  dnnl_nchw (dnnl_abcd) / dnnl_oihw (dnnl_abcd), dnnl_goihw (dnnl_abcde) 
2D  "  "  dnnl_nhwc (dnnl_acdb) / dnnl_hwio (dnnl_cdba), dnnl_hwigo (dnnl_decab) 
2D  int8  NCHW / OIHW, GOIHW  dnnl_nhwc (dnnl_acdb) / dnnl_hwio (dnnl_cdba), dnnl_hwigo (dnnl_decab) 
3D  f32, bf16  NCDHW / OIDHW, GOIDHW  dnnl_ncdhw (dnnl_abcde) / dnnl_oidhw (dnnl_abcde), dnnl_goidhw (dnnl_abcdef) 
3D  "  "  dnnl_ndhwc (dnnl_acdeb) / dnnl_dhwio (dnnl_cdeba), dnnl_dhwigo (dnnl_defcab) 
3D  int8  NCDHW / OIDHW  dnnl_ndhwc (dnnl_acdeb) / dnnl_dhwio (dnnl_cdeba) 
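The letter tags in the last column name logical dimensions in order ('a' is dimension 0, 'b' is dimension 1, and so on). The tag lists them from outermost to innermost in memory, so for an NCHW logical tensor dnnl_abcd is nchw and dnnl_acdb is nhwc. This C++ sketch computes the linear offset of an element for such a plain (non-blocked) tag. It assumes the tag is a permutation of the tensor's dimensions; the function name plain_offset is hypothetical and is not a oneDNN API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Offset of logical index idx in a plain layout described by a letter tag.
// dims and idx are given in logical order (e.g. N, C, H, W).
std::size_t plain_offset(const std::vector<std::size_t>& dims,
                         const std::string& tag,
                         const std::vector<std::size_t>& idx) {
    std::size_t off = 0;
    for (char t : tag) {  // walk physical dimensions from outermost to innermost
        const std::size_t d = static_cast<std::size_t>(t - 'a');
        off = off * dims[d] + idx[d];
    }
    return off;
}
```

In a 2×3×4×5 NCHW tensor, moving one channel forward jumps 20 elements in nchw (abcd) but only 1 element in nhwc (acdb).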
Postops and attributes enable you to modify the behavior of the convolution primitive by applying the output scale to the result of the primitive and by chaining certain operations after the primitive. The following attributes and postops are supported:
Propagation  Type  Operation  Description  Restrictions 

forward  attribute  Output scale  Scales the result of convolution by given scale factor(s)  int8 convolutions only 
forward  attribute  Zero points  Sets zero point(s) for the corresponding tensors  int8 convolutions only 
forward  postop  Eltwise  Applies an Eltwise operation to the result  
forward  postop  Sum  Adds the operation result to the destination tensor instead of overwriting it  
forward  postop  Binary  Applies a Binary operation to the result  General binary postop restrictions 
To facilitate dynamic quantization, the primitive supports runtime output scales. A user can configure attributes with output scales set to the DNNL_RUNTIME_F32_VAL wildcard value instead of the actual scales when the scales are not known at primitive descriptor creation. In this case, the user must provide the scales during the execution stage as an additional input memory object with argument DNNL_ARG_ATTR_OUTPUT_SCALES.
Similarly to runtime output scales, the primitive supports runtime zero points. The wildcard value for zero points is DNNL_RUNTIME_S32_VAL. During the execution stage, the corresponding memory object must be passed as an argument with its index set to (DNNL_ARG_ATTR_ZERO_POINTS | DNNL_ARG_${MEMORY_INDEX}). Possible ${MEMORY_INDEX} values are DNNL_ARG_SRC and DNNL_ARG_DST.
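For instance, the source tensor zero points memory argument would be passed with index (DNNL_ARG_ATTR_ZERO_POINTS | DNNL_ARG_SRC).
The library does not prevent using postops in training, but not all postops are suitable for training. For instance, using ReLU with a non-zero negative slope as a postop would not produce the additional output workspace that is required to compute backward propagation correctly. Hence, in this particular case one should use separate convolution and eltwise primitives for training.
The library supports any number and order of post operations, but only the following sequences deploy optimized code: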
Type of convolutions  Postop sequences 

f32 and bf16 convolution  eltwise, sum, sum -> eltwise 
int8 convolution  eltwise, sum, sum -> eltwise, eltwise -> sum 
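The attributes and postops take effect in the following sequence: the source zero point attribute, then the output scale attribute, then the postops in the order they were attached, and finally the destination zero point attribute.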
All operations performed while applying attributes and postops are done in single-precision floating point (f32). The conversion to the actual destination data type happens just before the result is stored.
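The final conversion step can be sketched for a u8 destination: the f32 intermediate value is rounded and saturated only at store time. The C++ function below (hypothetical name store_as_u8) assumes round-to-nearest with ties to even, which std::nearbyint gives under the default floating-point rounding mode. The exact rounding behavior of oneDNN kernels is an assumption here, not something this document states.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Convert an f32 intermediate result to u8 just before storing:
// round to nearest (ties to even in the default rounding mode), then saturate to [0, 255].
std::uint8_t store_as_u8(float v) {
    const float r = std::nearbyint(v);
    return static_cast<std::uint8_t>(std::min(255.f, std::max(0.f, r)));
}
```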
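Consider a convolution with an output scale \(\alpha\), followed by a sum postop with scale \(\beta\) and then a tanh eltwise postop with scale \(\gamma\). That would lead to the following: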
\[ \dst(\overline{x}) = \gamma \cdot \tanh \left( \alpha \cdot conv(\src, \weights) + \beta \cdot \dst(\overline{x}) \right) \]
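Per destination element, this chain reduces to a scalar function computed in f32. The C++ sketch below models the formula term by term; the function name sum_then_tanh is hypothetical, and conv stands for the raw convolution result at that point.

```cpp
#include <cmath>

// Output scale alpha, then sum postop (scale beta), then tanh eltwise postop (scale gamma).
float sum_then_tanh(float conv, float dst_prev, float alpha, float beta, float gamma) {
    float v = alpha * conv;       // output scale attribute
    v += beta * dst_prev;         // sum postop: add the previous dst value
    return gamma * std::tanh(v);  // eltwise postop with scale
}
```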
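Now consider a convolution with an output scale \(\alpha\), followed by a ReLU eltwise postop with negative slope \(\eta\) and scale \(\gamma\), and then a sum postop with scale \(\beta\). That would lead to the following: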
\[ \dst(\overline{x}) = \beta \cdot \dst(\overline{x}) + \gamma \cdot ReLU \left( \alpha \cdot conv(\src, \weights), \eta \right) \]
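Because postops apply in the order they were attached, swapping eltwise and sum changes the result. This scalar C++ sketch (hypothetical function names) models the formula above next to the opposite ordering. ReLU with negative slope \(\eta\) is taken as \(x\) for \(x > 0\) and \(\eta x\) otherwise.

```cpp
// ReLU with negative slope eta.
float relu(float x, float eta) { return x > 0.f ? x : eta * x; }

// Formula above: eltwise (ReLU, scale gamma) first, then the sum postop.
float relu_then_sum(float conv, float dst_prev, float alpha, float eta, float beta, float gamma) {
    return gamma * relu(alpha * conv, eta) + beta * dst_prev;
}

// For comparison: the same postops attached in the opposite order.
float sum_then_relu(float conv, float dst_prev, float alpha, float eta, float beta, float gamma) {
    return gamma * relu(alpha * conv + beta * dst_prev, eta);
}
```

Take conv = -4, a previous dst of 3, \(\alpha = 0.5\), \(\eta = 0.25\), and \(\beta = \gamma = 1\). The first ordering gives 2.5 and the second gives 1.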
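Finally, consider an int8 convolution with a source zero point \(shift_{src}\), an output scale \(\alpha\), a ReLU eltwise postop with negative slope \(\eta\) and scale \(\gamma\), and a destination zero point \(shift_{dst}\). That would lead to the following: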
\[ \dst(\overline{x}) = \gamma \cdot ReLU \left( \alpha \cdot conv(\src - shift_{src}, \weights), \eta \right) + shift_{dst} \]
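Mathematically, one output point of this int8 chain can be computed as sketched below in C++. The computation has four steps:

1. Subtract the source zero point and accumulate in s32.
2. Switch to f32 for the output scale and the ReLU postop.
3. Add the destination zero point.
4. Round and saturate to u8.

The sketch assumes a single channel, 1D data, stride 1, no padding, and round-to-nearest conversion. The function name int8_conv_point is hypothetical. Optimized kernels typically fold the zero point into a precomputed compensation term rather than subtracting it per element.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

std::uint8_t int8_conv_point(const std::vector<std::uint8_t>& src, std::size_t ow,
                             const std::vector<std::int8_t>& wei,
                             std::int32_t shift_src, float alpha, float eta,
                             float gamma, std::int32_t shift_dst) {
    std::int32_t acc = 0;  // integer accumulation of (src - shift_src) * weights
    for (std::size_t k = 0; k < wei.size(); ++k)
        acc += (static_cast<std::int32_t>(src[ow + k]) - shift_src) * wei[k];
    float v = alpha * static_cast<float>(acc);  // output scale
    v = gamma * (v > 0.f ? v : eta * v);        // ReLU postop with scale gamma
    v += static_cast<float>(shift_dst);         // destination zero point
    v = std::nearbyint(v);                      // convert just before storing
    return static_cast<std::uint8_t>(std::min(255.f, std::max(0.f, v)));
}
```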
oneDNN implements convolution primitives using several different algorithms:
oneDNN supports the direct convolution algorithm on all supported platforms under certain conditions. One of them is that the data and weights memory formats are defined by the convolution primitive (the user passes any). If any of these constraints is not met, the implementation silently falls back to an explicit GEMM algorithm.
oneDNN supports the Winograd convolution algorithm on systems with Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) support and Intel Deep Learning Boost (Intel DL Boost) under certain conditions. One of them is that the data and weights memory formats are defined by the convolution primitive (the user passes any as the data format). If any of these constraints is not met, the implementation silently falls back to the direct algorithm.
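The tile notation \(F(m, r)\) used for Winograd means \(m\) outputs of an \(r\)-tap filter, computed with fewer multiplications than the direct method. The 1D \(F(2, 3)\) sketch below uses the standard minimal-filtering transforms to produce two outputs from four inputs with four multiplications instead of six. The 2D tiles nest this transform along both spatial dimensions. The function name winograd_f23 is hypothetical, and this is an illustration, not oneDNN's kernel.

```cpp
#include <array>

// Computes y[i] = d[i]*g[0] + d[i+1]*g[1] + d[i+2]*g[2] for i = 0, 1
// (no kernel flip, matching the convolution formulas in this document).
std::array<float, 2> winograd_f23(const std::array<float, 4>& d,
                                  const std::array<float, 3>& g) {
    // Filter transform (can be precomputed once per filter).
    const float u0 = g[0];
    const float u1 = (g[0] + g[1] + g[2]) * 0.5f;
    const float u2 = (g[0] - g[1] + g[2]) * 0.5f;
    const float u3 = g[2];
    // Input transform combined with the four element-wise products.
    const float m1 = (d[0] - d[2]) * u0;
    const float m2 = (d[1] + d[2]) * u1;
    const float m3 = (d[2] - d[1]) * u2;
    const float m4 = (d[1] - d[3]) * u3;
    // Output transform.
    return {m1 + m2 + m3, m2 - m3 - m4};
}
```

For d = {1, 2, 3, 4} and g = {1, 2, 3}, the direct results are 1 + 4 + 9 = 14 and 2 + 6 + 12 = 20. The transform reproduces both exactly.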
The Winograd convolution algorithm implementation additionally chooses tile size based on the problem shape and propagation kind:
For forward_inference, oneDNN supports \(F(2 \times 2, 3 \times 3)\) or \(F(4 \times 4, 3 \times 3)\).
The following side effects should be weighed against the (potential) performance boost achieved from using the Winograd algorithm:
To create a Winograd convolution, create a convolution descriptor (step 6 in the simple network example) that specifies the Winograd algorithm, dnnl::algorithm::convolution_winograd. The rest of the steps are exactly the same.
oneDNN supports the dnnl::algorithm::convolution_auto algorithm, which instructs the library to automatically select the best algorithm. The selection uses heuristics that take into account tensor shapes and the number of logical processors available. (For automatic selection to work as intended, use the same thread affinity settings when creating the convolution as when executing it.)
Engine  Name  Comments 

CPU/GPU  Convolution Primitive Example  This C++ API example demonstrates how to create and execute a Convolution primitive in forward propagation mode. Key optimizations included in this example:
