oneDNN provides performance-critical primitives that accelerate the operations used both when training deep learning models and when running the trained models for inference.
During inference, the input data is fed into the trained model which in turn produces a result (e.g. makes a prediction). This process is usually called forward propagation and corresponds to the dnnl::prop_kind::forward_inference propagation kind in oneDNN.
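As an illustration, the following is a minimal sketch, assuming the oneDNN v3.x C++ API, an arbitrary tensor shape, and a ReLU eltwise primitive run in place, of how the propagation kind is supplied when creating a primitive descriptor for inference:

```cpp
#include <vector>
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    using namespace dnnl;

    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // An arbitrary activation tensor: N=1, C=16, H=W=32, f32, NCHW layout.
    memory::desc data_md({1, 16, 32, 32}, memory::data_type::f32,
            memory::format_tag::nchw);

    // The propagation kind marks this primitive as inference-only, which
    // lets the implementation skip anything needed solely for training.
    auto relu_pd = eltwise_forward::primitive_desc(eng,
            prop_kind::forward_inference, algorithm::eltwise_relu,
            data_md, data_md, /*alpha=*/0.f, /*beta=*/0.f);

    // Back the memory with user data and run the primitive in place.
    std::vector<float> data(1 * 16 * 32 * 32, -1.f);
    memory data_mem(data_md, eng, data.data());
    eltwise_forward(relu_pd).execute(strm,
            {{DNNL_ARG_SRC, data_mem}, {DNNL_ARG_DST, data_mem}});
    strm.wait();
    return 0;
}
```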
Training usually consists of the following steps:

- Forward propagation, where the model processes the input data and produces output. This step is similar to the forward propagation used during inference and corresponds to the dnnl::prop_kind::forward_training propagation kind (note _training here versus _inference mentioned above). The differences are covered in the corresponding section below.
- Backward propagation with respect to data, which computes diff_src from diff_dst (see Naming Conventions). This step corresponds to the dnnl::prop_kind::backward_data propagation kind.
- Backward propagation with respect to weights, which computes diff_weights from diff_dst. This step makes sense only for the operations that have learnable parameters and corresponds to the dnnl::prop_kind::backward_weights propagation kind.

Even though, mathematically, the forward propagation that happens during training and inference should be the same, in practice there are some differences, mostly due to performance considerations.
When executing inference, one may not care about the values in the intermediate buffers during model execution; hence one can reuse them as desired. However, during the forward propagation of training it is beneficial to preserve input data, output data, or sometimes intermediate data that will later be used during backward propagation to compute the gradients.
Consider max pooling (Pooling with algorithm kind dnnl::algorithm::pooling_max) as an example. The forward pass consists of computing the maximum values in a sliding window over the source tensor, so the output is simply another tensor that contains these maximum values. However, in order to compute the source gradient on backward propagation, one needs to know the positions of these maximum values in the source tensor. Of course, it is possible to use the original source tensor to locate the maximums again, but this might be more expensive than preserving the positions of the maximum values in another tensor that is then used during backward propagation. oneDNN uses the latter approach: when the propagation kind is set to dnnl::prop_kind::forward_training, the max pooling primitive produces one extra output called workspace, which is covered later in this document.
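This difference can be observed directly. The sketch below, again assuming the oneDNN v3.x C++ API and arbitrary shapes, creates the same max pooling primitive descriptor with both propagation kinds and prints the workspace size each one reports; only forward_training is expected to report a non-zero size:

```cpp
#include <iostream>
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    using namespace dnnl;

    engine eng(engine::kind::cpu, 0);

    // A 2x2 max pooling with stride 2 over an arbitrary 1x16x32x32 f32 tensor.
    memory::desc src_md({1, 16, 32, 32}, memory::data_type::f32,
            memory::format_tag::nchw);
    memory::desc dst_md({1, 16, 16, 16}, memory::data_type::f32,
            memory::format_tag::nchw);
    memory::dims strides = {2, 2}, kernel = {2, 2}, dilation = {0, 0},
                 pad = {0, 0};

    for (auto kind : {prop_kind::forward_inference, prop_kind::forward_training}) {
        auto pd = pooling_forward::primitive_desc(eng, kind,
                algorithm::pooling_max, src_md, dst_md, strides, kernel,
                dilation, pad, pad);
        // With forward_training the primitive exposes an extra workspace
        // output; with forward_inference its size is typically zero.
        std::cout << "workspace bytes: " << pd.workspace_desc().get_size()
                  << std::endl;
    }
    return 0;
}
```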
As mentioned above, oneDNN separates error back-propagation with respect to data and error back-propagation with respect to weights. The former corresponds to dnnl::prop_kind::backward_data, while the latter corresponds to dnnl::prop_kind::backward_weights (for example: Convolution).
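For example, the sketch below (oneDNN v3.x C++ API, arbitrary convolution shapes) creates the forward primitive descriptor for training and then the two separate backward primitive descriptors, passing the forward descriptor as a hint:

```cpp
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    using namespace dnnl;

    engine eng(engine::kind::cpu, 0);

    // An arbitrary 3x3 convolution: 1x16x32x32 -> 1x32x32x32, padding 1.
    memory::desc src_md({1, 16, 32, 32}, memory::data_type::f32,
            memory::format_tag::any);
    memory::desc wei_md({32, 16, 3, 3}, memory::data_type::f32,
            memory::format_tag::any);
    memory::desc dst_md({1, 32, 32, 32}, memory::data_type::f32,
            memory::format_tag::any);
    memory::dims strides = {1, 1}, pad = {1, 1};

    // Forward pass of the training: prop_kind::forward_training.
    auto fwd_pd = convolution_forward::primitive_desc(eng,
            prop_kind::forward_training, algorithm::convolution_direct,
            src_md, wei_md, dst_md, strides, pad, pad);

    // Error back-propagation with respect to data: diff_src from diff_dst.
    auto bwd_data_pd = convolution_backward_data::primitive_desc(eng,
            algorithm::convolution_direct, src_md, wei_md, dst_md,
            strides, pad, pad, fwd_pd);

    // Error back-propagation with respect to weights: diff_weights from
    // diff_dst (and the original src).
    auto bwd_weights_pd = convolution_backward_weights::primitive_desc(eng,
            algorithm::convolution_direct, src_md, wei_md, dst_md,
            strides, pad, pad, fwd_pd);

    (void)bwd_data_pd;
    (void)bwd_weights_pd;
    return 0;
}
```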
The following list outlines the key specifics of running inference with oneDNN:
Most of these techniques are demonstrated in the examples that are distributed with oneDNN.
The following list outlines the key specifics of running training with oneDNN:

- Some primitives (for example, Batch Normalization) compute diff_src and diff_weights at the same time. To highlight this behavior, the propagation kind is set to dnnl::prop_kind::backward.
- As discussed above, data computed during forward propagation may have to be preserved until the corresponding backward propagation step. (For instance, to compute diff_src, one must pass the diff_dst memory and the original src memory, which was exactly the intermediate one.)
- Different primitives require different tensors on backward propagation. For example, to compute backward propagation of Eltwise, one needs to pass diff_dst and src, but to compute backward propagation for Softmax, one needs to pass diff_dst and dst. Check the documentation for each primitive to see what is required for each particular primitive.
- oneDNN does not guarantee that the src memory format of a convolution on forward propagation will always match the src memory format of the corresponding convolution on backward-by-weights propagation. Of course, the library tries to avoid unnecessary reorders, so in most cases the formats will be the same, but this is by no means always true.
- For performance reasons, a primitive may require diff_dst in the same memory format as the original dst; a mismatch of the formats would lead to significant performance issues. To ensure the proper format, users should always use the dnnl::memory::format_tag::any memory format for gradient tensors (diff_dst and diff_src). If a primitive requires original data tensors (e.g. src in Eltwise or dst in Softmax), the user must pass fully defined memory descriptors for these tensors; in other words, the src and dst memory descriptors cannot be initialized with dnnl::memory::format_tag::any for backward propagation. Based on the format of the original tensors, if any, and on the forward primitive descriptor hint (passed when creating a backward primitive descriptor), a primitive picks the proper format for the gradients. Occasionally, the incoming diff_dst may be in a different memory format than the primitive requires, so robust integration code must be prepared to emit a reorder (see the sketch after this list).
- Alternatively, one may force diff_dst to have the same memory format as dst, though this is not recommended.
- Users should not rely on the contents of the workspace, because it might be different for different implementations.

Most of these techniques are demonstrated in the examples that are distributed with oneDNN.
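To make the memory format handling concrete, the sketch below (oneDNN v3.x C++ API, reusing the arbitrary convolution shapes from the earlier sketch) creates the gradient descriptors with dnnl::memory::format_tag::any and emits a reorder for diff_dst only when the incoming layout differs from what the backward primitive expects:

```cpp
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    using namespace dnnl;

    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // Arbitrary 3x3 convolution, same shapes as in the earlier sketch.
    memory::dims strides = {1, 1}, pad = {1, 1};
    memory::desc src_any({1, 16, 32, 32}, memory::data_type::f32,
            memory::format_tag::any);
    memory::desc wei_any({32, 16, 3, 3}, memory::data_type::f32,
            memory::format_tag::any);
    memory::desc dst_any({1, 32, 32, 32}, memory::data_type::f32,
            memory::format_tag::any);

    auto fwd_pd = convolution_forward::primitive_desc(eng,
            prop_kind::forward_training, algorithm::convolution_direct,
            src_any, wei_any, dst_any, strides, pad, pad);

    // Gradient descriptors use format_tag::any so the backward primitive
    // can pick the layout it prefers.
    auto bwd_pd = convolution_backward_data::primitive_desc(eng,
            algorithm::convolution_direct, src_any, wei_any, dst_any,
            strides, pad, pad, fwd_pd);

    // Suppose diff_dst arrives from the next layer in plain NCHW.
    memory::desc diff_dst_nchw({1, 32, 32, 32}, memory::data_type::f32,
            memory::format_tag::nchw);
    memory diff_dst_in(diff_dst_nchw, eng);

    // Robust integration code checks the format the primitive expects and
    // emits a reorder only when the layouts differ.
    memory diff_dst = diff_dst_in;
    if (bwd_pd.diff_dst_desc() != diff_dst_in.get_desc()) {
        diff_dst = memory(bwd_pd.diff_dst_desc(), eng);
        reorder(diff_dst_in, diff_dst).execute(strm, diff_dst_in, diff_dst);
    }

    // In a real integration the weights would come from the forward pass,
    // reordered into bwd_pd.weights_desc() if necessary; here the buffers
    // are left uninitialized for brevity.
    memory weights(bwd_pd.weights_desc(), eng);
    memory diff_src(bwd_pd.diff_src_desc(), eng);
    convolution_backward_data(bwd_pd).execute(strm,
            {{DNNL_ARG_DIFF_DST, diff_dst}, {DNNL_ARG_WEIGHTS, weights},
                    {DNNL_ARG_DIFF_SRC, diff_src}});
    strm.wait();
    return 0;
}
```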
oneDNN uses the notion of workspace for some very particular cases. Specifically, the workspace is a tensor that the primitive fills in during forward propagation and that will then be used by the corresponding backward propagation operation. The example with max pooling was already discussed above.
The workflow for using a workspace is:

1. When creating a primitive descriptor for forward propagation, query it for the workspace requirement using the workspace_desc() member function.
2. If the returned memory descriptor is essentially empty (i.e. is equal to dnnl::memory::desc() or is one for which dnnl::memory::desc::get_size() returns 0), no extra action is required: the workspace is not required for this primitive in this configuration.
3. Otherwise, create a workspace memory based on the obtained memory descriptor and pass it to the execution function with the DNNL_ARG_WORKSPACE tag. Pass the same workspace memory when executing the corresponding backward propagation primitive.

Even when the workspace is not required, it might be simpler to create a workspace memory of zero size and follow the logic where the workspace is indeed required. Such an approach may simplify the integration because the common code path is used.
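Putting the workflow together, here is a minimal sketch, assuming the oneDNN v3.x C++ API and the same max pooling configuration used earlier, that creates the workspace from the forward primitive descriptor (a zero-size memory if none is needed) and passes it to both the forward and the backward executions:

```cpp
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    using namespace dnnl;

    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // Same max pooling configuration as in the earlier sketch.
    memory::desc src_md({1, 16, 32, 32}, memory::data_type::f32,
            memory::format_tag::nchw);
    memory::desc dst_md({1, 16, 16, 16}, memory::data_type::f32,
            memory::format_tag::nchw);
    memory::dims strides = {2, 2}, kernel = {2, 2}, dilation = {0, 0},
                 pad = {0, 0};

    auto fwd_pd = pooling_forward::primitive_desc(eng,
            prop_kind::forward_training, algorithm::pooling_max, src_md,
            dst_md, strides, kernel, dilation, pad, pad);

    // Query the workspace requirement and create the workspace memory
    // unconditionally (it simply has zero size if none is needed).
    memory workspace(fwd_pd.workspace_desc(), eng);

    // Buffers are left uninitialized for brevity; a real application
    // would fill src with activations from the previous layer.
    memory src(src_md, eng), dst(dst_md, eng);
    pooling_forward(fwd_pd).execute(strm,
            {{DNNL_ARG_SRC, src}, {DNNL_ARG_DST, dst},
                    {DNNL_ARG_WORKSPACE, workspace}});

    // Backward pass: the forward primitive descriptor is passed as a hint,
    // and the same workspace memory is provided again.
    auto bwd_pd = pooling_backward::primitive_desc(eng,
            algorithm::pooling_max, src_md, dst_md, strides, kernel,
            dilation, pad, pad, fwd_pd);

    memory diff_src(bwd_pd.diff_src_desc(), eng);
    memory diff_dst(bwd_pd.diff_dst_desc(), eng);
    pooling_backward(bwd_pd).execute(strm,
            {{DNNL_ARG_DIFF_DST, diff_dst}, {DNNL_ARG_DIFF_SRC, diff_src},
                    {DNNL_ARG_WORKSPACE, workspace}});
    strm.wait();
    return 0;
}
```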