Naming Conventions
oneDNN documentation relies on a set of standard naming conventions for variables. This section describes these conventions.
Variable (Tensor) Names
Neural network models consist of operations of the following form:

\[\dst = f(\src, \weights),\]

where \(\dst\) and \(\src\) are activation tensors, and \(\weights\) are learnable tensors.

The backward propagation then consists of computing the gradients with respect to \(\src\) and \(\weights\), respectively:

\[\diffsrc = df_{\src}(\diffdst, \src, \weights, \dst),\]

and

\[\diffweights = df_{\weights}(\diffdst, \src, \weights, \dst).\]
While oneDNN uses src, dst, and weights as generic names for the activation and learnable tensors, a specific operation may have commonly used, widely known names for some of these tensors. For instance, the convolution operation has a learnable tensor called bias. For usability, oneDNN primitives use such names in initialization and other functions to simplify the code.
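For instance, here is a minimal sketch of how these names surface in the C++ API (assuming the oneDNN v3.x interface; the engine kind, shapes, and sizes are chosen purely for illustration): src, weights, and bias memory descriptors are passed to the convolution primitive descriptor, and the same names reappear as DNNL_ARG_* execute arguments.

```cpp
#include "dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // Memory descriptors named after the tensors they describe.
    memory::desc src_md({2, 16, 28, 28}, memory::data_type::f32,
            memory::format_tag::nchw);
    memory::desc weights_md({32, 16, 3, 3}, memory::data_type::f32,
            memory::format_tag::oihw);
    memory::desc bias_md({32}, memory::data_type::f32, memory::format_tag::x);
    memory::desc dst_md({2, 32, 26, 26}, memory::data_type::f32,
            memory::format_tag::nchw);

    // The primitive descriptor takes the tensors in src/weights/bias/dst
    // order (v3.x API).
    auto conv_pd = convolution_forward::primitive_desc(eng,
            prop_kind::forward_inference, algorithm::convolution_direct,
            src_md, weights_md, bias_md, dst_md,
            /*strides=*/{1, 1}, /*padding_l=*/{0, 0}, /*padding_r=*/{0, 0});

    memory src_mem(conv_pd.src_desc(), eng);
    memory weights_mem(conv_pd.weights_desc(), eng);
    memory bias_mem(conv_pd.bias_desc(), eng);
    memory dst_mem(conv_pd.dst_desc(), eng);

    // The same tensor names appear as execute-time argument tags.
    convolution_forward(conv_pd).execute(strm,
            {{DNNL_ARG_SRC, src_mem}, {DNNL_ARG_WEIGHTS, weights_mem},
                    {DNNL_ARG_BIAS, bias_mem}, {DNNL_ARG_DST, dst_mem}});
    strm.wait();
    return 0;
}
```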
To summarize, oneDNN uses the following commonly used notation for tensors:

| Name | Meaning |
|---|---|
| src | Source tensor |
| dst | Destination tensor |
| weights | Weights tensor |
| bias | Bias tensor (used in Convolution, Inner Product and other primitives) |
| scale, shift | Scale and shift tensors (used in Batch Normalization, Layer Normalization and Group Normalization primitives) |
| workspace | Workspace tensor that carries additional information from the forward propagation to the backward propagation |
| scratchpad | Temporary tensor that is required to store intermediate results |
| diff_src | Gradient tensor with respect to the source |
| diff_dst | Gradient tensor with respect to the destination |
| diff_weights | Gradient tensor with respect to the weights |
| diff_bias | Gradient tensor with respect to the bias |
| diff_scale, diff_shift | Gradient tensors with respect to the scale and shift |
| *_layer | RNN layer data or weights tensors |
| *_iter | RNN recurrent data or weights tensors |
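The diff_* names follow the same pattern at execute time. Below is a minimal sketch of a backward-data convolution (again assuming the oneDNN v3.x C++ API and illustrative shapes): the gradient with respect to the destination, diff_dst, comes in, and the gradient with respect to the source, diff_src, comes out.

```cpp
#include "dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    memory::desc src_md({2, 16, 28, 28}, memory::data_type::f32,
            memory::format_tag::nchw);
    memory::desc weights_md({32, 16, 3, 3}, memory::data_type::f32,
            memory::format_tag::oihw);
    memory::desc dst_md({2, 32, 26, 26}, memory::data_type::f32,
            memory::format_tag::nchw);

    // Backward primitives are created with a forward primitive descriptor
    // as a hint.
    auto fwd_pd = convolution_forward::primitive_desc(eng,
            prop_kind::forward_training, algorithm::convolution_direct,
            src_md, weights_md, dst_md, {1, 1}, {0, 0}, {0, 0});

    // diff_src and diff_dst have the same shapes as src and dst.
    auto bwd_pd = convolution_backward_data::primitive_desc(eng,
            algorithm::convolution_direct, src_md, weights_md, dst_md,
            {1, 1}, {0, 0}, {0, 0}, fwd_pd);

    memory diff_src_mem(bwd_pd.diff_src_desc(), eng);
    memory weights_mem(bwd_pd.weights_desc(), eng);
    memory diff_dst_mem(bwd_pd.diff_dst_desc(), eng);

    convolution_backward_data(bwd_pd).execute(strm,
            {{DNNL_ARG_DIFF_DST, diff_dst_mem},
                    {DNNL_ARG_WEIGHTS, weights_mem},
                    {DNNL_ARG_DIFF_SRC, diff_src_mem}});
    strm.wait();
    return 0;
}
```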
Formulas and Verbose Output
oneDNN uses the following notation in documentation formulas and verbose output. Lower-case letters denote indices in a particular spatial dimension, and the corresponding upper-case letters denote the sizes of those dimensions.

| Name | Semantics |
|---|---|
| n | batch |
| g | groups |
| oc, od, oh, ow | output channels, depth, height, and width |
| ic, id, ih, iw | input channels, depth, height, and width |
| kd, kh, kw | kernel (filter) depth, height, and width |
| sd, sh, sw | stride by depth, height, and width |
| dd, dh, dw | dilation by depth, height, and width |
| pd, ph, pw | padding by depth, height, and width |
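As a worked example of this notation, output spatial sizes follow from the input size, kernel size, stride, dilation, and padding. The sketch below uses the convolution output-size formula from the convolution primitive documentation; note that oneDNN counts dilation from 0, so dd = dh = dw = 0 means a dense (non-dilated) kernel.

```cpp
#include <cstdio>

// OH from IH, KH, SH, DH, and padding PH_L / PH_R, in the notation above.
// oneDNN convention: DH = 0 means no dilation, so the effective kernel
// height is KH_eff = (KH - 1) * (DH + 1) + 1.
int conv_out_size(int IH, int KH, int SH, int DH, int PH_L, int PH_R) {
    int KH_eff = (KH - 1) * (DH + 1) + 1;
    return (IH - KH_eff + PH_L + PH_R) / SH + 1;
}

int main() {
    // IH = 28, KH = 3, SH = 1, DH = 0, no padding -> OH = 26.
    std::printf("OH = %d\n", conv_out_size(28, 3, 1, 0, 0, 0));
    return 0;
}
```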
RNN-Specific Notation
The following notation is used when describing RNN primitives.
| Name | Semantics |
|---|---|
| \(\cdot\) | matrix multiply operator |
| \(*\) | element-wise multiplication operator |
| W | input weights |
| U | recurrent weights |
| \(^T\) | transposition |
| B | bias |
| h | hidden state |
| a | intermediate value |
| x | input |
| \(_t\) | timestamp |
| \(l\) | layer index |
| activation | tanh, relu, logistic |
| c | cell state |
| \(\tilde{c}\) | candidate state |
| i | input gate |
| f | forget gate |
| o | output gate |
| u | update gate |
| r | reset gate |
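As an example of how this notation combines, the Vanilla RNN primitive's hidden state update takes the form:

\[h_t = activation(W \cdot x_t + U \cdot h_{t-1} + B)\]

Here \(W\) multiplies the layer input and \(U\) multiplies the previous hidden state, matching the input/recurrent weights distinction above. See the individual RNN primitive pages for the exact per-gate formulas of LSTM and GRU.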