Some primitives might require a temporary buffer while performing their computations. For instance, operations that do not have enough independent work to utilize all cores on a system might parallelize over the reduction dimension (the K dimension in GEMM notation). In this case, different threads compute partial results in private temporary buffers, and the private results are then summed to produce the final result. Another example is using matrix multiplication (GEMM) to implement convolution. Before calling GEMM, the source activations must be transformed using the im2col operation. The transformation result is written to a temporary buffer that is then used as an input to the GEMM.
In both of these examples, the temporary buffer is no longer required once the primitive computation is completed. oneDNN refers to such a memory buffer as a scratchpad.
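The GEMM-based convolution described above can be sketched in plain C++ to show where the temporary buffer lives. This is a minimal illustration, not oneDNN's implementation: the function name conv_via_im2col, the C x H x W and OC x C x KH x KW layouts, and the unit-stride, no-padding restriction are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not a oneDNN API. Assumes unit stride and no padding.
// src is C x H x W, weights is OC x C x KH x KW, the result is OC x OH x OW.
std::vector<float> conv_via_im2col(const std::vector<float> &src,
        const std::vector<float> &weights, std::size_t C, std::size_t H,
        std::size_t W, std::size_t OC, std::size_t KH, std::size_t KW) {
    assert(KH >= 1 && KW >= 1 && KH <= H && KW <= W);
    assert(src.size() == C * H * W);
    assert(weights.size() == OC * C * KH * KW);
    const std::size_t OH = H - KH + 1, OW = W - KW + 1;
    const std::size_t rows = C * KH * KW, cols = OH * OW;

    // Temporary buffer (the scratchpad): needed only inside this call.
    std::vector<float> col(rows * cols);
    for (std::size_t c = 0; c < C; ++c)
        for (std::size_t kh = 0; kh < KH; ++kh)
            for (std::size_t kw = 0; kw < KW; ++kw) {
                const std::size_t row = (c * KH + kh) * KW + kw;
                for (std::size_t oh = 0; oh < OH; ++oh)
                    for (std::size_t ow = 0; ow < OW; ++ow)
                        col[row * cols + oh * OW + ow]
                                = src[(c * H + oh + kh) * W + ow + kw];
            }

    // GEMM: dst[OC x cols] = weights[OC x rows] * col[rows x cols].
    std::vector<float> dst(OC * cols, 0.0f);
    for (std::size_t oc = 0; oc < OC; ++oc)
        for (std::size_t r = 0; r < rows; ++r)
            for (std::size_t j = 0; j < cols; ++j)
                dst[oc * cols + j] += weights[oc * rows + r] * col[r * cols + j];
    return dst;
}
```

The col buffer is released as soon as the function returns, which is exactly the property that makes it a scratchpad rather than part of the primitive's inputs or outputs.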
The amount of space required for the scratchpad depends on the primitive and its actual implementation. For example, the GEMM-based convolution requires a scratchpad for the im2col data, while the direct convolution does not.
Both types of implementation might need extra space for the reduction in case there are too few independent tasks. The amount of memory required by the im2col transformation is proportional to the size of the source image multiplied by the weights spatial size. The size of a buffer for reduction is proportional to the tensor size to be reduced (e.g., diff_weights in the case of backward by weights) multiplied by the number of threads in the reduction groups (the upper bound is the total number of threads).
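The two proportionality rules above can be turned into rough estimates. The helpers below are hypothetical and are not oneDNN APIs; the actual sizes depend on the implementation the library selects, so the results should be read as order-of-magnitude figures only. The elem_size parameter (bytes per element) is an assumption added for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical estimator, not a oneDNN API.
// im2col buffer ~ source image size (C * H * W) times weights spatial size
// (KH * KW), in bytes.
std::size_t im2col_scratchpad_bytes(std::size_t C, std::size_t H,
        std::size_t W, std::size_t KH, std::size_t KW, std::size_t elem_size) {
    return C * H * W * KH * KW * elem_size;
}

// Hypothetical estimator, not a oneDNN API.
// Reduction buffer ~ size of the tensor being reduced (for example,
// diff_weights) times the number of threads in a reduction group, which
// cannot exceed the total number of threads.
std::size_t reduction_scratchpad_bytes(std::size_t tensor_elems,
        std::size_t reduction_threads, std::size_t total_threads,
        std::size_t elem_size) {
    assert(reduction_threads <= total_threads);
    return tensor_elems * reduction_threads * elem_size;
}
```

For a 3 x 224 x 224 f32 source with 3 x 3 weights, the first estimate gives about 5.4 MB, while reducing a 64 x 3 x 3 x 3 f32 diff_weights tensor across 8 threads needs about 55 KB.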
By contrast, some other primitives might require very little extra space. For instance, one of the implementations of the dnnl::sum primitive requires temporary space only to store the pointers to the data of each input array (that is, the size of the scratchpad is n * sizeof(void *), where n is the number of summands).
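To make the n * sizeof(void *) figure concrete, here is a minimal sketch of a sum that uses its scratchpad only as a pointer table. The names sum_scratchpad_bytes and sum_arrays are hypothetical, and this is not how dnnl::sum is actually implemented; it only mirrors the storage requirement described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: scratchpad size for n summands.
std::size_t sum_scratchpad_bytes(std::size_t n) {
    return n * sizeof(void *);
}

// Hypothetical helper, not a oneDNN API.
// scratchpad must have room for inputs.size() pointers.
void sum_arrays(const std::vector<std::vector<float>> &inputs,
        std::vector<float> &dst, const float **scratchpad) {
    const std::size_t n = inputs.size();
    // Fill the pointer table: this is the only temporary storage used.
    for (std::size_t i = 0; i < n; ++i) {
        assert(inputs[i].size() == dst.size());
        scratchpad[i] = inputs[i].data();
    }
    // Accumulate element by element through the pointer table.
    for (std::size_t j = 0; j < dst.size(); ++j) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < n; ++i)
            acc += scratchpad[i][j];
        dst[j] = acc;
    }
}
```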
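oneDNN supports two modes for handling scratchpads:
Library mode (dnnl::scratchpad_mode::library). The library allocates the scratchpad itself. This is the default behavior.
User mode (dnnl::scratchpad_mode::user). The user provides scratchpad memory with sufficient space at primitive execution (using the DNNL_ARG_SCRATCHPAD tag). This enables the user to reuse the memory as well as to make the primitives thread-safe. However, this requires a good memory manager (in terms of speed and locality) on the user's side.
The scratchpad mode is controlled through the dnnl_primitive_attr_set_scratchpad_mode (C API) and dnnl::primitive_attr::set_scratchpad_mode (C++ API) primitive attributes.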
All primitives support both scratchpad modes.
If the user provides scratchpad memory to a primitive, this memory must be created using the same engine that the primitive uses.
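One possible user-side strategy for reusing scratchpad memory is to share a single buffer among primitives that execute one after another, sized to the largest requirement. The class below is a hypothetical sketch of that bookkeeping only and does not call oneDNN. In real code, the sizes would presumably come from each primitive descriptor's scratchpad memory descriptor, and the buffer would be wrapped in a memory object created on the primitive's engine and passed with DNNL_ARG_SCRATCHPAD.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical user-side manager, not a oneDNN API.
class SharedScratchpad {
public:
    // Record the scratchpad size one primitive needs.
    void reserve(std::size_t bytes) {
        assert(buffer_.empty()); // all sizes must be known before allocating
        required_ = std::max(required_, bytes);
    }

    // Allocate once, large enough for the most demanding primitive.
    void allocate() { buffer_.resize(required_); }

    // Storage to hand to a primitive at execution time.
    void *data() { return buffer_.data(); }
    std::size_t size() const { return buffer_.size(); }

private:
    std::size_t required_ = 0;
    std::vector<unsigned char> buffer_;
};
```

Sharing one buffer this way is only safe when the primitives do not run concurrently; for concurrent execution, each thread would need its own buffer.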
As mentioned above, having the library manage the scratchpad is the default behavior. Here we only highlight how a user can query the amount of memory a primitive consumes for its scratchpad.