Deep Neural Network Library (DNNL)  1.3.0
Performance library for Deep Learning
Basic Concepts

Primitives

DNNL is built around the notion of a primitive (dnnl::primitive). A primitive is a functor object that encapsulates a particular computation such as forward convolution, backward LSTM computations, or a data transformation operation. A single primitive can sometimes represent more complex fused computations such as a forward convolution followed by a ReLU.
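For example, the fused convolution + ReLU mentioned above is typically requested by attaching a post-op to the convolution through primitive attributes. The sketch below assumes the DNNL C++ API (dnnl.hpp) and only shows how such an attribute could be built; it would then be passed when the convolution's primitive descriptor is created.

    #include "dnnl.hpp"

    // Sketch: build primitive attributes that ask the library to fuse a ReLU
    // into the primitive they are attached to (e.g., a forward convolution).
    dnnl::primitive_attr make_conv_relu_attr() {
        dnnl::post_ops ops;
        // scale = 1.0f; for eltwise_relu, alpha is the negative slope, beta is unused
        ops.append_eltwise(1.0f, dnnl::algorithm::eltwise_relu, 0.0f, 0.0f);

        dnnl::primitive_attr attr;
        attr.set_post_ops(ops);
        return attr; // pass to the convolution primitive descriptor constructor
    }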

The most important difference between a primitive and a pure function is that a primitive can store state.

One part of the primitive’s state is immutable. For example, convolution primitives store parameters like tensor shapes and can pre-compute other dependent parameters like cache blocking. This approach allows DNNL primitives to pre-generate code specifically tailored for the operation to be performed. The DNNL programming model assumes that the time it takes to perform the pre-computations is amortized by reusing the same primitive to perform computations multiple times.

The mutable part of the primitive’s state is referred to as a scratchpad. It is a memory buffer that a primitive may use for temporary storage only during computations. The scratchpad can either be owned by a primitive object (which makes that object non-thread safe) or be an execution-time parameter.
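As a sketch of how that choice is made (assuming the DNNL C++ API and its scratchpad attribute), the scratchpad policy is selected through primitive attributes at primitive creation time:

    #include "dnnl.hpp"

    // Sketch: request a user-managed scratchpad so that the temporary buffer is
    // supplied as an execution-time argument instead of being owned by the primitive.
    dnnl::primitive_attr make_user_scratchpad_attr() {
        dnnl::primitive_attr attr;
        attr.set_scratchpad_mode(dnnl::scratchpad_mode::user);
        return attr; // the default, dnnl::scratchpad_mode::library, keeps the buffer inside the primitive
    }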

Engines

An engine (dnnl::engine) is an abstraction of a computational device: a CPU, a specific GPU card in the system, and so on. Most primitives are created to execute computations on one specific engine. The only exceptions are reorder primitives, which transfer data between two different engines.
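A minimal sketch of engine creation with the DNNL C++ API (the index selects among devices of the same kind):

    #include "dnnl.hpp"

    // Sketch: create a CPU engine; index 0 picks the first device of that kind.
    dnnl::engine make_cpu_engine() {
        return dnnl::engine(dnnl::engine::kind::cpu, 0);
        // A GPU engine would be dnnl::engine(dnnl::engine::kind::gpu, 0),
        // provided the library was built with GPU support.
    }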

Streams

Streams (dnnl::stream) encapsulate execution context tied to a particular engine. For example, they can correspond to OpenCL command queues.
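A minimal sketch, assuming an engine has already been created:

    #include "dnnl.hpp"

    // Sketch: a stream is created on a particular engine and is later passed to
    // primitive::execute() together with the primitive's arguments.
    dnnl::stream make_stream(const dnnl::engine &eng) {
        return dnnl::stream(eng);
    }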

Memory Objects

Memory objects (dnnl::memory) encapsulate handles to memory allocated on a specific engine, tensor dimensions, data type, and memory format – the way tensor indices map to offsets in linear memory space. Memory objects are passed to primitives during execution.
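A minimal sketch of creating a memory object for an f32 tensor stored in NCHW format (the shape is only an illustration):

    #include "dnnl.hpp"

    // Sketch: describe a 1x3x224x224 f32 tensor laid out as NCHW and create a
    // memory object for it; here the library allocates the underlying buffer.
    dnnl::memory make_nchw_memory(const dnnl::engine &eng) {
        dnnl::memory::desc md({1, 3, 224, 224},
                dnnl::memory::data_type::f32,
                dnnl::memory::format_tag::nchw);
        return dnnl::memory(md, eng);
    }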

Levels of Abstraction

DNNL has multiple levels of abstraction for primitives and memory objects in order to give its users maximum flexibility.

The library provides the following levels of abstraction:

Abstraction level          Memory object        Primitive objects
-------------------------  -------------------  ---------------------
Logical description        Memory descriptor    Operation descriptor
Intermediate description   N/A                  Primitive descriptor
Implementation             Memory object        Primitive

Creating Memory Objects and Primitives

Memory Objects

Memory objects are created from memory descriptors. Note that it is not possible to create a memory object from a memory descriptor whose memory format is set to dnnl::memory::format_tag::any.

There are two common ways to initialize memory descriptors:

  1. By using the dnnl::memory::desc constructors or by extracting a descriptor for a part of an existing tensor via dnnl::memory::desc::submemory_desc().
  2. By querying an existing primitive descriptor for a memory descriptor corresponding to one of the primitive's parameters (for example, dnnl::convolution_forward::primitive_desc::src_desc()).

Memory objects can be created with a user-provided handle (a void * on CPU), or without one, in which case the library will allocate storage space on its own.
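A sketch of both variants, assuming a memory descriptor md has already been initialized:

    #include "dnnl.hpp"
    #include <vector>

    // Sketch: the same memory descriptor can back a library-allocated buffer or
    // wrap a user-owned buffer passed in as a raw handle.
    void memory_creation_example(const dnnl::memory::desc &md, const dnnl::engine &eng,
            std::vector<char> &user_buffer) {
        // The library allocates storage internally.
        dnnl::memory mem_lib(md, eng);

        // The user provides storage; md.get_size() reports the required size in bytes.
        user_buffer.resize(md.get_size());
        dnnl::memory mem_user(md, eng, user_buffer.data());

        // Both objects can now be passed to primitives during execution.
    }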

Primitives

The sequence of actions to create a primitive is:

  1. Create an operation descriptor via, for example, dnnl::convolution_forward::desc. The operation descriptor can contain memory descriptors with placeholder format_tag::any memory formats if the primitive supports it.
  2. Create a primitive descriptor based on the operation descriptor and an engine handle.
  3. Create a primitive based on the primitive descriptor obtained in step 2.
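Put together, a sketch of these three steps for a forward convolution might look as follows (the shapes, strides, and padding are illustrative values only):

    #include "dnnl.hpp"

    // Sketch: create a forward convolution primitive in the three steps above.
    dnnl::convolution_forward make_conv(const dnnl::engine &eng) {
        using tag = dnnl::memory::format_tag;
        using dt = dnnl::memory::data_type;

        // Use format_tag::any so the library can pick the best layouts.
        dnnl::memory::desc src_md({1, 3, 224, 224}, dt::f32, tag::any);
        dnnl::memory::desc weights_md({64, 3, 7, 7}, dt::f32, tag::any);
        dnnl::memory::desc dst_md({1, 64, 112, 112}, dt::f32, tag::any);

        // 1. Operation descriptor: what to compute (no engine involved yet).
        dnnl::convolution_forward::desc conv_d(dnnl::prop_kind::forward_inference,
                dnnl::algorithm::convolution_direct, src_md, weights_md, dst_md,
                /*strides=*/{2, 2}, /*padding_l=*/{3, 3}, /*padding_r=*/{3, 3});

        // 2. Primitive descriptor: the operation bound to a specific engine and
        //    implementation; it can be queried for the memory formats it chose.
        dnnl::convolution_forward::primitive_desc conv_pd(conv_d, eng);

        // 3. Primitive: the object that actually performs the computation.
        return dnnl::convolution_forward(conv_pd);
    }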