NOTE
Starting with version 1.1, Intel(R) MKL-DNN is renamed to DNNL. For consistency, only this guide keeps using the Intel MKL-DNN nomenclature.
This article describes user-visible and some important internal changes to Intel MKL-DNN that occurred between v0.20 and v1.0.
The v0.x branch (mnt-v0) is deprecated and users are strongly encouraged to migrate to v1.x.
We tried to keep changes minimal to make migration as simple as possible. In particular, the Intel MKL-DNN programming model stays the same. Nevertheless, the new version brings a lot of incompatible changes requiring developers to revisit significant portions of the integrated code.
All changes can be split into several groups, which are discussed in detail below.
Deprecated functionality | Replacement |
---|---|
ReLU primitive | Eltwise with algorithm kind ReLU |
ConvolutionReLU (single primitive) | Convolution with ReLU as a post operation |
Double precision scales | Single precision scales |
RNN backward pd w/o forward pd hint | RNN backward pd w/ forward pd hint |
mkldnn_omit_stats batch norm. flag | mkldnn_use_global_stats |
mkldnn_eltwise_desc_t.negative_slope | mkldnn_eltwise_desc_t.alpha |
mkldnn_rnn_cell_flags_t | Not available anymore – RNN primitives are separated into RNN, LSTM, and GRU |
mkldnn_padding_kind_t | Not used anymore |
The complete list of the removed C functions:
The complete list of the removed C++ classes and functions:
Rename `foo_v2()` to `foo()` and remove the old `foo()` (C API only): functions that carried the `_v2` suffix were renamed to their suffix-less counterparts, and the old suffix-less variants were removed.
In v0.x, the `foo_v2()` functions were typically used to pass attributes, while `foo()` assumed empty attributes. In v1.0, the attributes parameter is mandatory. A user can still pass `NULL` to indicate that the default (empty) attributes should be used.
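For illustration, a minimal sketch of calling one of the renamed C functions with default attributes; the ReLU setup around it (engine, memory descriptor, eltwise descriptor) is only scaffolding, and the exact signatures should be checked against the v1.0 headers:

```cpp
#include <mkldnn.h>

int main() {
    // Error checking is omitted for brevity.
    mkldnn_engine_t engine;
    mkldnn_engine_create(&engine, mkldnn_cpu, 0);

    // A 1D f32 tensor of 16 elements described via a plain format tag.
    mkldnn_dims_t dims = {16};
    mkldnn_memory_desc_t md;
    mkldnn_memory_desc_init_by_tag(&md, 1, dims, mkldnn_f32, mkldnn_x);

    // ReLU operation descriptor.
    mkldnn_eltwise_desc_t relu_d;
    mkldnn_eltwise_forward_desc_init(&relu_d, mkldnn_forward_inference,
            mkldnn_eltwise_relu, &md, 0.f, 0.f);

    // v1.0: the attributes argument is mandatory; passing NULL requests
    // the default (empty) attributes. In v0.x this call was the _v2 variant.
    mkldnn_primitive_desc_t relu_pd;
    mkldnn_primitive_desc_create(&relu_pd, &relu_d, NULL, engine, NULL);

    mkldnn_primitive_desc_destroy(relu_pd);
    mkldnn_engine_destroy(engine);
    return 0;
}
```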
The list of functions that had the `_v2` suffix:
The experimental `s16` data type is not supported anymore and has been dropped.
The rounding mode that was part of the attributes has been dropped. All computations respect the MXCSR register when performing rounding. Unless the rounding mode is set explicitly, round-to-nearest-even (RNE) is used.
API | v0.x | v1.0 |
---|---|---|
C | mkldnn_batch_normalization_flag_t | mkldnn_normalization_flags_t |
C | mkldnn_format_t | mkldnn_format_tag_t |
C++ | mkldnn::batch_normalization_flag | mkldnn::normalization_flags |
C++ | mkldnn::memory::format | mkldnn::memory::format_tag |
API | v0.x | v1.0 |
---|---|---|
C | mkldnn_fuse_bn_relu | mkldnn_fuse_norm_relu |
C++ | mkldnn::fuse_bn_relu | mkldnn::normalization_flags::fuse_norm_relu |
C++ | mkldnn::query::eengine | mkldnn::query::engine |
API | v0.x | v1.0 |
---|---|---|
C | mkldnn_memory_desc_init() | mkldnn_memory_desc_init_by_tag() |
All `enum` types became `enum class`. This requires the following changes:
Type | Value in v0.x | Value in v1.0 |
---|---|---|
mkldnn::prop_kind | mkldnn::forward_inference | mkldnn::prop_kind::forward_inference |
mkldnn::algorithm | mkldnn::eltwise_tanh | mkldnn::algorithm::eltwise_tanh |
mkldnn::normalization_flags | mkldnn::fuse_bn_relu | mkldnn::normalization_flags::fuse_norm_relu |
mkldnn::query | mkldnn::eengine | mkldnn::query::engine |
mkldnn::memory::data_type | mkldnn::memory::f32 | mkldnn::memory::data_type::f32 |
mkldnn::memory::format_tag | mkldnn::memory::nchw | mkldnn::memory::format_tag::nchw |
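For example, a sketch of the resulting change in user code (using the v1.0 `mkldnn::memory::desc(dims, data_type, format_tag)` constructor):

```cpp
#include <mkldnn.hpp>

int main() {
    mkldnn::memory::dims dims = {2, 16, 7, 7};

    // v0.x (no longer compiles): mkldnn::memory::f32, mkldnn::memory::nchw
    // v1.0: values must be qualified with their enum class name.
    auto md = mkldnn::memory::desc(dims,
            mkldnn::memory::data_type::f32,
            mkldnn::memory::format_tag::nchw);

    (void)md;
    return 0;
}
```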
Version 0.x had an implementation of view that was simply an alias for memory. In Intel MKL-DNN v1.0, we removed view as a type; a sub-memory is now described directly by a memory descriptor. To initialize a sub-memory, use `mkldnn::memory::desc::submemory_desc()`.
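A minimal sketch of the replacement, assuming the v1.0 `submemory_desc(dims, offsets)` signature:

```cpp
#include <mkldnn.hpp>

int main() {
    using namespace mkldnn;

    // Parent memory: 2 x 16 x 7 x 7, f32, nchw.
    memory::desc parent_md({2, 16, 7, 7},
            memory::data_type::f32, memory::format_tag::nchw);

    // v0.x used a separate view primitive for this. In v1.0, a sub-memory
    // covering only the first image (1 x 16 x 7 x 7 at offset 0) is just
    // another memory descriptor.
    memory::desc sub_md = parent_md.submemory_desc(
            {1, 16, 7, 7} /* sizes */, {0, 0, 0, 0} /* offsets */);

    (void)sub_md;
    return 0;
}
```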
Each type of RNN (Vanilla RNN, LSTM, and two types of GRU) is now initialized by a separate function/operation descriptor constructor.
For instance, instead of using `mkldnn::rnn_forward` with a specified RNN type, a user is expected to use dedicated primitives:
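Assuming the v1.0 C++ class names, these are `mkldnn::vanilla_rnn_forward`, `mkldnn::lstm_forward`, `mkldnn::gru_forward`, and `mkldnn::lbr_gru_forward` (GRU with linear-before-reset), each with a corresponding backward counterpart.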
Also, the hidden and cell states in LSTM are now separate. Instead of one `src_iter` tensor of shape `(layers, directions, states, batch, channels)`, a user passes a `src_iter` tensor of shape `(layers, directions, batch, channels)` for the hidden states and a `src_iter_c` tensor of shape `(layers, directions, batch, channels)` for the cell states. The same applies to `dst_iter`: the hidden state and the cell state are split into `dst_iter` and `dst_iter_c`, respectively.
Intel MKL-DNN provides three GEMM-like functions:
With version 1.0 we switched from a Fortran-style to a C-style API, meaning that the parameters are passed by value rather than by address, and matrices are assumed to be in row-major format rather than column-major format.
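As an illustration of the new calling convention, here is a small sketch computing a 2x3 by 3x2 product; it assumes the v1.0 `mkldnn_sgemm()` takes its parameters by value in the order `(transa, transb, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc)`:

```cpp
#include <mkldnn.h>

int main() {
    // Row-major storage; the leading dimension is the number of columns.
    const float A[2 * 3] = {1, 2, 3,
                            4, 5, 6};
    const float B[3 * 2] = {1, 0,
                            0, 1,
                            1, 1};
    float C[2 * 2] = {0};

    // C = 1.0 * A * B + 0.0 * C; no pointers-to-scalars as in v0.x.
    mkldnn_sgemm('N', 'N', 2, 2, 3,
            1.0f, A, 3, B, 2,
            0.0f, C, 2);
    return 0;
}
```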
Moreover, to broaden the applicability of integer matrix-matrix multiply functions we changed the formula from:
\[ C_{s32} = \alpha \cdot (op(A_{i8}) + o_A) \cdot (op(B_{s8}) + o_B) + \beta \cdot C_{s32} + o_C \]
to
\[ C_{s32} = \alpha \cdot (op(A_{i8}) - o_A) \cdot (op(B_{s8}) - o_B) + \beta \cdot C_{s32} + o_C \]
where for both `mkldnn_gemm_u8s8s32()` and `mkldnn_gemm_s8s8s32()` the types of the offsets for matrices A and B correspond to the types of the matrices themselves; that is, `typeof(o_A) == typeof(*A)` and `typeof(o_B) == typeof(*B)`.
In version 0.x, when querying the primitive descriptor for a memory descriptor that is not used, the C API returned NULL and the C++ API threw an exception. In version 1.0, both the C and C++ APIs return a zero memory descriptor.
Zero memory descriptor means that the number of dimensions equals 0 and all the fields are set to zero. A memory object created with such a memory descriptor does not require any buffer allocations.
These changes simplify the code that handles the workspace or scratchpad, as the sketch below illustrates:
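A sketch of what this enables, using max pooling in training mode (which needs a workspace) and assuming the v1.0 convenience query `workspace_desc()` on the primitive descriptor; for primitives that need no workspace, the same final line still works because the query returns a zero memory descriptor and the memory object allocates nothing:

```cpp
#include <mkldnn.hpp>

int main() {
    using namespace mkldnn;
    engine eng(engine::kind::cpu, 0);

    memory::desc src_md({1, 16, 7, 7}, memory::data_type::f32,
            memory::format_tag::nchw);
    memory::desc dst_md({1, 16, 3, 3}, memory::data_type::f32,
            memory::format_tag::nchw);

    // Max pooling in training mode requires a workspace.
    auto pool_d = pooling_forward::desc(prop_kind::forward_training,
            algorithm::pooling_max, src_md, dst_md,
            {2, 2} /* strides */, {3, 3} /* kernel */,
            {0, 0} /* pad_l */, {0, 0} /* pad_r */);
    auto pool_pd = pooling_forward::primitive_desc(pool_d, eng);

    // No special casing: a zero descriptor simply allocates no buffer.
    memory ws_mem(pool_pd.workspace_desc(), eng);
    (void)ws_mem;
    return 0;
}
```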
In Intel MKL-DNN v1.0, all C++ objects (primitives, memory objects, engines, and streams) now have default empty constructors. This enables defining an object first and initializing it later; an attempt to use any methods of an uninitialized object results in an exception being thrown.
This improvement can be especially useful when Intel MKL-DNN objects are members of the user's classes. For example:
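A sketch of the pattern, using a hypothetical user-defined `relu_layer` class whose Intel MKL-DNN members start out empty and are filled in later:

```cpp
#include <mkldnn.hpp>

struct relu_layer {
    mkldnn::engine eng;             // empty until init() is called
    mkldnn::memory::desc data_md;   // zero memory descriptor for now
    mkldnn::eltwise_forward prim;   // empty primitive

    void init(const mkldnn::memory::desc &md) {
        eng = mkldnn::engine(mkldnn::engine::kind::cpu, 0);
        data_md = md;
        auto d = mkldnn::eltwise_forward::desc(
                mkldnn::prop_kind::forward_inference,
                mkldnn::algorithm::eltwise_relu, data_md, 0.f);
        prim = mkldnn::eltwise_forward(
                mkldnn::eltwise_forward::primitive_desc(d, eng));
    }
};

int main() {
    relu_layer layer; // fine in v1.0: nothing is created yet
    layer.init(mkldnn::memory::desc({2, 16, 7, 7},
            mkldnn::memory::data_type::f32,
            mkldnn::memory::format_tag::nchw));
    return 0;
}
```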
In Intel MKL-DNN v1.0, constructing a memory object with the special value `MKLDNN_MEMORY_ALLOCATE` for a handle results in the buffer being allocated by the library. This makes the behavior of the C API memory object constructor aligned with its C++ API `mkldnn::memory` counterpart. Note that the C++ API memory object class still has an extra constructor that doesn't take a handle at all and asks the library to allocate the buffer (that is, the same behavior as calling with the handle equal to `MKLDNN_MEMORY_ALLOCATE`).
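A short sketch contrasting the two equivalent ways of letting the library own the buffer:

```cpp
#include <mkldnn.hpp>

int main() {
    using namespace mkldnn;
    engine eng(engine::kind::cpu, 0);
    memory::desc md({2, 16}, memory::data_type::f32, memory::format_tag::nc);

    // Both memory objects below ask the library to allocate the buffer.
    memory m1(md, eng, MKLDNN_MEMORY_ALLOCATE); // explicit special value
    memory m2(md, eng);                         // C++-only shorthand

    (void)m1; (void)m2;
    return 0;
}
```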
Intel MKL-DNN primitives may require temporary scratchpad memory for storing intermediate computational results. For instance, convolution backward by weights typically requires extra space to perform a reduction of the `diff_weights` computed by different threads (the work is divided across images). Starting with version 1.0, the library supports two modes:
- Implicit scratchpad, managed by the library (the default).
- Explicit scratchpad, provided by the user.
The former mode matches the behavior of Intel MKL-DNN v0.x. It is kept for user convenience and cases in which memory is not a concern.
In the explicit scratchpad mode, a new `mkldnn_query_scratchpad_md` query returns the amount of scratchpad memory needed for a primitive, and the user is responsible for allocating and providing that memory to the primitive at run time. The explicit scratchpad mode should be explicitly enabled by passing an attribute with `mkldnn::scratchpad_mode::user` to primitive descriptors.
With explicit scratchpad it is possible to make Intel MKL-DNN primitives stateless and hence thread safe: the same primitive can be executed in multiple independent threads as long as different threads use different scratchpads.
However, if a user chooses implicit scratchpad mode, there is no thread-safety guarantee.
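A sketch of the explicit mode with a forward convolution; the attribute setter `set_scratchpad_mode()`, the `scratchpad_desc()` query, and the `MKLDNN_ARG_SCRATCHPAD` tag are assumed to match the v1.0 API:

```cpp
#include <mkldnn.hpp>

int main() {
    using namespace mkldnn;
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    memory::desc src_md({1, 16, 13, 13}, memory::data_type::f32,
            memory::format_tag::nchw);
    memory::desc wei_md({32, 16, 3, 3}, memory::data_type::f32,
            memory::format_tag::oihw);
    memory::desc dst_md({1, 32, 11, 11}, memory::data_type::f32,
            memory::format_tag::nchw);

    auto conv_d = convolution_forward::desc(prop_kind::forward_inference,
            algorithm::convolution_direct, src_md, wei_md, dst_md,
            {1, 1} /* strides */, {0, 0} /* pad_l */, {0, 0} /* pad_r */);

    // Request the user-managed (explicit) scratchpad mode via attributes.
    primitive_attr attr;
    attr.set_scratchpad_mode(scratchpad_mode::user);
    auto conv_pd = convolution_forward::primitive_desc(conv_d, attr, eng);

    memory src_mem(src_md, eng), wei_mem(wei_md, eng), dst_mem(dst_md, eng);
    // Query the required scratchpad and allocate it like any other memory.
    memory scratchpad(conv_pd.scratchpad_desc(), eng);

    // The scratchpad is passed at execution time as a regular argument.
    convolution_forward(conv_pd).execute(strm, {
            {MKLDNN_ARG_SRC, src_mem},
            {MKLDNN_ARG_WEIGHTS, wei_mem},
            {MKLDNN_ARG_DST, dst_mem},
            {MKLDNN_ARG_SCRATCHPAD, scratchpad}});
    strm.wait();
    return 0;
}
```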
This is the most notable change in the library. The main idea was to change the execution API so that memory arguments are specified at primitive execution time and not at primitive creation time. This leads to the following changes.
In version 0.x, memory was a kind of primitive. With the new API, memory becomes a distinct data type. Moreover, the memory primitive descriptor becomes redundant and has been dropped. The functions that used memory primitive descriptors now take a memory descriptor and, optionally, an engine if the latter cannot be inferred.
These changes bring new data types and functions, such as:
Version 0.x allowed passing an operation primitive as an input to another primitive. For instance, a convolution primitive could be passed as an input to a subsequent ReLU. During execution, the ReLU primitive queried the convolution for its output memory and used it as an input.
In version 1.0, only memory objects can be passed as inputs and outputs of primitives.
Another consequence is that the `mkldnn_primitive_at_t` type, which is logically equivalent to `{primitive, output_index}`, becomes redundant and has been removed. Previously, it was used to specify the exact memory to use when a primitive had several outputs.
Finally, users are now able to run primitives directly by calling an `execute` function instead of putting primitives into a stream and running the latter. This change affects how primitives interact with streams and input/output memory objects: with the new API, they become arguments that are passed to the primitive execution function.
The change significantly simplifies primitive creation, which now requires only a primitive descriptor.
To remove ambiguity about the order in which input and output memories must be passed, we introduced a map-like argument in which each memory argument is paired with a tag indicating what kind of argument it is: destination, source, weights, and so on.
The example below shows conceptual code transformations between versions. The C++ API is used for brevity.
#### Version 0.x:
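A rough sketch of the typical v0.x flow (the v0.x class and enum names are recalled from memory and may not be exact): memory is a primitive with its own primitive descriptor, inputs and outputs are bound at construction time, and execution goes through a stream.

```cpp
#include <mkldnn.hpp>
#include <vector>

int main() {
    using namespace mkldnn;
    auto eng = engine(engine::cpu, 0);

    // Memory is a primitive; a memory primitive descriptor is required.
    auto data_d = memory::desc({2, 16, 7, 7}, memory::data_type::f32,
            memory::format::nchw);
    auto data_pd = memory::primitive_desc(data_d, eng);
    auto src = memory(data_pd);
    auto dst = memory(data_pd);

    // Unscoped enum values such as forward_inference and eltwise_relu.
    auto relu_d = eltwise_forward::desc(forward_inference, eltwise_relu,
            data_d, 0.f);
    auto relu_pd = eltwise_forward::primitive_desc(relu_d, eng);

    // Inputs and outputs are bound when the primitive is constructed.
    auto relu = eltwise_forward(relu_pd, src, dst);

    // Execution: build a net of primitives and submit it to a stream.
    std::vector<primitive> net = {relu};
    stream(stream::kind::eager).submit(net).wait();
    return 0;
}
```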
#### Version 1.0:
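The corresponding v1.0 flow (a sketch assuming the v1.0 C++ API): memory objects are plain data, the primitive is created from its primitive descriptor alone, and all memory arguments are supplied at execution time via `MKLDNN_ARG_*` tags.

```cpp
#include <mkldnn.hpp>

int main() {
    using namespace mkldnn;
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // Memory is a plain data type: descriptor + engine (+ optional handle).
    memory::desc data_md({2, 16, 7, 7}, memory::data_type::f32,
            memory::format_tag::nchw);
    memory src(data_md, eng), dst(data_md, eng);

    auto relu_d = eltwise_forward::desc(prop_kind::forward_inference,
            algorithm::eltwise_relu, data_md, 0.f);
    auto relu_pd = eltwise_forward::primitive_desc(relu_d, eng);

    // Creation needs only the primitive descriptor...
    auto relu = eltwise_forward(relu_pd);

    // ...and memory arguments are passed at execution time, keyed by tags.
    relu.execute(strm, {{MKLDNN_ARG_SRC, src}, {MKLDNN_ARG_DST, dst}});
    strm.wait();
    return 0;
}
```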
The way of describing memory format in version 0.x had multiple issues. From the user's perspective, the main issues were:
- The `iohw` format was not available.
- `oihw` described memory in the same way as `nchw`, but these formats were different (see gh#153).
There were more substantial issues from the library development perspective: code bloat to support special cases, etc.
We addressed the issues above by reworking memory descriptors. From the user's perspective, the main changes are:
- Initializing a memory descriptor with explicit strides, for example `strides={h*w, o*h*w, w, 1}`, should be a valid way to define the `iohw` format even if Intel MKL-DNN does not support it explicitly. Functions to use: `mkldnn_memory_desc_init_by_strides()` (C API) and its C++ counterpart.
- Dimensions are of type `int64_t` instead of `int`, and the maximum number of tensor dimensions is decreased from 16 to 12. The `mkldnn_strides_t` type is removed; use `mkldnn_dims_t` instead.
- The `memory_desc_t.format` field is replaced with `memory_desc_t.format_kind`, which also has different semantics.
While the first two items are self-explanatory, the last one requires some elaboration.
In version 0.x, most memory formats could be described directly by using appropriate format names (for example, `nchw`) that fully describe how data is laid out in memory. However, Intel MKL-DNN also had the `blocked` memory format and the corresponding `memory_desc_t.layout_desc.blocking_desc` structure, which could describe a memory format in a unified fashion by specifying block sizes and strides. The original idea was to use format tags like `nchw` during memory descriptor initialization only, and to always use the `blocked` format internally. Unfortunately, that was never implemented.
With the new design, Intel MKL-DNN starts distinguishing between the actual memory format and convenience memory format tags that can be used to describe memory format concisely.
Users are still able to initialize memory descriptors with format tags like `nchw` using `mkldnn::memory::desc::desc(dims, data_type, format_tag)` or `mkldnn_memory_desc_init_by_tag()`, but the `memory_desc_t.format_kind` is set to a canonicalized kind like `blocked`, and the format name is not recorded in the memory descriptor structure. Initialization with strides always results in the `blocked` format. The API also uses different types for memory format tags and kinds to aid correctness.
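For instance, both descriptors in the sketch below end up with `format_kind` equal to `mkldnn_blocked`; the strides-based one describes the `iohw` layout mentioned earlier even though no such tag exists (the `mkldnn_memory_desc_init_by_strides()` name is assumed to be the v1.0 counterpart of the tag-based initializer):

```cpp
#include <mkldnn.h>

int main() {
    const int ndims = 4;
    mkldnn_dims_t dims = {32 /* O */, 16 /* I */, 3 /* H */, 3 /* W */};

    // Initialization via a format tag: the tag itself is not stored;
    // the descriptor's format_kind becomes mkldnn_blocked.
    mkldnn_memory_desc_t md_tag;
    mkldnn_memory_desc_init_by_tag(&md_tag, ndims, dims, mkldnn_f32,
            mkldnn_oihw);

    // Initialization via explicit strides: this is the iohw layout
    // ({h*w, o*h*w, w, 1}), for which no dedicated tag exists; it also
    // yields format_kind == mkldnn_blocked.
    mkldnn_dims_t strides = {3 * 3, 32 * 3 * 3, 3, 1};
    mkldnn_memory_desc_t md_strides;
    mkldnn_memory_desc_init_by_strides(&md_strides, ndims, dims, mkldnn_f32,
            strides);

    (void)md_tag; (void)md_strides;
    return 0;
}
```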
For more details, refer to the [Memory descriptor](https://github.com/intel/mkl-dnn/blob/rfc-api-changes-v1.0/doc/rfc/api-v1.0/rfc_memory_desc.md) article of the RFC for v1.0.
The build options were slightly changed in the new version of Intel MKL-DNN. This was done mainly to avoid name collisions with other projects that include Intel MKL-DNN as a subproject and to accommodate future extensions to the library. The changes are:
Old option | New option | Notes |
---|---|---|
WITH_EXAMPLE | MKLDNN_BUILD_EXAMPLES | |
WITH_TEST | MKLDNN_BUILD_TESTS | |
MKLDNN_THREADING | MKLDNN_CPU_RUNTIME | |
MKLDNN_USE_MKL | N/A | Intel MKL-DNN does not use Intel MKL anymore |
VTUNEROOT | N/A | Not required, as Intel MKL-DNN contains all the necessary code internally |
By default, the `-Werror` flag is disabled; the `MKLDNN_WERROR` option controls this behavior.
For more information about build options, refer to Build Options.