It is often useful to collect information about how much of an application's run time is spent executing Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN) primitives and which of those primitives take the most time. Intel MKL-DNN's verbose mode traces the execution of Intel MKL-DNN primitives and collects basic statistics such as execution time and primitive parameters.

The behavior is controlled with the MKLDNN_VERBOSE environment variable or the mkldnn_set_verbose function.

Value	Behavior
0	no verbose output (default)
1	primitive information at execution
2	primitive information at creation and execution

The function setting takes precedence over the environment variable.
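
As a minimal sketch of the environment-variable route, the Python helper below builds an environment with MKLDNN_VERBOSE set and runs a command under it. The names `verbose_env` and `run_with_verbose` are hypothetical and not part of Intel MKL-DNN; only the variable name and the values 0, 1, and 2 come from the table above. The sketch also assumes that the verbose lines are written to standard output.

```python
import os
import subprocess

VALID_LEVELS = (0, 1, 2)  # values documented for MKLDNN_VERBOSE


def verbose_env(level, base_env=None):
    """Return a copy of the environment with MKLDNN_VERBOSE set to `level`."""
    if level not in VALID_LEVELS:
        raise ValueError(
            "MKLDNN_VERBOSE must be one of {}, got {!r}".format(VALID_LEVELS, level)
        )
    env = dict(os.environ if base_env is None else base_env)
    env["MKLDNN_VERBOSE"] = str(level)
    return env


def run_with_verbose(cmd, level=1):
    """Run `cmd` (a list of arguments) and return its captured stdout as text.

    Assumption: the traced application prints verbose lines to stdout.
    """
    result = subprocess.run(
        cmd,
        env=verbose_env(level),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```

If the application itself calls mkldnn_set_verbose, that call overrides whatever level this helper places in the environment.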

The first line of verbose information contains the build version and git hash, if available, as well as the supported instruction set architecture.
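
The following sketch shows one way to pull the version, git hash, and instruction set out of that first line. It assumes the layout of the single info line shown in the example in this document (an `info` field, then a build string with an optional `(Git Hash ...)` part, then the instruction set description); other builds may format the line differently. `parse_info_line` is a hypothetical helper, not a library function.

```python
import re

# Assumed layout: mkldnn_verbose,info,<build string>,<instruction set>
INFO_RE = re.compile(r"^mkldnn_verbose,info,(?P<build>[^,]*),(?P<isa>.*)$")
VERSION_RE = re.compile(r"\bv(?P<version>\d+(?:\.\d+)*)")
HASH_RE = re.compile(r"\(Git Hash (?P<hash>[0-9a-fA-F]+)\)")


def parse_info_line(line):
    """Return version, git hash, and ISA from the info line, or None if no match."""
    match = INFO_RE.match(line.strip())
    if match is None:
        return None
    build = match.group("build")
    version = VERSION_RE.search(build)
    git_hash = HASH_RE.search(build)
    return {
        "version": version.group("version") if version else None,
        # The git hash is only present "if available", so it may be None.
        "git_hash": git_hash.group("hash") if git_hash else None,
        "isa": match.group("isa"),
    }
```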

Each subsequent line of verbose information is formatted as a comma-separated list containing:

mkldnn_verbose marker string
operation: create or exec
engine kind: cpu or gpu
primitive name: convolution, reorder, sum, etc.
primitive implementation
propagation: forward_training, forward_inference, or backward (undef for primitives without a propagation kind, such as reorder)
information about input and output data types and formats
auxiliary information like algorithm name or number of inputs
a problem description in benchdnn format
execution time in milliseconds
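
The field list above maps directly onto a small parser. The sketch below is one possible approach, assuming that no field contains a comma (which holds for the example output in this document). `VerboseRecord` and `parse_verbose_line` are hypothetical names introduced for illustration.

```python
from collections import namedtuple

# One attribute per field listed above, minus the leading marker string.
VerboseRecord = namedtuple(
    "VerboseRecord",
    [
        "operation",       # create or exec
        "engine",          # cpu or gpu
        "primitive",       # convolution, reorder, sum, ...
        "implementation",  # e.g. jit:avx2
        "propagation",     # e.g. forward_training
        "data",            # data types and formats
        "aux",             # e.g. alg:convolution_direct or num:1
        "problem",         # problem description in benchdnn format
        "time_ms",         # execution time in milliseconds (float)
    ],
)


def parse_verbose_line(line):
    """Parse one verbose line; return None if it is not a primitive record."""
    fields = line.strip().split(",")
    if len(fields) != 10 or fields[0] != "mkldnn_verbose":
        return None
    if fields[1] not in ("create", "exec"):
        return None
    try:
        time_ms = float(fields[9])
    except ValueError:
        return None
    return VerboseRecord(*fields[1:9], time_ms)
```

Lines that do not follow the ten-field layout, including the first informational line and any unrelated application output, are returned as None so that callers can skip them.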

Example

MKLDNN_VERBOSE=1 ./benchdnn --conv ic16ih7oc16oh7kh5ph2n"wip"

This produces the following output (line breaks were added to the convolution entry to fit the page width):

mkldnn_verbose,info,Intel(R) MKL-DNN v0.95.0 (Git Hash ce116d48579332ae7e51a46219114f8d3c1e48db),Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2)
mkldnn_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:aBcd8b:f0,num:1,2x16x7x7,0.468994
mkldnn_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:ABcd8b8a:f0,num:1,16x16x5x5,0.458008
mkldnn_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:aBcd8b:f0,num:1,2x16x7x7,0.453857
mkldnn_verbose,exec,cpu,reorder,simple:any,undef,src_f32::blocked:a:f0 dst_f32::blocked:a:f0,num:1,16,0.462891
mkldnn_verbose,exec,cpu,convolution,jit:avx2,forward_training,
    src_f32::blocked:aBcd8b:f0 wei_f32::blocked:ABcd8b8a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:aBcd8b:f0,
    alg:convolution_direct,mb2_ic16oc16_ih7oh7kh5sh1dh0ph2_iw7ow7kw5sw1dw0pw2,0.026123
mkldnn_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:aBcd8b:f0 dst_f32::blocked:abcd:f0,num:1,2x16x7x7,0.464111
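
To see which primitives take the most time, the exec lines can be summed per primitive name. The sketch below does this for the output above, with the convolution entry rejoined onto a single line as it would appear in the real output. `summarize_exec_time` is a hypothetical helper, and the parsing assumes that fields never contain commas.

```python
from collections import defaultdict

SAMPLE_LOG = """\
mkldnn_verbose,info,Intel(R) MKL-DNN v0.95.0 (Git Hash ce116d48579332ae7e51a46219114f8d3c1e48db),Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2)
mkldnn_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:aBcd8b:f0,num:1,2x16x7x7,0.468994
mkldnn_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:ABcd8b8a:f0,num:1,16x16x5x5,0.458008
mkldnn_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:aBcd8b:f0,num:1,2x16x7x7,0.453857
mkldnn_verbose,exec,cpu,reorder,simple:any,undef,src_f32::blocked:a:f0 dst_f32::blocked:a:f0,num:1,16,0.462891
mkldnn_verbose,exec,cpu,convolution,jit:avx2,forward_training,src_f32::blocked:aBcd8b:f0 wei_f32::blocked:ABcd8b8a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:aBcd8b:f0,alg:convolution_direct,mb2_ic16oc16_ih7oh7kh5sh1dh0ph2_iw7ow7kw5sw1dw0pw2,0.026123
mkldnn_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:aBcd8b:f0 dst_f32::blocked:abcd:f0,num:1,2x16x7x7,0.464111
"""


def summarize_exec_time(lines):
    """Return {primitive name: (call count, total milliseconds)} for exec lines."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        fields = line.strip().split(",")
        # Skip the info line, create lines, and anything that is not a record.
        if len(fields) != 10 or fields[:2] != ["mkldnn_verbose", "exec"]:
            continue
        try:
            elapsed_ms = float(fields[-1])
        except ValueError:
            continue
        entry = totals[fields[3]]  # fields[3] is the primitive name
        entry[0] += 1
        entry[1] += elapsed_ms
    return {name: (count, total) for name, (count, total) in totals.items()}


if __name__ == "__main__":
    summary = summarize_exec_time(SAMPLE_LOG.splitlines())
    # Print the most expensive primitive first.
    for name, (count, total) in sorted(
        summary.items(), key=lambda item: item[1][1], reverse=True
    ):
        print("{:12s} calls={:d} total_ms={:.6f}".format(name, count, total))
```

For this run, the five reorder calls add up to roughly 2.31 ms, while the single convolution call takes about 0.026 ms, so the reorders dominate the measured time.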

See the profiling example, which uses MKLDNN_VERBOSE output to tune Intel MKL-DNN code to align with best practices.

Warning: Verbose mode has a non-negligible performance impact, especially if the output rate is high.