It is often useful to collect information about how much of an application runtime is spent executing Intel(R) Math Kernel Library for Deep Neural Networks (DNNL) primitives and which of those take the most time. DNNL's verbose mode enables tracing execution of DNNL primitives and collection of basic statistics like execution time and primitive parameters.
The behavior is controlled with DNNL_VERBOSE
environment variable or dnnl_set_verbose function.
Value | Behav |
0 | no verbose output (default) |
1 | primitive information at execution |
2 | primitive information at creation and execution |
The function setting takes precedence over the environment variable.
The first line of verbose information contains the build version and git hash, if available, as well as the supported instruction set architecture.
Each subsequent line of verbose information is formatted as a comma-separated list containing:
dnnl_verbose
marker string
- operation:
create[:cache_hit]
, create[:cache_miss]
or exec
- engine kind:
cpu
or gpu
- primitive name:
convolution
, reorder
, sum
, etc
- primitive implementation
- propagation:
forward_training
, forward_inference
, or backward
- information about input and output data types and formats
- auxiliary information like algorithm name or number of inputs
- a problem description in benchdnn format
- execution time in milliseconds
Example
DNNL_VERBOSE=1 ./benchdnn --conv ic16ih7oc16oh7kh5ph2n"wip"
This produces the following output (the line break was added to fit the page width):
dnnl_verbose,info,DNNL v0.95.0 (Git Hash ce116d48579332ae7e51a46219114f8d3c1e48db),Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2)
dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:aBcd8b:f0,num:1,2x16x7x7,0.468994
dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:ABcd8b8a:f0,num:1,16x16x5x5,0.458008
dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd:f0 dst_f32::blocked:aBcd8b:f0,num:1,2x16x7x7,0.453857
dnnl_verbose,exec,cpu,reorder,simple:any,undef,src_f32::blocked:a:f0 dst_f32::blocked:a:f0,num:1,16,0.462891
dnnl_verbose,exec,cpu,convolution,jit:avx2,forward_training,
src_f32::blocked:aBcd8b:f0 wei_f32::blocked:ABcd8b8a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:aBcd8b:f0,
alg:convolution_direct,mb2_ic16oc16_ih7oh7kh5sh1dh0ph2_iw7ow7kw5sw1dw0pw2,0.026123
dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:aBcd8b:f0 dst_f32::blocked:abcd:f0,num:1,2x16x7x7,0.464111
Please see the profiling example here, as it uses DNNL_VERBOSE output to tune DNNL code to align with best practices.
- Warning
- Verbose mode has non-negligible performance impact especially if the output rate is high.