Graph Compiler¶
oneDNN Graph Compiler is an experimental backend for the oneDNN Graph API. It can generate optimized implementations for complex computational graphs, including multi-head attention (MHA), multi-layer perceptron (MLP), and convolution residual blocks, over typical data types for both inference and training. It also improves performance by providing more flexible operator fusion.
Use of oneDNN Graph Compiler is transparent for applications, as it does not involve API or programming model changes.
Build-Time Controls¶
The following build-time options take effect only when ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_BACKEND is ON.
CMake Option | Supported values | Description |
---|---|---|
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_CPU_JIT | llvm, c, builtin | Selects the CPU codegen and JIT to be built by the graph compiler backend. Multiple codegen approaches can be used simultaneously. See Codegen and JIT Options for an example of setting multiple codegen methods. |
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_CPU_LLVM_CONFIG | AUTO (default), path to llvm-config binary | Defines the method for detecting and configuring LLVM. |
Codegen and JIT Options¶
The graph compiler backend supports several codegen and JIT options: C, LLVM, and builtin (xbyak). Users can build a subset of the available options by setting the ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_CPU_JIT option.
cmake .. -DONEDNN_BUILD_GRAPH=ON -DONEDNN_EXPERIMENTAL_GRAPH_COMPILER_BACKEND=ON -DONEDNN_EXPERIMENTAL_GRAPH_COMPILER_CPU_JIT="c;builtin"
This builds only the c and builtin codegen options.
cmake .. -DONEDNN_BUILD_GRAPH=ON -DONEDNN_EXPERIMENTAL_GRAPH_COMPILER_BACKEND=ON -DONEDNN_EXPERIMENTAL_GRAPH_COMPILER_CPU_JIT="llvm;c;builtin"
This will build all three codegen options.
C¶
C codegen generates temporary cpp files and uses g++ to compile them into the executable. It is useful for debugging because the generated code is easier for developers to read.
LLVM¶
LLVM codegen generates LLVM IR in memory. It provides the best performance among all supported codegen methods. LLVM codegen requires LLVM as an additional dependency; if LLVM cannot be found, CMake reports an error.
Users can set ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_CPU_LLVM_CONFIG to specify the LLVM to be integrated. By default, it is set to AUTO, which auto-detects an existing LLVM installation in the environment. If auto-detection fails, or if a specific LLVM version must be selected explicitly, set the option to the path of the llvm-config binary.
Users can build and install LLVM from source by following the LLVM project's guidelines, or download and install a pre-built binary.
Note
LLVM 10.0 or above is required to enable LLVM codegen.
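Because LLVM codegen needs LLVM 10.0 or above, it can help to check the version reported by llvm-config before passing its path to CMake. The following is a minimal sketch and not part of oneDNN: the helper name llvm_version_ok is hypothetical, the llvm-config path is a placeholder, and the sketch assumes that llvm-config --version prints a dotted version string such as 10.0.0.

```shell
# Hypothetical helper (not part of oneDNN): succeeds when the major
# component of a dotted version string is 10 or greater.
llvm_version_ok() {
    major=${1%%.*}                 # keep the text before the first dot
    case "$major" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$major" -ge 10 ]
}

# Example use; the llvm-config path below is a placeholder:
#   LLVM_CONFIG=/path/to/llvm-config
#   if llvm_version_ok "$("$LLVM_CONFIG" --version)"; then
#       cmake .. -DONEDNN_BUILD_GRAPH=ON \
#           -DONEDNN_EXPERIMENTAL_GRAPH_COMPILER_BACKEND=ON \
#           -DONEDNN_EXPERIMENTAL_GRAPH_COMPILER_CPU_JIT="llvm;builtin" \
#           -DONEDNN_EXPERIMENTAL_GRAPH_COMPILER_CPU_LLVM_CONFIG="$LLVM_CONFIG"
#   fi
```

Checking the version up front only gives an earlier and more specific message; CMake still reports an error on its own when LLVM is missing.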
Builtin¶
The builtin codegen and JIT method is implemented with xbyak. Unlike C or LLVM codegen, it has no extra dependencies.
Environment Variables¶
The following environment variables are introduced by the graph compiler backend.
Environment Variable | Value | Description |
---|---|---|
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_CPU_JIT | llvm | Uses LLVM as codegen and JIT method |
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_CPU_JIT | builtin | Uses builtin as codegen and JIT method |
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_CPU_JIT | c | Uses C as codegen and JIT method |
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_OPT_LEVEL | 0 | Turns off optimization passes and sets the compilation optimization level to 0 in C and LLVM JIT |
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_OPT_LEVEL | 1, 2, 3 | Sets the compilation optimization level of C and LLVM JIT |
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_KERNEL_TRACE | 0 | No kernel execution trace output |
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_KERNEL_TRACE | 1 or 1,stderr or 1,filename.json | Generates a kernel execution trace in chrome tracing format, written to ./sctrace.json by default, to stderr, or to the file specified by the given filename |
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_PRINT_PASS_RESULT | 0 | No IR output after each graph or tensor IR pass |
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_PRINT_PASS_RESULT | 1 | Prints the output IR of each graph and tensor IR pass |
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_VERBOSE | 0 | No verbose output |
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_VERBOSE | 1 | Prints warning messages during compilation |
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_VERBOSE | 2 | Prints warning messages and info logs (e.g. fusion-related information) during compilation |
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_DUMP_GENCODE | path_to_dump | Dumps the generated kernels as C code to the given path |
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_C_INCLUDE | path_to_c_codegen_header | Specifies the C codegen header for JIT compilation |
Enable Tracing¶
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_KERNEL_TRACE=1 ./application
This produces a kernel execution trace in JSON format, stored at the default destination ./sctrace.json.
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_KERNEL_TRACE=1,stderr ./application
This will dump a kernel execution trace to the stderr stream.
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_KERNEL_TRACE=1,/tmp/filename.json ./application
This produces a kernel execution trace in JSON format, stored at the user-specified path /tmp/filename.json.
Switch Between Different Codegen Methods¶
By default, codegen methods are prioritized from highest to lowest as llvm, c, builtin. When multiple codegen and JIT methods are enabled at build time, the method with the highest priority is used at runtime by default.
Users can switch to a different codegen method at runtime by setting ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_CPU_JIT.
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_CPU_JIT=builtin ./application
This switches the CPU codegen and JIT method to builtin (xbyak).
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_CPU_JIT=c ./application
This switches the CPU codegen and JIT method to c.
When the C codegen option is used, the generated C code relies on the runtime function declarations in cpu_include.hpp. The ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_C_INCLUDE environment variable specifies the corresponding include path. Normally, the include path is set automatically at the CMake build stage. If the error message "environment variable ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_C_INCLUDE is not set" occurs, users shall manually set ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_C_INCLUDE to /path_to_onednn_repo/src/graph/backend/graph_compiler/core/src.
Warning
The specified codegen method must have been enabled at build time. Otherwise, the default codegen method is used.
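The selection rule described in this section can be modeled as a small shell function. This is only an illustration of the documented behavior (priority order llvm, c, builtin, with a fallback to the default when the requested method was not built), not the backend's actual implementation. The function name effective_jit and its arguments are hypothetical: the first argument is the semicolon-separated list given to the CMake option, and the second is the value of the environment variable, if any.

```shell
# Hypothetical model of the documented selection rule.
# Usage: effective_jit "<built methods separated by ;>" [requested method]
effective_jit() {
    built=";$1;"          # pad with separators so whole names can be matched
    requested=${2:-}

    # Use the requested method when it was enabled at build time.
    if [ -n "$requested" ]; then
        case "$built" in
            *";$requested;"*) printf '%s\n' "$requested"; return 0 ;;
        esac
    fi

    # Otherwise fall back to the highest-priority method that was built.
    for method in llvm c builtin; do
        case "$built" in
            *";$method;"*) printf '%s\n' "$method"; return 0 ;;
        esac
    done
    return 1              # no known method in the built list
}
```

For instance, with a build configured as "c;builtin", requesting llvm in this model falls back to c, which mirrors the warning above.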
Enable Code Dumping¶
Users can set the ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_DUMP_GENCODE variable to generate offline C kernels.
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_DUMP_GENCODE="./dump_code" ./application
This dumps the generated C kernels to the dump_code folder.
Warning
ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_DUMP_GENCODE works under both LLVM and C codegen.
Warning
The user-specified ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_DUMP_GENCODE path must be an existing folder. Otherwise, code dumping does not take effect.