Getting started on CPU with Graph API¶
This example demonstrates how to build a simple graph and run it on CPU.
Example code: cpu_getting_started.cpp
Some key take-aways from this example:
how to build a graph and get partitions from it
how to create an engine, allocator and stream
how to compile a partition
how to execute a compiled partition
Some assumptions in this example:
Only the workflow is demonstrated, without checking correctness of results
Unsupported partitions must be handled by users themselves
Public headers¶
To start using oneDNN Graph, we must include the dnnl_graph.hpp header file in the application. All the C++ APIs reside in the namespace dnnl::graph.
#include <iostream>
#include <memory>
#include <vector>
#include <unordered_map>
#include <unordered_set>

#include <assert.h>

#include "oneapi/dnnl/dnnl_graph.hpp"

#include "example_utils.hpp"
#include "graph_example_utils.hpp"

using namespace dnnl::graph;
using data_type = logical_tensor::data_type;
using layout_type = logical_tensor::layout_type;
using dim = logical_tensor::dim;
using dims = logical_tensor::dims;
cpu_getting_started_tutorial() function¶
Build Graph and Get Partitions¶
In this section, we build a graph containing the pattern conv0->relu0->conv1->relu1. After that, we can get all of the partitions, which are determined by the backend.
To build a graph, the connection relationships between different ops must be known. In oneDNN Graph, dnnl::graph::logical_tensor is used to express such relationships. So, the next step is to create logical tensors for these ops, including inputs and outputs.
Note
It’s not necessary to provide concrete shape/layout information at the graph partitioning stage. Users can postpone providing this information until the compilation stage.
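For illustration, the two forms might look as follows (a minimal sketch; the shape below is a hypothetical example, not taken from this tutorial):

// At graph build stage: only a tensor id and a data type are required.
logical_tensor desc_at_build_stage {0, data_type::f32};

// At compilation stage: the same id, now with a concrete shape and a
// strided layout (hypothetical shape, for illustration only).
logical_tensor desc_at_compile_stage {
        0, data_type::f32, dims {8, 3, 227, 227}, layout_type::strided};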
Create input/output dnnl::graph::logical_tensor objects for the first Convolution op.
logical_tensor conv0_src_desc {0, data_type::f32};
logical_tensor conv0_weight_desc {1, data_type::f32};
logical_tensor conv0_dst_desc {2, data_type::f32};
Create the first Convolution op (dnnl::graph::op) and attach attributes to it, such as strides, pads_begin, pads_end, data_format, etc.
op conv0(0, op::kind::Convolution, {conv0_src_desc, conv0_weight_desc},
        {conv0_dst_desc}, "conv0");
conv0.set_attr<dims>(op::attr::strides, {4, 4});
conv0.set_attr<dims>(op::attr::pads_begin, {0, 0});
conv0.set_attr<dims>(op::attr::pads_end, {0, 0});
conv0.set_attr<dims>(op::attr::dilations, {1, 1});
conv0.set_attr<int64_t>(op::attr::groups, 1);
conv0.set_attr<std::string>(op::attr::data_format, "NCX");
conv0.set_attr<std::string>(op::attr::weights_format, "OIX");
Create input/output logical tensors for the first BiasAdd op and create the op.
logical_tensor conv0_bias_desc {3, data_type::f32};
logical_tensor conv0_bias_add_dst_desc {
        4, data_type::f32, layout_type::undef};
op conv0_bias_add(1, op::kind::BiasAdd, {conv0_dst_desc, conv0_bias_desc},
        {conv0_bias_add_dst_desc}, "conv0_bias_add");
conv0_bias_add.set_attr<std::string>(op::attr::data_format, "NCX");
Create the output logical tensor for the first ReLU op and create the op.
logical_tensor relu0_dst_desc {5, data_type::f32};
op relu0(2, op::kind::ReLU, {conv0_bias_add_dst_desc}, {relu0_dst_desc},
        "relu0");
Create input/output logical tensors for the second Convolution op and create the op.
logical_tensor conv1_weight_desc {6, data_type::f32};
logical_tensor conv1_dst_desc {7, data_type::f32};
op conv1(3, op::kind::Convolution, {relu0_dst_desc, conv1_weight_desc},
        {conv1_dst_desc}, "conv1");
conv1.set_attr<dims>(op::attr::strides, {1, 1});
conv1.set_attr<dims>(op::attr::pads_begin, {0, 0});
conv1.set_attr<dims>(op::attr::pads_end, {0, 0});
conv1.set_attr<dims>(op::attr::dilations, {1, 1});
conv1.set_attr<int64_t>(op::attr::groups, 1);
conv1.set_attr<std::string>(op::attr::data_format, "NCX");
conv1.set_attr<std::string>(op::attr::weights_format, "OIX");
Create input/output logical tensors for the second BiasAdd op and create the op.
logical_tensor conv1_bias_desc {8, data_type::f32};
logical_tensor conv1_bias_add_dst_desc {9, data_type::f32};
op conv1_bias_add(4, op::kind::BiasAdd, {conv1_dst_desc, conv1_bias_desc},
        {conv1_bias_add_dst_desc}, "conv1_bias_add");
conv1_bias_add.set_attr<std::string>(op::attr::data_format, "NCX");
Create the output logical tensor for the second ReLU op and create the op.
logical_tensor relu1_dst_desc {10, data_type::f32};
op relu1(5, op::kind::ReLU, {conv1_bias_add_dst_desc}, {relu1_dst_desc},
        "relu1");
Finally, the created ops are added into the graph, which internally maintains a list to store all of them. To create a graph, dnnl::engine::kind is needed because the returned partitions may vary on different devices. For this example, we use the CPU engine.
Note
The order of adding ops doesn’t matter. The connections are derived from the logical tensors.
Create graph and add ops to the graph
graph g(dnnl::engine::kind::cpu);

g.add_op(conv0);
g.add_op(conv0_bias_add);
g.add_op(relu0);

g.add_op(conv1);
g.add_op(conv1_bias_add);
g.add_op(relu1);
After adding all ops into the graph, call dnnl::graph::graph::finalize() to indicate that graph building is over and the graph is ready for partitioning. Adding new ops into a finalized graph or partitioning an unfinalized graph will both lead to a failure.
g.finalize();
After finalizing the graph, we can get the partitions by calling dnnl::graph::graph::get_partitions().
In this example, the graph will be partitioned into two partitions:
conv0 + conv0_bias_add + relu0
conv1 + conv1_bias_add + relu1
auto partitions = g.get_partitions();
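As mentioned in the assumptions above, unsupported partitions must be handled by the user. A minimal sketch of such a check, using dnnl::graph::partition::is_supported():

for (const auto &p : partitions) {
    if (!p.is_supported()) {
        // An unsupported partition contains ops the backend cannot fuse
        // or execute; the application must provide its own fallback here.
        std::cout << "partition " << p.get_id() << " is not supported"
                  << std::endl;
    }
}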
Compile and Execute Partition¶
In a real case, users such as frameworks should provide device information at this stage. In this example, we just use a self-defined device to simulate the real behavior.
Create a dnnl::engine with a user-defined dnnl::graph::allocator attached to it.
allocator alloc {};
dnnl::engine eng
        = make_engine_with_allocator(dnnl::engine::kind::cpu, 0, alloc);
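Here the allocator is default-constructed. An allocator can also be built from user-provided host allocate/deallocate callbacks; a minimal sketch, where the callback names and the use of std::aligned_alloc are our own choices:

#include <cstdlib> // for std::aligned_alloc / std::free (C++17)

// Hypothetical callbacks matching oneDNN Graph's expected signatures:
// void *(*)(size_t size, size_t alignment) and void (*)(void *buf).
void *my_host_malloc(size_t size, size_t alignment) {
    const size_t align = alignment == 0 ? 64 : alignment;
    // std::aligned_alloc requires size to be a multiple of alignment.
    const size_t rounded = (size + align - 1) / align * align;
    return std::aligned_alloc(align, rounded);
}

void my_host_free(void *buf) {
    std::free(buf);
}

// An allocator built from the callbacks above.
allocator alloc_with_callbacks {my_host_malloc, my_host_free};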
Create a dnnl::stream on the given engine.
dnnl::stream strm {eng};
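The compile call below operates on a single partition and takes input and output logical tensors that carry concrete shapes. A sketch of how partition, inputs, and outputs might be set up for the first partition; the shapes are hypothetical examples, and a real application would typically loop over all returned partitions:

// Re-create the partition's input/output logical tensors with concrete
// shapes (hypothetical values chosen for illustration).
logical_tensor conv0_src_desc_shaped {
        0, data_type::f32, dims {8, 3, 227, 227}, layout_type::strided};
logical_tensor conv0_weight_desc_shaped {
        1, data_type::f32, dims {96, 3, 11, 11}, layout_type::strided};
logical_tensor conv0_bias_desc_shaped {
        3, data_type::f32, dims {96}, layout_type::strided};
logical_tensor relu0_dst_desc_shaped {
        5, data_type::f32, dims {8, 96, 55, 55}, layout_type::strided};

std::vector<logical_tensor> inputs {conv0_src_desc_shaped,
        conv0_weight_desc_shaped, conv0_bias_desc_shaped};
std::vector<logical_tensor> outputs {relu0_dst_desc_shaped};

// The first returned partition: conv0 + conv0_bias_add + relu0.
auto partition = partitions[0];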
Compile the partition into a compiled partition with the given input and output logical tensors.
compiled_partition cp = partition.compile(inputs, outputs, eng);
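Before execution, each logical tensor must be bound to real memory as a dnnl::graph::tensor. A minimal sketch, assuming the shaped descriptors and user-allocated buffers from above:

// User-owned buffers sized according to the hypothetical shapes above.
std::vector<float> conv0_src_data(8 * 3 * 227 * 227);
std::vector<float> conv0_weight_data(96 * 3 * 11 * 11);
std::vector<float> conv0_bias_data(96);
std::vector<float> relu0_dst_data(8 * 96 * 55 * 55);

// Bind each buffer to its logical tensor on the CPU engine.
tensor conv0_src_ts(conv0_src_desc_shaped, eng, conv0_src_data.data());
tensor conv0_weight_ts(
        conv0_weight_desc_shaped, eng, conv0_weight_data.data());
tensor conv0_bias_ts(conv0_bias_desc_shaped, eng, conv0_bias_data.data());
tensor relu0_dst_ts(relu0_dst_desc_shaped, eng, relu0_dst_data.data());

std::vector<tensor> inputs_ts {conv0_src_ts, conv0_weight_ts, conv0_bias_ts};
std::vector<tensor> outputs_ts {relu0_dst_ts};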
Execute the compiled partition on the specified stream.
cp.execute(strm, inputs_ts, outputs_ts);
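Depending on the runtime, execution may be asynchronous, so it is good practice to wait on the stream before reading the outputs:

strm.wait();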