This C++ API example demonstrates how to build an AlexNet neural network topology for forward-pass inference. Some key takeaways include:
- How tensors are implemented and submitted to primitives.
- How primitives are created.
- How primitives are sequentially submitted to the network, with the output from one primitive passed as input to the next. This establishes the dependency between primitive input and output data.
- Specific 'inference-only' configurations.
- Limiting the number of reorders performed, which are detrimental to performance.
The simple_net.cpp example implements the AlexNet layers as numbered primitives (e.g. conv1, pool1, conv2).
Highlights for implementing the simple_net.cpp Example:
- Initialize a CPU engine. The last parameter in the engine() call represents the index of the engine.
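For example, an engine for the first CPU device can be created as:
auto cpu_engine = engine(engine::cpu, 0);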
- Create a primitives vector that represents the net.
std::vector<primitive> net;
- Additionally, create a separate vector holding the weights. This will allow executing transformations only once and outside the topology stream.
std::vector<primitive> net_weights;
- Allocate a vector for the input data and create the tensor dimensions that describe it.
memory::dims conv1_src_tz = { batch, 3, 227, 227 };
std::vector<float> user_src(batch * 3 * 227 * 227);
- Create a memory primitive for data in the user format nchw (minibatch-channels-height-width). Create a memory descriptor for the convolution input, selecting any for the data format. The any format allows the convolution primitive to choose the data format that is most suitable for its input parameters (convolution kernel sizes, strides, padding, and so on). If the resulting format is different from nchw, the user data must be transformed to the format required for the convolution (as explained below).
auto user_src_memory = memory({ { { conv1_src_tz }, memory::data_type::f32,
        memory::format::nchw }, cpu_engine }, user_src.data());
auto conv1_src_md = memory::desc({ conv1_src_tz },
        memory::data_type::f32, memory::format::any);
- Create a convolution descriptor by specifying the propagation kind, the convolution algorithm, shapes of the input, weights, bias, and output, convolution strides, padding, and the kind of padding (the strides and padding are memory::dims that the example defines alongside conv1_src_tz). The propagation kind is set to forward_inference, which is optimized for inference execution and omits computations that are necessary only for backward propagation.
auto conv1_desc = convolution_forward::desc(
        prop_kind::forward_inference, convolution_direct, conv1_src_md,
        conv1_weights_md, conv1_bias_md, conv1_dst_md, conv1_strides,
        conv1_padding, conv1_padding, padding_kind::zero);
- Create a descriptor of the convolution primitive. Once created, this descriptor has specific formats instead of the any format specified in the convolution descriptor.
auto conv1_prim_desc = convolution_forward::primitive_desc(conv1_desc, cpu_engine);
- Create a convolution memory primitive from the user memory and check whether the user data format differs from the format that the convolution requires. If it differs, create a reorder primitive that transforms the user data to the convolution format and add it to the net. Repeat this process for the weights as well (see the sketch after the code below).
auto conv1_src_memory = user_src_memory;
if (memory::primitive_desc(conv1_prim_desc.src_primitive_desc())
!= user_src_memory.get_primitive_desc()) {
conv1_src_memory = memory(conv1_prim_desc.src_primitive_desc());
net.push_back(reorder(user_src_memory, conv1_src_memory));
}
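An analogous check for the weights might look like the following sketch (assuming user_weights_memory was created from the user's weight data, as user_src_memory was above). Note that the weight reorder is pushed to net_weights so that the transformation runs only once:
// Reorder the weights only if the convolution requires a different format.
auto conv1_weights_memory = user_weights_memory;
if (memory::primitive_desc(conv1_prim_desc.weights_primitive_desc())
        != user_weights_memory.get_primitive_desc()) {
    conv1_weights_memory = memory(conv1_prim_desc.weights_primitive_desc());
    net_weights.push_back(reorder(user_weights_memory, conv1_weights_memory));
}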
- Create a memory primitive for output.
auto conv1_dst_memory = memory(conv1_prim_desc.dst_primitive_desc());
- Create a convolution primitive and add it to the net.
net.push_back(convolution_forward(conv1_prim_desc, conv1_src_memory, conv1_weights_memory,
        user_bias_memory, conv1_dst_memory));
- Create a ReLU primitive. For better performance, keep the ReLU input data format (and that of subsequent operation primitives, until another convolution or inner product is encountered) the same as the format chosen by the convolution. Furthermore, ReLU is executed in-place by using the conv1 destination memory for both input and output (the creation of relu1_desc is sketched after the code below).
auto relu1_prim_desc = eltwise_forward::primitive_desc(relu1_desc, cpu_engine);
net.push_back(eltwise_forward(relu1_prim_desc, conv1_dst_memory, conv1_dst_memory));
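The relu1_desc used above is an eltwise operation descriptor. A minimal sketch of its creation, assuming a negative_slope parameter (typically 0.0f for plain ReLU), could be:
// ReLU takes its source descriptor from the convolution's chosen output format.
auto relu1_desc = eltwise_forward::desc(prop_kind::forward_inference,
        eltwise_relu, conv1_dst_memory.get_primitive_desc().desc(),
        negative_slope);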
- For training execution, pooling requires a private workspace memory to perform the backward pass. However, pooling should not use a workspace for inference, as this is detrimental to performance. Note that pool1 consumes the output of the lrn1 layer, which the example creates between relu1 and pool1; a sketch of the pooling descriptor follows the code below.
auto pool1_dst_memory = memory(pool1_pd.dst_primitive_desc());
net.push_back(pooling_forward(pool1_pd, lrn1_dst_memory, pool1_dst_memory));
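Because the propagation kind is forward_inference, the pooling primitive descriptor is created without a workspace. A minimal sketch (the names pool1_dst_md, pool1_kernel, pool1_strides, and pool1_padding are assumptions following the example's naming convention):
// No workspace is requested for forward_inference propagation.
auto pool1_desc = pooling_forward::desc(prop_kind::forward_inference,
        pooling_max, lrn1_dst_memory.get_primitive_desc().desc(), pool1_dst_md,
        pool1_strides, pool1_kernel, pool1_padding, pool1_padding,
        padding_kind::zero);
auto pool1_pd = pooling_forward::primitive_desc(pool1_desc, cpu_engine);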
The example continues to create more layers according to the AlexNet topology.
- Finally, create a stream to execute the weights data transformation. This is required only once. Create another stream that will execute the 'net' primitives. For this example, the net is executed multiple times and each execution is timed individually.
stream(stream::kind::eager).submit(net_weights).wait();
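The topology itself could then be executed in a timed loop along these lines (the iteration count is an assumption; the example's actual timing harness is omitted here):
// Execute the net repeatedly; each submit/wait pair is timed individually.
for (int i = 0; i < iterations; ++i)
    stream(stream::kind::eager).submit(net).wait();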