oneAPI Deep Neural Network Library (oneDNN) is a performance library containing building blocks for deep learning applications and frameworks. oneDNN supports:

  • CNN primitives (Convolutions, Inner product, Pooling, etc.)

  • RNN primitives (LSTM, Vanilla RNN, GRU)

  • Normalizations (LRN, Batch, Layer)

  • Elementwise operations (ReLU, Tanh, ELU, Abs, etc.)

  • Softmax, Sum, Concat, Shuffle

  • Reorders from/to optimized data layouts

  • 8-bit integer and 16-bit, 32-bit, and bfloat16 floating-point data types
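To make the "reorders from/to optimized data layouts" item more concrete, the following is a minimal plain C++ sketch of what a layout reorder does conceptually: it copies the same logical tensor elements into a different physical order, here from NCHW (channels-first) to NHWC (channels-last). The helper name `reorder_nchw_to_nhwc` is hypothetical and is not part of the oneDNN API; a real implementation would use the library's reorder primitive and may choose layouts that this specification does not describe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only (not a oneDNN API).
// Copies a dense 4D tensor from NCHW order to NHWC order.
std::vector<float> reorder_nchw_to_nhwc(const std::vector<float> &src,
        std::size_t N, std::size_t C, std::size_t H, std::size_t W) {
    // The source must hold exactly one value per logical element.
    assert(src.size() == N * C * H * W);
    std::vector<float> dst(src.size());
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t c = 0; c < C; ++c)
            for (std::size_t h = 0; h < H; ++h)
                for (std::size_t w = 0; w < W; ++w) {
                    // Linear offset of element (n, c, h, w) in each layout.
                    const std::size_t nchw = ((n * C + c) * H + h) * W + w;
                    const std::size_t nhwc = ((n * H + h) * W + w) * C + c;
                    dst[nhwc] = src[nchw];
                }
    return dst;
}
```

The logical element (n, c, h, w) keeps its value; only its position in memory changes, which is why a reorder can be inserted between primitives without affecting results.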

// Tensor dimensions
int N, C, H, W;

// User-owned DPC++ objects
sycl::device dev {sycl::gpu_selector {}}; // Device
sycl::context ctx {dev}; // Context
sycl::queue queue {dev}; // Queue
std::vector<sycl::event> dependencies; // Input events dependencies
// Source
float *buf_src = static_cast<float *>(
        sycl::malloc_device((N * C * H * W) * sizeof(float), dev, ctx));
// Results
float *buf_dst = static_cast<float *>(
        sycl::malloc_device((N * C * H * W) * sizeof(float), dev, ctx));

// Create an engine encapsulating users' DPC++ GPU device and context
dnnl::engine engine = dnnl::sycl_interop::make_engine(dev, ctx);
// Create a stream encapsulating users' DPC++ GPU queue
dnnl::stream stream = dnnl::sycl_interop::make_stream(engine, queue);
// Create memory objects that use buf_src and buf_dst as the underlying storage
dnnl::memory mem_src({{N, C, H, W}, dnnl::memory::data_type::f32,
                             dnnl::memory::format_tag::nhwc},
        engine, buf_src);
dnnl::memory mem_dst({{N, C, H, W}, dnnl::memory::data_type::f32,
                             dnnl::memory::format_tag::nhwc},
        engine, buf_dst);
// Create a ReLU elementwise primitive
dnnl::eltwise_forward relu {
        {{dnnl::prop_kind::forward_inference, dnnl::algorithm::eltwise_relu,
                 mem_src.get_desc(), 0.f, 0.f},
                engine}};
// Execute the ReLU primitive in the stream passing input dependencies and
// retrieving the output dependency
sycl::event event = dnnl::sycl_interop::execute(relu, stream,
        {{DNNL_ARG_SRC, mem_src}, {DNNL_ARG_DST, mem_dst}}, dependencies);
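As a point of comparison, the sketch below shows in plain C++ what the ReLU elementwise operation above is expected to compute for each element: the source value if it is positive, and `alpha` times the source value otherwise. With `alpha` set to 0, as in the example, negative inputs map to zero. The function `relu_reference` is a hypothetical helper for checking results on the host. It is not part of the oneDNN API and makes no claim about how any implementation computes the result.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference helper for illustration only (not a oneDNN API).
// dst[i] = src[i] if src[i] > 0, otherwise alpha * src[i].
// alpha = 0 gives the plain ReLU used in the example above.
std::vector<float> relu_reference(
        const std::vector<float> &src, float alpha = 0.f) {
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = src[i] > 0.f ? src[i] : alpha * src[i];
    return dst;
}
```

A helper like this could be used to validate values copied back from `buf_dst` once the returned event has completed, assuming the application copies the device data to host memory first.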

Open Source Implementation

Intel has published an open source implementation under the Apache license.

Implementation Notes

This specification provides high-level descriptions of oneDNN operations and does not cover all the implementation-specific details of the open source implementation. In particular, it does not cover highly optimized memory formats or integration with profiling tools. These omissions are intentional and improve the portability of the specification. Code that uses the API defined in this specification is expected to be portable, to a reasonable extent, across the open source implementation and any other implementations of this specification.

In the future this section will be extended with more details on how different implementations of this specification should cooperate and co-exist.


Intel’s binary distribution of oneDNN contains example code that you can use to test library functionality.

The open source implementation includes a comprehensive test suite. Consult the README for directions.