DPC++

Overview

Data Parallel C++ (DPC++) is an LLVM project that implements SYCL and gives C++ programmers, as well as oneAPI libraries such as oneMKL, direct programming capabilities. It provides the features needed to define data parallel functions and to launch them on devices. DPC++ is made up of the following components:

  • C++. Every DPC++ program is also a C++ program. A compliant DPC++ implementation must support the C++17 Core Language (as specified in Sections 1-19 of ISO/IEC 14882:2017) or newer. See the C++ Standard.

  • SYCL. DPC++ builds on the SYCL specification from The Khronos Group. The SYCL language enables the definition of data parallel functions that can be offloaded to devices and defines runtime APIs and classes that are used to orchestrate the offloaded functions.

  • DPC++ Language extensions. A compliant DPC++ implementation must support the specified language features. Some extensions are required only when the DPC++ implementation supports a specific class of device, as summarized in the Extensions Table. An implementation supports a class of device if it can target hardware that responds “true” to a DPC++ device type query, either through explicit support built into the implementation or through a lower layer that supports that device class, such as oneAPI Level Zero (Level Zero). A DPC++ implementation must pass the conformance tests for every extension that the Extensions Table marks as required for the classes of devices it supports. (See SYCL Extensions.)

This specification requires a minimum of C++17 Core Language support and DPC++ extensions. These version and feature coverage requirements will evolve over time, with specific versions of C++ and SYCL being required, some additional extensions being required, and some DPC++ extensions no longer required if covered by newer C++ or SYCL versions directly.
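As a minimal sketch of what the C++17 baseline means in practice, the following translation unit checks the reported language level and uses two C++17 core-language features. All names here (`has_cpp17_core`, `sum_pair`, `size_bucket`, `check_cpp17_baseline`) are hypothetical and are not part of DPC++ or SYCL. The check relies on the standard `__cplusplus` macro, which some compilers only report accurately when given an extra option.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not part of DPC++ or SYCL): true when the compiler
// reports C++17 or newer through the standard __cplusplus macro.
constexpr bool has_cpp17_core() {
    return __cplusplus >= 201703L;
}

// Structured bindings are a C++17 core-language feature.
int sum_pair(std::pair<int, int> p) {
    auto [first, second] = p;
    return first + second;
}

// `if constexpr` is another C++17 core-language feature: only the selected
// branch is instantiated for a given T.
template <typename T>
constexpr int size_bucket() {
    if constexpr (sizeof(T) >= 8) {
        return 64;
    } else {
        return 32;
    }
}

// Hypothetical self-check a build could run before relying on C++17 features.
void check_cpp17_baseline() {
    assert(has_cpp17_core());
    assert(sum_pair({1, 2}) == 3);
    assert(size_bucket<char>() == 32);
}
```

A project might call `check_cpp17_baseline()` from a smoke test, or replace the runtime check with `static_assert(has_cpp17_core())` to fail the build early.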

DPC++ Extensions Table: Support requirements for DPC++ implementations above SYCL 2020

| Feature                  | CPU          | GPU          | FPGA             | Test [1]   |
|--------------------------|--------------|--------------|------------------|------------|
| Accessor properties      | Required     | Required     | Required         | NA [2]     |
| CXX standard library     | Required     | Required     | Not required [3] | NA [2]     |
| Data flow pipes          | Not required | Not required | Required         | fpga_tests |
| Enqueued barriers        | Required     | Required     | Required         | NA [2]     |
| Extended atomics         | Required     | Required     | Required         | NA [2]     |
| Filter selector          | Required     | Required     | Required         | NA [2]     |
| FPGA LSU controls        | Not required | Not required | Required         | NA [2]     |
| FPGA memory channel      | Not required | Not required | Required         | NA [2]     |
| FPGA register            | Not required | Not required | Required         | NA [2]     |
| FPGA selector            | Required     | Required     | Required         | NA [2]     |
| GPU device info          | Required     | Required     | Required         | NA [2]     |
| Level zero backend       | Required [4] | Required [4] | Required [4]     | NA [2]     |
| Local memory allocations | Required     | Required     | Required         | NA [2]     |
| Pinned memory property   | Required     | Required     | Required         | NA [2]     |
| Platform context         | Required     | Required     | Required         | NA [2]     |
| Restrict all arguments   | Required     | Required     | Required         | NA [2]     |
| Sub-group mask           | Required     | Required     | Required         | NA [2]     |

[1] Test directory within extension tests.

[2] Not yet available.

[3] Likely to be required in the future.

[4] Required if the device backend is Level Zero.
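The requirement rule behind the table can be sketched as a simple lookup: an extension is required when at least one supported device class marks it as required, and the Level Zero backend row applies only when the device backend is Level Zero. The types and functions below (`DeviceClass`, `ExtensionRow`, `extension_table`, `required_extensions`) are hypothetical illustrations, not part of any DPC++ API; the row data is transcribed from the Extensions Table.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Device classes, matching the columns of the Extensions Table.
enum class DeviceClass { CPU, GPU, FPGA };

// One table row. `level_zero_only` models the footnote
// "Required if the device backend is Level Zero".
struct ExtensionRow {
    std::string name;
    bool cpu;
    bool gpu;
    bool fpga;
    bool level_zero_only;
};

// Rows transcribed from the Extensions Table (hypothetical encoding).
const std::vector<ExtensionRow>& extension_table() {
    static const std::vector<ExtensionRow> rows = {
        {"Accessor properties",      true,  true,  true,  false},
        {"CXX standard library",     true,  true,  false, false},
        {"Data flow pipes",          false, false, true,  false},
        {"Enqueued barriers",        true,  true,  true,  false},
        {"Extended atomics",         true,  true,  true,  false},
        {"Filter selector",          true,  true,  true,  false},
        {"FPGA LSU controls",        false, false, true,  false},
        {"FPGA memory channel",      false, false, true,  false},
        {"FPGA register",            false, false, true,  false},
        {"FPGA selector",            true,  true,  true,  false},
        {"GPU device info",          true,  true,  true,  false},
        {"Level zero backend",       true,  true,  true,  true},
        {"Local memory allocations", true,  true,  true,  false},
        {"Pinned memory property",   true,  true,  true,  false},
        {"Platform context",         true,  true,  true,  false},
        {"Restrict all arguments",   true,  true,  true,  false},
        {"Sub-group mask",           true,  true,  true,  false},
    };
    return rows;
}

// Looks up the table cell for one row and one device class.
bool required_for(const ExtensionRow& row, DeviceClass c) {
    switch (c) {
        case DeviceClass::CPU:  return row.cpu;
        case DeviceClass::GPU:  return row.gpu;
        case DeviceClass::FPGA: return row.fpga;
    }
    return false;
}

// Names of the extensions required for the given supported device classes.
std::vector<std::string> required_extensions(
        const std::vector<DeviceClass>& classes, bool uses_level_zero) {
    std::vector<std::string> out;
    for (const ExtensionRow& row : extension_table()) {
        if (row.level_zero_only && !uses_level_zero) continue;
        bool needed = false;
        for (DeviceClass c : classes) {
            needed = needed || required_for(row, c);
        }
        if (needed) out.push_back(row.name);
    }
    return out;
}

// Example: an FPGA-only implementation that does not use Level Zero.
void demo_fpga_only() {
    std::vector<std::string> req =
        required_extensions({DeviceClass::FPGA}, false);
    assert(req.size() == 15);
}
```

Under this encoding, a CPU-only implementation without Level Zero must support 12 of the 17 extensions, while one that targets both CPU and FPGA through Level Zero must support all 17.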

Detailed API and Language Descriptions

The SYCL 2020 Specification describes the SYCL APIs and language. DPC++ extensions on top of SYCL are described in the SYCL Extensions repository.

A brief summary of the extensions is as follows:

  • Accessor properties - compile-time accessor properties that are visible to the compiler.

  • CXX standard library - enables a subset of the C and C++ standard libraries in device code.

  • Data flow pipes - enables efficient First-In, First-Out (FIFO) communication in DPC++, a mechanism commonly used when describing algorithms for spatial architectures such as FPGAs.

  • Enqueued barriers - simplifies dependence creation and tracking for some common programming patterns by allowing coarser-grained synchronization within a queue without manual creation of fine-grained dependencies.

  • Extended atomics - adds atomic_accessor on top of SYCL 2020 atomics.

  • Filter selector - adds a device selector which consumes a string of filter definitions, and that can be used to easily restrict the set of devices which are passed to the usual device selection mechanisms.

  • FPGA LSU controls - tuning controls for FPGA load/store operations.

  • FPGA memory channel - placement controls for data with external memory banks (e.g. DDR channel) for tuning FPGA designs.

  • FPGA register - tuning control for FPGA high performance pipelining.

  • FPGA selector - adds a set of device selectors that make it easy to acquire an FPGA hardware or emulation device.

  • GPU device info - adds GPU-specific queries around SIMD width, memory bandwidth, unique identifiers, and topology of the compute structures.

  • Level zero backend - defines interoperability with Level Zero as a backend to SYCL.

  • Local memory allocations - adds the ability to declare local memory allocations within a kernel, as opposed to through an accessor that is passed to a kernel. This makes kernels more self-contained and easier to read and optimize.

  • Pinned memory property - optimization indicating that a buffer should use a specific memory resource if possible, to accelerate movement of data between host and devices in some implementations.

  • Platform context - adds a default context per SYCL platform, which simplifies and improves performance in common coding patterns.

  • Restrict all arguments - defines an attribute that can be applied to kernels (including lambda definitions of kernels) to signal that there is no memory aliasing between any pointer arguments that are passed to or captured by a kernel. This optimization attribute can have a large impact when the developer knows more about the kernel arguments than a compiler can infer or safely assume.

  • Sub-group mask - adds a new opaque type and operations on it, which can be used to represent and manage sets of work-items within a sub-group.
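To make the FIFO behavior behind data flow pipes concrete, here is a conceptual, host-only model of a bounded first-in, first-out channel in plain C++. It is not the DPC++ pipe API and does not run on a device; `FifoModel`, `try_write`, `try_read`, and `run_pipeline_model` are hypothetical names chosen for illustration, and the non-blocking semantics shown are an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>

// Conceptual host-only model of a bounded FIFO (NOT the DPC++ pipe API).
template <typename T, std::size_t Capacity>
class FifoModel {
public:
    // Non-blocking write: returns false when the FIFO is full.
    bool try_write(const T& value) {
        if (items_.size() >= Capacity) return false;
        items_.push_back(value);
        return true;
    }

    // Non-blocking read: returns an empty optional when the FIFO is empty.
    std::optional<T> try_read() {
        if (items_.empty()) return std::nullopt;
        T front = items_.front();
        items_.pop_front();
        return front;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::deque<T> items_;
};

// A producer stage followed by a consumer stage.
// Returns the sum of the values that the consumer read.
int run_pipeline_model() {
    FifoModel<int, 4> fifo;

    // Producer: tries to write 1..6; only the first four fit.
    int accepted = 0;
    for (int v = 1; v <= 6; ++v) {
        if (fifo.try_write(v)) ++accepted;
    }
    assert(accepted == 4);

    // Consumer: drains the FIFO in arrival order.
    int sum = 0;
    while (std::optional<int> v = fifo.try_read()) {
        sum += *v;
    }
    return sum;
}
```

The model preserves arrival order and applies back-pressure when full, which are the two properties that make FIFOs a natural fit for connecting stages on spatial architectures.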

Open Source Implementation

An open source implementation is available under an LLVM license. Details on incomplete features and known issues are available in the Release Notes (and the Getting Started Guide until the release notes are available).

Testing

A DPC++ implementation must pass the extension tests for every extension from the Extensions Table that it implements. For each extension, the Extensions Table names the directory within the extension tests tree that contains the corresponding tests.

Acknowledgment

We thank the DPC++ and oneDPL Technical Advisory Board for their valuable feedback, and the Khronos SYCL working group for their efforts defining and evolving the SYCL specification.