Execution Policies#
According to the oneAPI specification, oneAPI DPC++ Library (oneDPL) provides execution policies semantically aligned with the C++ standard, referred to as standard-aligned or host execution policies, as well as device execution policies to run data parallel computations on heterogeneous systems.
The execution policies are defined in the oneapi::dpl::execution
namespace and provided
in the <oneapi/dpl/execution>
header. The policies have the following meaning:
Policy Name / Type |
Description |
---|---|
|
The standard-aligned policy for sequential execution. |
|
The standard-aligned policy for possible unsequenced SIMD execution. This policy requires user-provided functions to be SIMD-safe. |
|
The standard-aligned policy for possible parallel execution by multiple threads. |
|
The standard-aligned policy with the combined effect of |
|
The class template to create device policies for data parallel execution. |
|
The device policy for data parallel execution on the default SYCL device. |
|
The class template to create policies for execution on FPGA devices. |
|
The device policy for data parallel execution on a SYCL FPGA device. |
The implementation is based on Parallel STL from the LLVM Project.
oneDPL supports two parallel backends for execution with par
and par_unseq
policies:
The TBB backend (enabled by default) uses oneAPI Threading Building Blocks (oneTBB) or Intel® Threading Building Blocks (Intel® TBB) for parallel execution.
The OpenMP backend uses OpenMP* pragmas for parallel execution. Visit Macros for the information how to enable the OpenMP backend.
OpenMP pragmas are also used for SIMD execution with unseq
and par_unseq
policies.
Follow these steps to add Parallel API to your application:
Add
#include <oneapi/dpl/execution>
to your code. Then include one or more of the following header files, depending on the algorithms you intend to use:#include <oneapi/dpl/algorithm>
#include <oneapi/dpl/numeric>
#include <oneapi/dpl/memory>
Pass a oneDPL execution policy object as the first argument to a parallel algorithm to indicate the desired execution behavior.
If you use the standard-aligned execution policies:
Compile the code with options that enable OpenMP parallelism and/or SIMD vectorization pragmas.
Compile and link with the oneTBB or Intel® TBB library for TBB-based parallelism.
If you use the device execution policies, compile the code with options that enable support for SYCL 2020.
Use the C++ Standard Aligned Execution Policies#
Example:
#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <vector>
int main()
{
std::vector<int> data( 1000 );
std::fill(oneapi::dpl::execution::par_unseq, data.begin(), data.end(), 42);
return 0;
}
Use the Device Execution Policies#
The device execution policy specifies where a oneDPL parallel algorithm runs. It encapsulates a SYCL device or queue and allows you to set an optional kernel name.
To create a policy object, you may use one of the following constructor arguments:
A SYCL queue
A SYCL device
A SYCL device selector
An existing policy object with a different kernel name
A kernel name is set with a policy template argument.
Providing a kernel name for a policy is optional, if your compiler supports implicit
names for SYCL kernel functions. The Intel® oneAPI DPC++/C++ Compiler supports it by default;
for other compilers it may need to be enabled with compilation options such as
-fsycl-unnamed-lambda
. Refer to your compiler documentation for more information.
The oneapi::dpl::execution::dpcpp_default
object is a predefined immutable object of
the device_policy
class. It is created with a default kernel name and uses a default queue.
Use it to construct customized policy objects or pass directly when invoking an algorithm.
If dpcpp_default
is passed directly to more than one algorithm, you must ensure that the
compiler you use supports implicit kernel names (see above) and this option is turned on.
The make_device_policy
function templates simplify device_policy
creation.
Usage Examples#
The code examples below assume you are using namespace oneapi::dpl::execution;
and using namespace sycl;
directives when referring to policy classes and functions:
auto policy_a = device_policy<class PolicyA> {};
std::for_each(policy_a, ...);
auto policy_b = device_policy<class PolicyB> {device{gpu_selector_v}};
std::for_each(policy_b, ...);
auto policy_c = device_policy<class PolicyC> {device{cpu_selector_v}};
std::for_each(policy_c, ...);
auto policy_d = make_device_policy<class PolicyD>(dpcpp_default);
std::for_each(policy_d, ...);
auto policy_e = make_device_policy(queue{property::queue::in_order()});
std::for_each(policy_e, ...);
Use the FPGA Policy#
The fpga_policy
class is a device policy tailored to achieve
better performance of parallel algorithms on FPGA hardware devices.
Use the policy when you run the application on a FPGA hardware device or FPGA emulation device with the following steps:
Define the
ONEDPL_FPGA_DEVICE
macro to run on FPGA devices and theONEDPL_FPGA_EMULATOR
to run on FPGA emulation devices.Add
#include <oneapi/dpl/execution>
to your code.Create a policy object by providing an unroll factor (see the Note below), a class type for a unique kernel name as template arguments (both optional), and one of the following constructor arguments:
A SYCL queue constructed for the FPGA Selector (the behavior is undefined with any other queue).
An existing FPGA policy object with a different kernel name and/or unroll factor.
Pass the created policy object to a parallel algorithm.
The default constructor of fpga_policy
wraps a SYCL queue created
for fpga_selector
, or for fpga_emulator_selector
if the ONEDPL_FPGA_EMULATOR
is defined.
oneapi::dpl::execution::dpcpp_fpga
is a predefined immutable object of
the fpga_policy
class created with a default unroll factor and a default kernel name.
Use it to create customized policy objects or pass directly when invoking an algorithm.
Note
Specifying the unroll factor for a policy enables loop unrolling in the implementation of oneDPL algorithms. The default value is 1. To find out how to choose a more precise value, refer to the unroll Pragma and Loop Analysis content in the Intel® oneAPI FPGA Handbook.
The make_fpga_policy
function templates simplify fpga_policy
creation.
FPGA Policy Usage Examples#
The code below assumes you have added using namespace oneapi::dpl::execution;
for policies and
using namespace sycl;
for queues and device selectors:
constexpr auto unroll_factor = 8;
auto fpga_policy_a = fpga_policy<unroll_factor, class FPGAPolicyA>{};
auto fpga_policy_b = make_fpga_policy(queue{intel::fpga_selector{}});
auto fpga_policy_c = make_fpga_policy<unroll_factor, class FPGAPolicyC>();
Error Handling with Device Execution Policies#
The SYCL error handling model supports two types of errors. Synchronous errors cause the SYCL API functions to throw exceptions. Asynchronous errors may only be processed in a user-supplied error handler associated with a SYCL queue.
For algorithms executed with device policies, handling all errors, synchronous or asynchronous, is a responsibility of the caller. Specifically:
No exceptions are thrown explicitly by algorithms.
Exceptions thrown by runtime libraries at the host CPU, including SYCL synchronous exceptions, are passed through to the caller.
SYCL asynchronous errors are not handled.
To process SYCL asynchronous errors, the queue associated with a device policy must be
created with an error handler object. The predefined policy objects (dpcpp_default
, etc.) have
no error handlers; do not use them if you need to process asynchronous errors.