oneAPI DPC++ Library Introduction#
The oneAPI DPC++ Library (oneDPL) is implemented in accordance with the oneDPL Specification.
To support heterogeneity, oneDPL uses SYCL. More information about SYCL can be found in the SYCL Specification.
Before You Begin#
Visit the oneDPL Release Notes page for:
Where to Find the Release
Overview
New Features
Fixed Issues
Deprecation Notice
Known Issues and Limitations
Previous Release Notes
Install the Intel® oneAPI Base Toolkit (Base Kit) to use oneDPL.
System Requirements#
Prerequisites#
C++17 is the minimal supported version of the C++ standard. That means, any use of oneDPL may require a C++17 compiler. While some APIs of the library may accidentally work with earlier versions of the C++ standard, it is no more guaranteed.
To call Parallel API with the C++ standard aligned policies, you need to install the following software:
A C++ compiler with support for OpenMP* 4.0 (or higher) SIMD constructs
Depending on what parallel backend you want to use, install either:
oneAPI Threading Building Blocks (oneTBB) or Intel® Threading Building Blocks (Intel® TBB) 2019 and later,
A C++ compiler with support for OpenMP 4.5 (or higher).
For more information about parallel backends, see Execution Policies.
To use Parallel API with the device execution policies, you need to install the following software:
A C++ compiler with support for SYCL 2020.
Develop and Build Your Code with oneDPL#
All oneDPL header files are in the oneapi/dpl
directory. To use the oneDPL API,
include the corresponding header in your source code with the #include <oneapi/dpl/…>
directive.
For better coexistence with the C++ standard library, include oneDPL header files before the standard C++ ones.
oneDPL introduces the namespace oneapi::dpl
for its classes and functions. For brevity,
namespace dpl
is defined as an alias to oneapi::dpl
and can be used interchangeably.
To use tested C++ standard APIs in SYCL device code,
include the corresponding C++ standard header files and use the std
namespace.
Follow the steps below to build your code with oneDPL:
To build with the Intel® oneAPI DPC++/C++ Compiler, see the Get Started with the Intel® oneAPI DPC++/C++ Compiler for details.
Set the environment variables for oneDPL and oneTBB.
Here is an example of a command line used to compile code that contains oneDPL parallel algorithms on Linux* (depending on the code, parameters within [] could be unnecessary):
icpx [-fsycl] [-fiopenmp] program.cpp [-ltbb] -o program
You may also use the -fsycl-pstl-offload
option of Intel® oneAPI DPC++/C++ Compiler powered by oneDPL
to build the standard C++ code for execution on a SYCL device:
icpx -fsycl -fsycl-pstl-offload=gpu program.cpp -o program
This option redirects C++ parallel algorithms invoked with the std::execution::par_unseq
policy
to oneDPL algorithms. It does not change the behavior of the oneDPL algorithms and
execution policies that are directly used in the code.
Useful Information#
Difference with Standard C++ Parallel Algorithms#
oneDPL execution policies only result in parallel execution if random access iterators are provided, the execution will remain serial for other iterator types.
Function objects passed in to algorithms executed with device policies must provide
const
-qualifiedoperator()
. The SYCL specification states that writing to such an object during a SYCL kernel is undefined behavior.For the following algorithms,
par_unseq
andunseq
policies do not result in SIMD execution:includes
,inplace_merge
,merge
,set_difference
,set_intersection
,set_symmetric_difference
,set_union
,stable_partition
,unique
.The following algorithms require additional O(n) memory space for parallel execution:
copy_if
,inplace_merge
,partial_sort
,partial_sort_copy
,partition_copy
,remove
,remove_if
,rotate
,sort
,stable_sort
,unique
,unique_copy
.
Restrictions#
When called with device execution policies, oneDPL algorithms apply the same restrictions as DPC++ does (see the Intel® oneAPI DPC++/C++ Compiler documentation and the SYCL specification for details), such as:
Adding buffers to a lambda capture list is not allowed for lambdas passed to an algorithm.
Passing data types, which are not trivially copyable, is only allowed via USM, but not via buffers or host-allocated containers.
Objects of pointer-to-member types cannot be passed to an algorithm.
The definition of lambda functions used with parallel algorithms should not depend on preprocessor macros that makes it different for the host and the device. Otherwise, the behavior is undefined.
When used within SYCL kernels or transferred to/from a device, a container class can only hold objects whose type meets SYCL requirements for use in kernels and for data transfer, respectively.
Calling the API that throws exception is not allowed within callable objects passed to an algorithm.
Known Limitations#
The
oneapi::dpl::execution::par_unseq
policy is affected by-fsycl-pstl-offload
option of Intel® oneAPI DPC++/C++ Compiler when oneDPL substitutes this policy for thestd::execution::par_unseq
policy missing in a standard C++ library, particularly in libstdc++ version 8 and in libc++.For
transform_exclusive_scan
andexclusive_scan
to run in-place (that is, with the same data used for both input and destination) and with an execution policy ofunseq
orpar_unseq
, it is required that the provided input and destination iterators are equality comparable. Furthermore, the equality comparison of the input and destination iterator must evaluate to true. If these conditions are not met, the result of these algorithm calls is undefined.For
transform_exclusive_scan
,transform_inclusive_scan
algorithms the result of the unary operation should be convertible to the type of the initial value if one is provided, otherwise it is convertible to the type of values in the processed data sequence:std::iterator_traits<IteratorType>::value_type
.exclusive_scan
andtransform_exclusive_scan
algorithms may provide wrong results with unsequenced execution policies when building a program with GCC 10 and using-O0
option.Compiling
reduce
andtransform_reduce
algorithms with Intel® oneAPI DPC++/C++ Compiler versions 2021 and older may result in a runtime error. To fix this issue, use Intel® oneAPI DPC++/C++ Compiler version 2022 or newer.When compiling on Windows, add the option
/EHsc
to the compilation command to avoid errors with oneDPL’s experimental ranges API that uses exceptions.The
using namespace oneapi;
directive in a oneDPL program code may result in compilation errors with some compilers including GCC 7 and earlier. Instead of this directive, explicitly use theoneapi::dpl
namespace, the shorterdpl
namespace alias, or create your own alias.std::array::at
member function cannot be used in kernels because it may throw an exception; usestd::array::operator[]
instead.Due to specifics of Microsoft* Visual C++, some standard floating-point math functions (including
std::ldexp
,std::frexp
,std::sqrt(std::complex<float>)
) require device support for double precision.exclusive_scan
,inclusive_scan
,exclusive_scan_by_segment
,inclusive_scan_by_segment
,transform_exclusive_scan
,transform_inclusive_scan
, when used with C++ standard aligned policies, impose limitations on the initial value type if an initial value is provided, and on the value type of the input iterator if an initial value is not provided. Firstly, it must satisfy theDefaultConstructible
requirements. Secondly, a default-constructed instance of that type should act as the identity element for the binary scan function.reduce_by_segment
, when used with C++ standard aligned policies, imposes limitations on the value type. Firstly, it must satisfy theDefaultConstructible
requirements. Secondly, a default-constructed instance of that type should act as the identity element for the binary reduction function.The initial value type for
exclusive_scan
,inclusive_scan
,exclusive_scan_by_segment
,inclusive_scan_by_segment
,reduce
,reduce_by_segment
,transform_reduce
,transform_exclusive_scan
,transform_inclusive_scan
should satisfy theMoveAssignable
and theCopyConstructible
requirements.For
max_element
,min_element
,minmax_element
,partial_sort
,partial_sort_copy
,sort
,stable_sort
the dereferenced value type of the provided iterators should satisfy theDefaultConstructible
requirements.For
remove
,remove_if
,unique
the dereferenced value type of the provided iterators should beMoveConstructible
.