Asynchronous Algorithms
The functions defined in the STL <algorithm> or <numeric> headers are traditionally blocking. oneAPI DPC++ Library (oneDPL)
extends the functionality of the C++17 parallel algorithms by providing asynchronous algorithms with non-blocking behavior.
This experimental feature enables you to express a concurrent control flow by building dependency chains, interleaving algorithm calls,
and interoperating with SYCL* kernels.
The current implementation for async algorithms is limited to device execution policies.
All the functionality described below is available in the oneapi::dpl::experimental namespace.
The following async algorithms are currently supported:
copy_async
fill_async
for_each_async
reduce_async
sort_async
inclusive_scan_async
exclusive_scan_async
transform_async
transform_reduce_async
transform_inclusive_scan_async
transform_exclusive_scan_async
Each interface listed above corresponds to a C++17 STL algorithm, with the suffix _async added to the
algorithm's name (for example, reduce becomes reduce_async and sort becomes sort_async).
The behavior and signatures match those of the corresponding C++17 STL algorithms, with the following changes:
They do not block the execution.
They take an arbitrary number of events (including zero) as the last arguments, which lets you express input dependencies.
They return a future-like object that lets you call wait to wait for completion and get to retrieve the result.
The type of the future-like object returned from an asynchronous algorithm is unspecified. The following member functions are present:
get() returns the result.
wait() waits for the result to become available.
If the returned object is the result of an algorithm with a device policy, it can be converted into a sycl::event.
The lifetime of any resources the algorithm allocates (for example: temporary storage) is bound to the lifetime of
the returned object.
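The calling pattern described above (a call that returns immediately, a dependency passed as a trailing argument, and a future-like result with wait() and get()) can be illustrated by analogy in portable standard C++. The sketch below does not use oneDPL or SYCL: it assumes std::async and std::shared_future as stand-ins for the unspecified future-like type, and fill_async_sketch, reduce_async_sketch, and chain_sketch are hypothetical names invented for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (not a oneDPL function): starts filling `v` on another
// thread and returns at once with a future the caller can wait on.
std::shared_future<void> fill_async_sketch(std::vector<int>& v, int value) {
    return std::async(std::launch::async, [&v, value] {
        std::fill(v.begin(), v.end(), value);
    }).share();
}

// Hypothetical helper (not a oneDPL function): takes its input dependency as
// the trailing argument and waits for it before reading `v`.
std::future<int> reduce_async_sketch(const std::vector<int>& v,
                                     std::shared_future<void> dep) {
    assert(dep.valid());  // the dependency must refer to a real task
    return std::async(std::launch::async, [&v, dep] {
        dep.wait();  // honor the input dependency
        return std::accumulate(v.begin(), v.end(), 0);
    });
}

// Builds the chain Fill -> Reduce and blocks only at the final get().
int chain_sketch() {
    std::vector<int> data(10);
    auto filled = fill_async_sketch(data, 7);      // returns immediately
    auto sum = reduce_async_sketch(data, filled);  // depends on `filled`
    return sum.get();                              // waits for the whole chain
}
```

As in the description above, the objects returned by the two helpers own the work they started, so they must stay alive until the chain has finished; here that is guaranteed by calling get() before data goes out of scope.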
The following utility functions are available:
wait_for_all(…) waits for an arbitrary number of objects that are convertible into sycl::event to become ready.
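The idea behind a variadic wait utility can be sketched in standard C++ as an analogy. This is not the oneDPL implementation and it operates on std::future rather than on objects convertible to sycl::event; wait_for_all_sketch and independent_tasks_sketch are hypothetical names, and the C++17 fold expression is an assumption about one reasonable way to write such a helper.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical analogue of a variadic wait utility: blocks until every
// future passed in is ready. Accepts any number of arguments, including none.
template <typename... Futures>
void wait_for_all_sketch(const Futures&... futures) {
    (futures.wait(), ...);  // C++17 fold expression over the comma operator
}

// Launches two independent tasks, waits for both, then combines the results.
int independent_tasks_sketch() {
    std::future<int> a = std::async(std::launch::async, [] { return 40; });
    std::future<int> b = std::async(std::launch::async, [] { return 2; });

    wait_for_all_sketch(a, b);

    // After the wait, both results are available without further blocking.
    assert(a.wait_for(std::chrono::seconds(0)) == std::future_status::ready);
    assert(b.wait_for(std::chrono::seconds(0)) == std::future_status::ready);
    return a.get() + b.get();
}
```

Waiting on a group of independent results in one call keeps the launch sites free of blocking calls, which is what allows the tasks to overlap.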
Example of Async API Usage
#include <oneapi/dpl/execution>
#include <oneapi/dpl/async>
#include <sycl/sycl.hpp>

int main() {
    /* Build and compute a simple dependency chain: Fill buffer -> Transform -> Reduce */
    sycl::buffer<int> a{10};

    // Fill every element with 7; the call returns without blocking.
    auto fut1 = dpl::experimental::fill_async(dpl::execution::dpcpp_default,
                                              dpl::begin(a), dpl::end(a), 7);

    // Add 1 to each element in place; fut1 is passed as an input dependency.
    auto fut2 = dpl::experimental::transform_async(dpl::execution::dpcpp_default,
                                                   dpl::begin(a), dpl::end(a), dpl::begin(a),
                                                   [&](const int& x) { return x + 1; }, fut1);

    // Sum the elements, depending on both earlier steps; get() blocks until the result is available.
    auto ret_val = dpl::experimental::reduce_async(dpl::execution::dpcpp_default,
                                                   dpl::begin(a), dpl::end(a), fut1, fut2).get();

    return 0;
}