Pass Data to Algorithms

For an algorithm to access data, the execution policy you use must match the data storage type. The following table shows which execution policies can be used with each type of data storage.

Data Storage                                  Device Policies      Host Policies
--------------------------------------------  -------------------  -------------
SYCL buffer                                   Yes                  No
Device-allocated unified shared memory (USM)  Yes                  No
Shared and host-allocated USM                 Yes                  Yes
std::vector with sycl::usm_allocator          Yes                  Yes
std::vector with an ordinary allocator        See Use std::vector  Yes
Other data in host memory                     No                   Yes

When using the standard-aligned (or host) execution policies, oneDPL accepts data passed to its algorithms as specified in the C++ standard (C++17 for algorithms working with iterators, C++20 for parallel range algorithms), with known restrictions and limitations.

According to the standard, the calling code must prevent data races when using algorithms with parallel execution policies.
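
For example, a shared accumulator modified from multiple iterations is a data race, while algorithms that write each element independently, or that perform a reduction internally, are safe. The following is a minimal sketch using oneDPL's parallel host policy; the data and operations are illustrative:

#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <oneapi/dpl/numeric>
#include <vector>

int main(){
  std::vector<int> v(1000, 1);

  // Race-free: each iteration writes only its own element
  oneapi::dpl::transform(oneapi::dpl::execution::par, v.begin(), v.end(), v.begin(),
                         [](int x){ return x * 2; });

  // A data race: all iterations would modify the same variable
  //   int sum = 0;
  //   oneapi::dpl::for_each(oneapi::dpl::execution::par, v.begin(), v.end(),
  //                         [&](int x){ sum += x; });
  // Use a reduction algorithm instead:
  int sum = oneapi::dpl::reduce(oneapi::dpl::execution::par, v.begin(), v.end());
  return sum == 2000 ? 0 : 1;
}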

Note

Implementations of std::vector<bool> are not required to avoid data races for concurrent modifications of vector elements. Some implementations may optimize multiple bool elements into a bitfield, making it unsafe for multithreading. For this reason, it is recommended to avoid std::vector<bool> for anything but a read-only input with the standard-aligned execution policies.
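
To illustrate the distinction, here is a hedged sketch; the names and sizes are arbitrary:

#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <vector>

int main(){
  std::vector<bool> flags(1000, true);

  // OK: the elements are only read
  auto n_set = oneapi::dpl::count(oneapi::dpl::execution::par,
                                  flags.begin(), flags.end(), true);

  // Risky: neighboring elements may share a byte, so concurrent
  // writes from different threads can conflict:
  // oneapi::dpl::fill(oneapi::dpl::execution::par, flags.begin(), flags.end(), false);
  return n_set == 1000 ? 0 : 1;
}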

The following subsections describe proper ways to pass data to an algorithm invoked with a device execution policy.

Use oneapi::dpl::begin and oneapi::dpl::end Functions

oneapi::dpl::begin and oneapi::dpl::end are special helper functions that allow you to pass SYCL buffers to parallel algorithms. These functions accept a SYCL buffer and return an object of an unspecified type that provides the following API:

  • It satisfies the CopyConstructible and CopyAssignable C++ named requirements and is comparable with operator== and operator!=.

  • It supports the following valid expressions: a + n, a - n, and a - b, where a and b are objects of the type and n is an integer value. The effect of these operations is the same as for a type that satisfies the LegacyRandomAccessIterator C++ named requirement, as illustrated in the sketch after this list.

  • It provides the get_buffer method, which returns the buffer passed to the begin and end functions.
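
The following is a minimal sketch of these properties; the buffer size and variable names are illustrative:

sycl::buffer<int> buf{sycl::range<1>(1000)};
auto first = oneapi::dpl::begin(buf);
auto last  = oneapi::dpl::end(buf);

auto n    = last - first;    // a - b: the distance between two objects
auto mid  = first + n / 2;   // a + n: advance by an integer value
bool done = (mid == last);   // comparable with operator== and operator!=
sycl::buffer<int> same = first.get_buffer();  // the buffer passed to begin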

The begin and end functions can take SYCL 2020 deduction tags and sycl::no_init as arguments to explicitly control which access mode should be applied to a particular buffer when submitting a SYCL kernel to a device:

sycl::buffer<int> buf{/*...*/};
auto first_ro = oneapi::dpl::begin(buf, sycl::read_only);
auto first_wo = oneapi::dpl::begin(buf, sycl::write_only, sycl::no_init);
auto first_ni = oneapi::dpl::begin(buf, sycl::no_init);
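
These tags can also be combined with an algorithm call. The following is a hedged sketch; the buffers, sizes, and initial values are illustrative:

std::vector<int> in(1000, 7);
sycl::buffer<int> src{ in.data(), in.size() };
sycl::buffer<int> dst{sycl::range<1>(in.size())};

// src is only read; dst is entirely overwritten, so its prior contents are not needed
oneapi::dpl::copy(oneapi::dpl::execution::dpcpp_default,
                  oneapi::dpl::begin(src, sycl::read_only),
                  oneapi::dpl::end(src, sycl::read_only),
                  oneapi::dpl::begin(dst, sycl::write_only, sycl::no_init));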

To use the functions, add #include <oneapi/dpl/iterator> to your code. For example:

#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <oneapi/dpl/iterator>
#include <random>
#include <sycl/sycl.hpp>

int main(){
  std::vector<int> vec(1000);
  std::generate(vec.begin(), vec.end(), std::minstd_rand{});

  sycl::buffer<int> buf{ vec.data(), vec.size() };
  auto buf_begin = oneapi::dpl::begin(buf);
  auto buf_end   = oneapi::dpl::end(buf);

  oneapi::dpl::sort(oneapi::dpl::execution::dpcpp_default, buf_begin, buf_end);
  return 0;
}
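
Note that because the buffer in this example is constructed over the vector's data, the sorted result is guaranteed to be visible in vec only after the buffer is destroyed (or a sycl::host_accessor is created). A minimal sketch of scoping the buffer for that purpose, reusing vec from above:

{
  sycl::buffer<int> buf{ vec.data(), vec.size() };
  oneapi::dpl::sort(oneapi::dpl::execution::dpcpp_default,
                    oneapi::dpl::begin(buf), oneapi::dpl::end(buf));
} // the buffer destructor waits for completion and writes the data back to vec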

Use Unified Shared Memory

If you have USM-allocated data, pass the pointers to the start and past the end of the data sequence to a parallel algorithm. Make sure that the execution policy and the USM allocation use the same SYCL queue. For example:

#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <random>
#include <sycl/sycl.hpp>

int main(){
  sycl::queue q;
  const int n = 1000;
  int* d_head = sycl::malloc_shared<int>(n, q);
  std::generate(d_head, d_head + n, std::minstd_rand{});

  oneapi::dpl::sort(oneapi::dpl::execution::make_device_policy(q), d_head, d_head + n);

  sycl::free(d_head, q);
  return 0;
}

Note

Use of non-USM pointers is not supported for algorithms with device execution policies.

When using device USM, such as memory allocated by sycl::malloc_device, you are responsible for data transfers to and from the device: ensure that the input data is device accessible during oneDPL algorithm execution and that the result is available to subsequent operations.
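
The following is a hedged sketch with explicit transfers; the queue, size, and variable names are illustrative:

#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <random>
#include <vector>
#include <sycl/sycl.hpp>

int main(){
  sycl::queue q;
  const int n = 1000;
  std::vector<int> h_data(n);
  std::generate(h_data.begin(), h_data.end(), std::minstd_rand{});

  int* d_data = sycl::malloc_device<int>(n, q);
  q.memcpy(d_data, h_data.data(), n * sizeof(int)).wait();  // host -> device

  oneapi::dpl::sort(oneapi::dpl::execution::make_device_policy(q), d_data, d_data + n);

  q.memcpy(h_data.data(), d_data, n * sizeof(int)).wait();  // device -> host
  sycl::free(d_data, q);
  return 0;
}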

Use std::vector

You can use iterators to an ordinary std::vector with data in host memory, as shown in the following example:

#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <random>
#include <vector>

int main(){
  std::vector<int> vec(1000);
  std::generate(vec.begin(), vec.end(), std::minstd_rand{});

  oneapi::dpl::sort(oneapi::dpl::execution::dpcpp_default, vec.begin(), vec.end());
  return 0;
}

In this case, a temporary SYCL buffer is created, the data is copied into it, and the buffer is processed according to the algorithm semantics. After processing on the device is complete, the modified data is copied from the temporary buffer back to the host container.

Note

For parallel range algorithms with device execution policies, the use of ordinary std::vector is not supported.

While convenient, direct use of an ordinary std::vector can lead to unintended copying between the host and the device. We recommend working with SYCL buffers or with USM to reduce data copying.

Note

For specialized memory algorithms that begin or end the lifetime of data objects, that is, the uninitialized_* and destroy* families of functions, the data to initialize or destroy must be accessible on the device without extra copying. Therefore, these algorithms cannot use data storage on the host with device execution policies; see the sketch below.
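
A hedged sketch using shared USM, which is device accessible; the policy, size, and fill value are illustrative:

#include <oneapi/dpl/execution>
#include <oneapi/dpl/memory>
#include <sycl/sycl.hpp>

int main(){
  sycl::queue q;
  const int n = 1000;
  auto policy = oneapi::dpl::execution::make_device_policy(q);

  // Shared USM is accessible on the device, so lifetime-management
  // algorithms can operate on it directly
  int* p = sycl::malloc_shared<int>(n, q);
  oneapi::dpl::uninitialized_fill(policy, p, p + n, 42);  // begin object lifetimes
  oneapi::dpl::destroy(policy, p, p + n);                 // end object lifetimes
  sycl::free(p, q);
  return 0;
}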

You can also use std::vector with a sycl::usm_allocator, as shown in the following example. Make sure that the allocator and the execution policy use the same SYCL queue:

#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <random>
#include <vector>
#include <sycl/sycl.hpp>

int main(){
  const int n = 1000;
  auto policy = oneapi::dpl::execution::dpcpp_default;
  sycl::usm_allocator<int, sycl::usm::alloc::shared> alloc(policy.queue());
  std::vector<int, decltype(alloc)> vec(n, alloc);
  std::generate(vec.begin(), vec.end(), std::minstd_rand{});

  // Recommended to use USM pointers:
  oneapi::dpl::sort(policy, vec.data(), vec.data() + vec.size());
/*
  // Iterators for USM allocators might require extra copying - not a recommended method
  oneapi::dpl::sort(policy, vec.begin(), vec.end());
*/
  return 0;
}

For std::vector with a USM allocator, we recommend using std::vector::data() in combination with std::vector::size(), as shown in the example above, rather than iterators to std::vector. For some implementations of the C++ Standard Library, oneDPL may be unable to detect that the iterators point to USM-allocated data; in that case, the data is treated as if it were in host memory, and an extra copy to a SYCL buffer is made. Retrieving USM pointers from std::vector as shown guarantees no unintended copying.

Use Range Views

For parallel range algorithms with device execution policies, place the data in USM or a USM-allocated std::vector, and pass it to an algorithm via a device-copyable range or view object, such as std::ranges::subrange or std::span.

Note

Use of std::ranges::views::all is not supported for algorithms with device execution policies.

These data ranges, as well as the supported range adaptors and factories, may be combined into data transformation pipelines that can also be used with parallel range algorithms. For example:

#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <random>
#include <vector>
#include <span>
#include <ranges>
#include <functional>
#include <sycl/sycl.hpp>

int main(){
  const int n = 1000;
  auto policy = oneapi::dpl::execution::dpcpp_default;
  sycl::queue q = policy.queue();

  int* h_head = sycl::malloc_host<int>(n, q);  // host USM is device accessible
  std::generate(h_head, h_head + n, std::minstd_rand{});

  sycl::usm_allocator<int, sycl::usm::alloc::shared> alloc(q);
  std::vector<int, decltype(alloc)> vec(n, alloc);

  oneapi::dpl::ranges::copy(policy,
      std::ranges::subrange(h_head, h_head + n) | std::views::transform(std::negate{}),
      std::span(vec));

  oneapi::dpl::ranges::sort(policy, std::span(vec));

  sycl::free(h_head, q);
  return 0;
}