DPCT1096
Message
The right-most dimension of the work-group used in the SYCL kernel that calls this function may be less than <value of kernel sub-group size attribute>. The function <help function name> may return an unexpected result on the CPU device. Modify the size of the work-group to ensure that the value of the right-most dimension is a multiple of <value of kernel sub-group size attribute>.
Detailed Help
The dpct::select_from_sub_group, dpct::shift_sub_group_left, dpct::shift_sub_group_right, and dpct::permute_sub_group_by_xor functions may return unexpected results when run on a CPU device with an OpenCL™ backend if the right-most dimension of the work-group used in the SYCL* kernel that calls these functions is less than the value of the kernel sub-group size attribute.
On a CPU device, the actual sub-group size may differ from the value specified by the kernel sub-group size attribute, which can cause the helper functions to return unexpected results.
Adjust the code by modifying the size of the work-group to ensure that the value of the right-most dimension is a multiple of the kernel sub-group size attribute.
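The condition described above can be checked on the host before a kernel is launched. The following is a minimal sketch in plain C++, not part of the dpct helper library: `Range3` and `rightmost_is_multiple` are hypothetical names, and the extents are assumed to be stored in SYCL order, so index 2 is the right-most dimension.

```cpp
#include <array>
#include <cstddef>

// Work-group extents in SYCL order: index 2 is the right-most dimension.
// (Hypothetical alias used only for this sketch.)
using Range3 = std::array<std::size_t, 3>;

// Hypothetical helper: returns true when the right-most extent is a
// non-zero multiple of the requested sub-group size.
constexpr bool rightmost_is_multiple(const Range3 &wg, std::size_t sg_size) {
  return sg_size != 0 && wg[2] != 0 && wg[2] % sg_size == 0;
}
```

With a sub-group size of 32, a work-group of (1, 8, 8) fails this check and a work-group of (1, 2, 32) passes it.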
For example, this original CUDA* code:
__global__ void kernel() {
  ...
  value = __shfl_down(x, delta);
  ...
}

void foo() {
  ...
  auto GridSize = dim3(2);
  auto BlockSize = dim3(8, 8, 1);
  kernel<<<GridSize, BlockSize>>>();
  ...
}
results in the following migrated SYCL code:
void kernel(const sycl::nd_item<3> &item_ct1) {
  ...
  /*
  DPCT1096:0: The right-most dimension of the work-group used in the SYCL kernel
  that calls this function may be less than "32". The function
  "dpct::shift_sub_group_left" may return an unexpected result on the CPU
  device. Modify the size of the work-group to ensure that the value of the
  right-most dimension is a multiple of "32".
  */
  value = dpct::shift_sub_group_left(item_ct1.get_sub_group(), x, delta); // May return an unexpected result on a CPU device
  ...
}

void foo() {
  ...
  auto GridSize = sycl::range<3>(1, 1, 2);
  auto BlockSize = sycl::range<3>(1, 8, 8); // Problem: the right-most dimension (8) is less than the kernel sub-group size attribute (32).
  dpct::get_in_order_queue().parallel_for(
      sycl::nd_range<3>(GridSize * BlockSize, BlockSize),
      [=](sycl::nd_item<3> item_ct1) [[intel::reqd_sub_group_size(32)]] {
        kernel(item_ct1);
      });
  ...
}
which is rewritten to:
void kernel(const sycl::nd_item<3> &item_ct1) {
  ...
  value = dpct::shift_sub_group_left(item_ct1.get_sub_group(), x, delta);
  ...
}

void foo() {
  ...
  auto GridSize = sycl::range<3>(1, 1, 2);
  auto BlockSize = sycl::range<3>(1, 2, 32); // Fix: the work-group size is changed so that the right-most dimension is a multiple of the kernel sub-group size attribute (32).
  dpct::get_in_order_queue().parallel_for(
      sycl::nd_range<3>(GridSize * BlockSize, BlockSize),
      [=](sycl::nd_item<3> item_ct1) [[intel::reqd_sub_group_size(32)]] {
        kernel(item_ct1);
      });
  ...
}
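The rewrite above changes the work-group from (1, 8, 8) to (1, 2, 32), which keeps the total of 64 work-items. One possible way to derive such a shape is sketched below in plain C++. `Range3` and `reshape_for_sub_group` are hypothetical names, not dpct or SYCL APIs, and the sketch assumes that folding the two right-most dimensions together is acceptable for the kernel. It does not check device limits such as the maximum work-group size.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Work-group extents in SYCL order: index 2 is the right-most dimension.
// (Hypothetical alias used only for this sketch.)
using Range3 = std::array<std::size_t, 3>;

// Hypothetical helper: keeps the left-most extent and redistributes the
// other two so that the right-most extent equals sg_size, preserving the
// total number of work-items. Returns std::nullopt when no such shape
// exists (the product of the two right-most extents is not a multiple of
// sg_size, or sg_size is zero).
std::optional<Range3> reshape_for_sub_group(const Range3 &wg,
                                            std::size_t sg_size) {
  if (sg_size == 0) {
    return std::nullopt;
  }
  // Already valid: leave the shape untouched.
  if (wg[2] != 0 && wg[2] % sg_size == 0) {
    return wg;
  }
  const std::size_t folded = wg[1] * wg[2];
  if (folded == 0 || folded % sg_size != 0) {
    return std::nullopt;
  }
  return Range3{wg[0], folded / sg_size, sg_size};
}
```

For the work-group in this example, `reshape_for_sub_group({1, 8, 8}, 32)` yields (1, 2, 32). Reshaping the work-group changes the per-dimension local IDs that `item_ct1` reports, so a kernel that indexes data by local ID may need a matching adjustment.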
Suggestions to Fix
If the program needs to run on a CPU device, adjust the work-group size so that the value of the right-most dimension is a multiple of the kernel sub-group size attribute.