DPCT1118#
Message#
SYCL group functions and algorithms must be encountered in converged control flow.
Detailed Help#
SYCL* group functions and algorithms must be encountered in converged control
flow for all work-items in a work-group. If a work-item completes the kernel and
exits early without reaching a collective work-group operation, such as a barrier,
all the other work-items in the work-group that reach the collective operation will
wait indefinitely for the exited work-item.
Refer to "How do I fix the issue of SYCL* code hanging due to work group level synchronization, such as a group barrier used in a conditional statement?" for additional information.
Suggestions to Fix#
For example, this original CUDA* code:
__global__ void kernel(float *data) {
  int tid = threadIdx.x;
  if (tid < 32) {
    if (data[tid] < data[tid + 32]) {
      data[tid] = data[tid + 32];
    }
    __syncthreads();
    ...
  }
}
results in the following migrated SYCL code:
void kernel(float *data, const sycl::nd_item<3> &item_ct1) {
  int tid = item_ct1.get_local_id(2);
  if (tid < 32) {
    if (data[tid] < data[tid + 32]) {
      data[tid] = data[tid + 32];
    }
    /*
    DPCT1118:0: SYCL group functions and algorithms must be encountered in converged control flow. You should check this condition holds.
    */
    /*
    DPCT1065:1: Consider replacing sycl::nd_item::barrier() with sycl::nd_item::barrier(sycl::access::fence_space::local_space) for better performance if there is no access to global memory.
    */
    item_ct1.barrier();
  }
}
which is rewritten to:
void kernel(float *data, const sycl::nd_item<3> &item_ct1) {
  int tid = item_ct1.get_local_id(2);

  if (tid < 32) {
    if (data[tid] < data[tid + 32]) {
      data[tid] = data[tid + 32];
    }
  }
  item_ct1.barrier();
  ...
}