DPCT1118#

Message#

SYCL group functions and algorithms must be encountered in converged control flow.

Detailed Help#

SYCL* group functions and algorithms must be encountered in converged control flow by all work-items in a work-group. If a work-item completes the kernel and exits early without reaching a collective work-group operation, such as a barrier, all the other work-items in the work-group that do reach the collective operation will wait indefinitely for the exited work-item, and the kernel hangs.

Refer to "How do I fix the issue of SYCL* code hanging due to work-group-level synchronization, such as a group barrier used in a conditional statement?" for additional information.

Suggestions to Fix#

For example, this original CUDA* code:

__global__ void kernel(float *data) {
  int tid = threadIdx.x;
  if (tid < 32) {
    if (data[tid] < data[tid + 32]) {
      data[tid] = data[tid + 32];
    }
    __syncthreads();
    ...
  }
}

results in the following migrated SYCL code:

void kernel(float *data, const sycl::nd_item<3> &item_ct1) {
  int tid = item_ct1.get_local_id(2);
  if (tid < 32) {
    if (data[tid] < data[tid + 32]) {
      data[tid] = data[tid + 32];
    }
    /*
    DPCT1118:0: SYCL group functions and algorithms must be encountered in converged control flow. You should check this condition holds.
    */
    /*
    DPCT1065:1: Consider replacing sycl::nd_item::barrier() with sycl::nd_item::barrier(sycl::access::fence_space::local_space) for better performance if there is no access to global memory.
    */
    item_ct1.barrier();
  }
}

which is rewritten to:

void kernel(float *data, const sycl::nd_item<3> &item_ct1) {
  int tid = item_ct1.get_local_id(2);

  if (tid < 32) {
    if (data[tid] < data[tid + 32]) {
      data[tid] = data[tid + 32];
    }
  }
  item_ct1.barrier();
  ...
}