DPCT1019

Message

local_mem_size in SYCL is not a complete equivalent of <variable name> in CUDA. You may need to adjust the code.

Detailed Help

In CUDA*, the sharedMemPerBlock field of cudaDeviceProp reports the size, in bytes, of the shared memory available per block. The SYCL* equivalent of a CUDA block is a work-group, and the SYCL equivalent of shared memory is local memory. SYCL does not define a limit on the size of local memory per work-group. It does define a limit on the size of local memory, in bytes, available per compute unit, which is exposed by the info::device::local_mem_size device descriptor. Because the two values describe different limits, a check written against sharedMemPerBlock may not carry the same meaning after migration.

Suggestions to Fix

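Verify that the migrated code is correct. In particular, check that any comparison against the migrated value still makes sense for a per-compute-unit limit.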

For example, this original CUDA code:

void foo() {
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, 0);
  if (prop.sharedMemPerBlock >= threshold) {
    // submit the task
    Code piece A
  } else {
    // change the block size or block number
    Code piece B
  }
}

results in the following migrated SYCL code:

void foo() {
  dpct::device_info prop;
  dpct::dev_mgr::instance().get_device(0).get_device_info(prop);
  /*
  DPCT1019:0: local_mem_size in SYCL is not a complete equivalent of
  sharedMemPerBlock in CUDA. You may need to adjust the code.
  */
  if (prop.get_local_mem_size() >= threshold) {
    // submit the task
    Code piece A
  } else {
    // change the block size or block number
    Code piece B
  }
}

which, after you verify that the threshold check is still valid, is rewritten to:

void foo() {
  dpct::device_info prop;
  dpct::dev_mgr::instance().get_device(0).get_device_info(prop);
  if (prop.get_local_mem_size() >= threshold) {
    // submit the task
    Code piece A
  } else {
    // change the block size or block number
    Code piece B
  }
}
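
The "change the block size or block number" branch is left as a placeholder in the listings here. As one possible illustration, the sketch below shows how a fallback might shrink the work-group size until its local memory footprint fits the reported limit. The helper pick_work_group_size and its parameters are hypothetical; they are not part of SYCL or the dpct helper headers. The sketch is plain C++ so that it does not depend on a SYCL toolchain.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (an assumption for illustration, not a SYCL or dpct API).
// Starting from `preferred_size`, halves the work-group size until the local
// memory it needs (size * bytes_per_work_item) fits into `local_mem_size`.
// Returns 0 if not even a single work-item fits.
std::size_t pick_work_group_size(std::size_t preferred_size,
                                 std::size_t bytes_per_work_item,
                                 std::size_t local_mem_size) {
  assert(bytes_per_work_item > 0 && "each work-item must use some local memory");

  // Largest number of work-items whose combined footprint fits the limit.
  // Dividing instead of multiplying avoids overflow for large inputs.
  const std::size_t max_items = local_mem_size / bytes_per_work_item;

  std::size_t size = preferred_size;
  while (size > 0 && size > max_items) {
    size /= 2;  // try the next smaller work-group size
  }
  return size;
}
```

In migrated code, the third argument would typically be the value returned by prop.get_local_mem_size(). Because that value is a per-compute-unit limit, you may want to pass a smaller budget if several work-groups are expected to share a compute unit.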