DPCT1049

Message

The work-group size passed to the SYCL kernel may exceed the limit. To get the device limit, query info::device::max_work_group_size. Adjust the work-group size if needed.

Detailed Help

Every SYCL* device limits the work-group size that can be passed to a kernel (see SYCL 2020 standard, 4.6.4.2 Device information descriptors).

This warning appears if the dimensions of the local range cannot all be evaluated, or if the product of the dimensions of the local range is greater than 256.
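The second condition can be illustrated with a minimal sketch. This is not the tool's implementation; the function name `exceeds_warning_threshold` and the constant `kWarningThreshold` are hypothetical, and only the value 256 comes from the description above.

```cpp
#include <cstddef>

// Hypothetical constant: the product threshold stated in this page.
constexpr std::size_t kWarningThreshold = 256;

// Returns true when the product of the three local-range dimensions
// is greater than the threshold, i.e. when the warning would be expected.
inline bool exceeds_warning_threshold(std::size_t x, std::size_t y,
                                      std::size_t z) {
  return x * y * z > kWarningThreshold;
}
```

Under this sketch, a local range of (1, 16, 16) has a product of exactly 256 and does not trigger the condition, while (1, 1, 2048) does.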

Suggestions to Fix

Query info::device::max_work_group_size to determine the work-group size limit for the device you use. If the work-group size used in the code does not exceed the limit, you can ignore this warning. Otherwise, decrease the work-group size.

For example, this original CUDA* code:

__global__ void k() {}

void foo() {
  k<<<1, 2048>>>();
}

results in the following migrated SYCL code:

void k() {}

void foo() {
  /*
  DPCT1049:0: The work-group size passed to the SYCL kernel may exceed the
  limit. To get the device limit, query info::device::max_work_group_size.
  Adjust the work-group size if needed.
  */
  dpct::get_default_queue().parallel_for(
      sycl::nd_range<3>(sycl::range<3>(1, 1, 2048), sycl::range<3>(1, 1, 2048)),
      [=](sycl::nd_item<3> item_ct1) {
        k();
      });
}

which can be rewritten manually to clamp the work-group size to the device limit:

void k() {}

void foo() {
  // Largest work-group size supported by the device behind the default queue.
  size_t max_work_group_size =
      dpct::get_default_queue()
          .get_device()
          .get_info<sycl::info::device::max_work_group_size>();
  // Clamp the requested work-group size to the device limit.
  size_t work_group_size = 2048;
  if (work_group_size > max_work_group_size) {
    work_group_size = max_work_group_size;
  }
  // Integer ceiling division: enough work-groups to cover all 2048 work-items.
  size_t work_group_num = (2048 + work_group_size - 1) / work_group_size;
  dpct::get_default_queue().parallel_for(
      sycl::nd_range<3>(sycl::range<3>(1, 1, work_group_num * work_group_size),
                        sycl::range<3>(1, 1, work_group_size)),
      [=](sycl::nd_item<3> item_ct1) { k(); });
}
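
The clamping arithmetic can also be checked separately from any SYCL device. The following is a minimal sketch in plain C++, assuming the device limit has already been obtained from info::device::max_work_group_size and is passed in as a number. The `LaunchConfig` type and the `make_launch_config` function are hypothetical names used only for illustration; they are not part of SYCL or the dpct helper headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper type (not part of SYCL or the dpct headers).
struct LaunchConfig {
  std::size_t work_group_size;  // local range, clamped to the device limit
  std::size_t work_group_num;   // number of work-groups to launch
  std::size_t global_size;      // work_group_num * work_group_size
};

// total_items:  number of work-items the kernel needs to cover
// requested:    work-group size used in the migrated code
// device_limit: value reported by info::device::max_work_group_size
inline LaunchConfig make_launch_config(std::size_t total_items,
                                       std::size_t requested,
                                       std::size_t device_limit) {
  assert(requested > 0 && device_limit > 0);
  // Never exceed what the device supports.
  const std::size_t wg = std::min(requested, device_limit);
  // Integer ceiling division so every work-item is covered.
  const std::size_t num = (total_items + wg - 1) / wg;
  return {wg, num, num * wg};
}
```

For 2048 work-items, a requested size of 2048, and an assumed device limit of 1024, this sketch yields two work-groups of 1024 work-items each. When the total is not a multiple of the clamped work-group size, global_size is larger than total_items, so the kernel body would need a bounds check on the global ID before doing any work.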