DPCT1130

Message

SYCL 2020 standard does not support dynamic parallelism (launching kernel in device code). Please rewrite the code.

Detailed Help

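SYCL* does not support launching a kernel from device code (dynamic parallelism). Merge the parent and child kernels manually: call the child kernel as an ordinary device function from the parent kernel, then adjust the launch range and the work done by each work-item so that the merged kernel still covers the work of the child launch.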

Suggestions to Fix

For example, this original CUDA* code:

__global__ void childKernel() {
  ...
}
__global__ void parentKernel() {
  ...
  childKernel<<<4, 4>>>();
  ...
}
void foo() {
  ...
  parentKernel<<<8, 8>>>();
  ...
}

results in the following migrated SYCL code:

void childKernel() {
  ...
}
void parentKernel() {
  ...
  /*
  DPCT1130:0: SYCL 2020 standard does not support dynamic parallelism (launching
  kernel in device code). Please rewrite the code.
  */
  childKernel<<<4, 4>>>();
  ...
}
void foo() {
  ...
  dpct::get_in_order_queue().parallel_for(
      sycl::nd_range<3>(sycl::range<3>(1, 1, 8) * sycl::range<3>(1, 1, 8),
                        sycl::range<3>(1, 1, 8)),
      [=](sycl::nd_item<3> item_ct1) {
        parentKernel();
      });
  ...
}

which is rewritten to:

void childKernel() {
  ...
}
void parentKernel() {
  ...
  // Call childKernel() as a device function. Adjust the work assigned to
  // each work-item so the work of the former child launch is still covered.
  childKernel();
  ...
}
void foo() {
  ...
  dpct::get_in_order_queue().parallel_for(
      sycl::nd_range<3>(
          sycl::range<3>(1, 1, placeholder /* Adjust the global range based on
              the thread model between parentKernel and childKernel */),
          sycl::range<3>(1, 1, placeholder /* Adjust the local range based on
              the thread model between parentKernel and childKernel */)),
      [=](sycl::nd_item<3> item_ct1) {
        parentKernel();
      });
  ...
}
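How the ranges should be adjusted depends on how the parent and child threads relate to each other. The following host-only C++ sketch is not SYCL code and is not produced by the migration tool. It only models the index arithmetic for two possible merging strategies, using the launch sizes from the CUDA example (8 x 8 parent threads, each launching 4 x 4 child threads). The names `MergedIndex`, `splitMergedId`, `visitWithLoop`, and `visitFlattened` are hypothetical and introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sizes from the example: parentKernel<<<8, 8>>> and childKernel<<<4, 4>>>.
constexpr std::size_t kParentThreads = 8 * 8;  // 64 parent threads
constexpr std::size_t kChildThreads = 4 * 4;   // 16 child threads per launch

// Hypothetical helper type: position of one unit of merged work.
struct MergedIndex {
  std::size_t parent;  // which original parent thread
  std::size_t child;   // which thread of that parent's child launch
};

// Split a flat work-item id of the merged launch into (parent, child).
inline MergedIndex splitMergedId(std::size_t global_id) {
  assert(global_id < kParentThreads * kChildThreads);
  return {global_id / kChildThreads, global_id % kChildThreads};
}

// Option A (host model): keep 64 work-items; each one loops over the
// 16 units of work its child launch would have performed.
inline std::vector<int> visitWithLoop() {
  std::vector<int> visits(kParentThreads * kChildThreads, 0);
  for (std::size_t parent = 0; parent < kParentThreads; ++parent)
    for (std::size_t child = 0; child < kChildThreads; ++child)
      ++visits[parent * kChildThreads + child];
  return visits;
}

// Option B (host model): launch 64 * 16 work-items; each one handles
// exactly one (parent, child) pair derived from its flat id.
inline std::vector<int> visitFlattened() {
  std::vector<int> visits(kParentThreads * kChildThreads, 0);
  for (std::size_t gid = 0; gid < visits.size(); ++gid) {
    const MergedIndex idx = splitMergedId(gid);
    ++visits[idx.parent * kChildThreads + idx.child];
  }
  return visits;
}
```

Under these assumptions, Option A would leave the original global range of 64 unchanged and move the child's work into a loop inside `parentKernel()`. Option B would set the global range placeholder to 64 * 16 = 1024, with a local range that divides it evenly (16 is one possible choice). With Option B, code that each parent thread ran once would likely need a guard such as `child == 0`, so that it is not repeated by every work-item.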