DPCT1130
Message
SYCL 2020 standard does not support dynamic parallelism (launching kernel in device code). Please rewrite the code.
Detailed Help
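SYCL* does not support launching a kernel from device code. To migrate such code, merge the parent kernel and the child kernel: call the child kernel as an ordinary device function from the parent kernel, then adjust the work done by each work-item and the launch range of the parent kernel to match the thread model of the original parent and child kernels.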
Suggestions to Fix
For example, this original CUDA* code:
__global__ void childKernel() {
  ...
}
__global__ void parentKernel() {
  ...
  childKernel<<<4, 4>>>();
  ...
}
void foo() {
  ...
  parentKernel<<<8, 8>>>();
  ...
}
results in the following migrated SYCL code:
void childKernel() {
  ...
}
void parentKernel() {
  ...
  /*
  DPCT1130:0: SYCL 2020 standard does not support dynamic parallelism (launching
  kernel in device code). Please rewrite the code.
  */
  childKernel<<<4, 4>>>();
  ...
}
void foo() {
  ...
  dpct::get_in_order_queue().parallel_for(
      sycl::nd_range<3>(sycl::range<3>(1, 1, 8) * sycl::range<3>(1, 1, 8),
                        sycl::range<3>(1, 1, 8)),
      [=](sycl::nd_item<3> item_ct1) {
        parentKernel();
      });
  ...
}
which can be manually rewritten to:
void childKernel() {
  ...
}
void parentKernel() {
  ...
  // Call childKernel() as a device function. The work done by each work-item
  // needs to be adjusted accordingly.
  childKernel();
  ...
}
void foo() {
  ...
  dpct::get_in_order_queue().parallel_for(
      sycl::nd_range<3>(
          sycl::range<3>(1, 1, placeholder /* Adjust the global range based on
                                              the thread model between
                                              parentKernel and childKernel */),
          sycl::range<3>(1, 1, placeholder /* Adjust the local range based on
                                              the thread model between
                                              parentKernel and childKernel */)),
      [=](sycl::nd_item<3> item_ct1) {
        parentKernel();
      });
  ...
}