DPCT1018#
Message#
The <API name> was migrated, but due to <reason>, the generated code performance may be sub-optimal.
Detailed Help#
This warning appears in the following cases:
Migration of the cublasSetMatrix function. SYCLomatic replaced cublasSetMatrix with a memory copy from the host to the device. When the rows parameter of cublasSetMatrix is smaller than the lda parameter, the generated code copies more data (lda*cols) than the actual data available in the matrix (rows*cols). To improve performance, consider changing the values of lda and ldb. If the rows parameter is greater than or equal to lda, no action is required for this code.
Migration of the cublasSetVector function. SYCLomatic replaced cublasSetVector with a memory copy from the host to the device. When the incx parameter of cublasSetVector equals the incy parameter but is greater than 1, the generated code copies more data (incx*n) than the actual data available in the vector (n). To improve performance, consider changing the values of incx and incy.
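The overhead in both cases is simple arithmetic, so it can be estimated before deciding whether a change is worthwhile. The following is a minimal sketch in standard C++, assuming the copy sizes stated above (lda*cols and incx*n elements). The helper names are hypothetical and are not part of dpct or cuBLAS.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration only; not part of dpct or cuBLAS.

// Bytes of actual data in a rows x cols matrix.
std::size_t matrix_payload_bytes(std::size_t rows, std::size_t cols,
                                 std::size_t elem_size) {
  return rows * cols * elem_size;
}

// Bytes moved when the whole padded block (lda x cols) is copied.
std::size_t matrix_copied_bytes(std::size_t rows, std::size_t lda,
                                std::size_t cols, std::size_t elem_size) {
  assert(lda >= rows);  // the leading dimension cannot be smaller than rows
  return lda * cols * elem_size;
}

// Bytes of actual data in a vector of n elements.
std::size_t vector_payload_bytes(std::size_t n, std::size_t elem_size) {
  return n * elem_size;
}

// Bytes moved when n elements with stride incx are copied as one block.
std::size_t vector_copied_bytes(std::size_t n, std::size_t incx,
                                std::size_t elem_size) {
  assert(incx >= 1);  // a stride of 1 means the elements are contiguous
  return incx * n * elem_size;
}
```

For a vector of 128 elements of 4 bytes each with a stride of 128, the payload is 512 bytes while the copied block is 65536 bytes, a factor of 128.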
Suggestions to Fix#
If the rows parameter of cublasSetMatrix is smaller than the lda parameter and you observe performance issues, consider changing the values of lda and ldb.
If the incx parameter of cublasSetVector equals the incy parameter but is greater than 1, and you observe performance issues, consider changing the values of incx and incy.
For example, this original CUDA* code:
void foo() {
  const int element_num = 128;
  const int h_inc = 128;
  const int d_inc = 128;
  cublasSetVector(element_num, sizeof(float), data, h_inc, d_data, d_inc);
}
results in the following migrated SYCL* code:
void foo() {
  const int element_num = 128;
  const int h_inc = 128;
  const int d_inc = 128;
  /*
  DPCT1018:0: The cublasSetVector was migrated, but due to parameter h_inc
  equals to parameter d_inc but greater than 1, the generated code performance
  may be sub-optimal.
  */
  dpct::matrix_mem_copy((void *)d_data, (void *)data, d_inc, h_inc, 1,
                        element_num, sizeof(float));
}
which is rewritten to:
void foo() {
  const int element_num = 128;

  // Store the data in d_data contiguously and change h_inc and d_inc from 128 to 1.
  const int h_inc = 1;
  const int d_inc = 1;

  // Now there is no padding between elements, so memcpy can be used directly.
  dpct::get_default_queue().memcpy(d_data, data, sizeof(float) * element_num).wait();
}
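The same idea applies to the cublasSetMatrix case. One possible way to make rows equal to lda is to pack the host matrix into a contiguous buffer before the copy. The following is a minimal sketch in standard C++, assuming column-major storage as used by cuBLAS. The helper name pack_column_major is hypothetical and is not part of dpct.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only; not part of dpct or cuBLAS.
// Copies a rows x cols column-major matrix stored with leading dimension lda
// into a tightly packed buffer whose leading dimension equals rows.
std::vector<float> pack_column_major(const std::vector<float> &src,
                                     std::size_t rows, std::size_t cols,
                                     std::size_t lda) {
  assert(lda >= rows);
  // The last column needs only rows elements, not a full lda.
  assert(cols == 0 || src.size() >= lda * (cols - 1) + rows);

  std::vector<float> packed(rows * cols);
  for (std::size_t c = 0; c < cols; ++c) {
    // Column c starts at offset c * lda in the padded source
    // and at offset c * rows in the packed destination.
    std::copy_n(src.begin() + c * lda, rows, packed.begin() + c * rows);
  }
  return packed;
}
```

After packing, the buffer holds exactly rows*cols elements, so it can be transferred with a single contiguous copy. Any code that later reads the matrix would also need to use rows as the leading dimension, and the extra packing pass costs time of its own, so this is likely to pay off only when the padding is large.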