DPCT1018

Message

The <API name> was migrated, but due to <reason>, the generated code performance may be sub-optimal.

Detailed Help

This warning appears in the following cases:

  • Migration of the cublasSetMatrix function. SYCLomatic replaces the cublasSetMatrix call with a memory copy from the host to the device. When the rows parameter of cublasSetMatrix is smaller than the lda parameter, the generated code copies more data (lda*cols elements) than the matrix actually contains (rows*cols elements).

    To improve performance, consider changing the values of lda and ldb so that they match rows. If the rows parameter is greater than or equal to lda, no action is required for this code.

  • Migration of the cublasSetVector function. SYCLomatic replaces the cublasSetVector call with a memory copy from the host to the device. When the incx parameter of cublasSetVector equals the incy parameter but is greater than 1, the generated code copies more data (incx*n elements) than the vector actually contains (n elements). To improve performance, consider changing the values of incx and incy.
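To gauge whether the extra copying matters for a given call, the element counts quoted in the two cases above can be computed directly. The following is a minimal sketch in plain C++; `CopyCost`, `matrix_copy_cost`, and `vector_copy_cost` are hypothetical helper names used only for illustration and are not part of SYCLomatic or the dpct headers.

```cpp
#include <cassert>
#include <cstddef>

// Element counts for one host-to-device transfer.
struct CopyCost {
  std::size_t copied;  // elements moved by the migrated copy
  std::size_t useful;  // elements that belong to the matrix or vector
};

// Matrix case: lda * cols elements are copied for rows * cols useful ones.
inline CopyCost matrix_copy_cost(std::size_t rows, std::size_t cols,
                                 std::size_t lda) {
  assert(rows <= lda && "leading dimension must be at least the row count");
  return {lda * cols, rows * cols};
}

// Vector case: inc * n elements are copied for n useful ones.
inline CopyCost vector_copy_cost(std::size_t n, std::size_t inc) {
  assert(inc >= 1 && "increment must be positive");
  return {inc * n, n};
}
```

For a vector of 128 elements with an increment of 128, `vector_copy_cost(128, 128)` reports 16384 elements copied for 128 useful elements, so reducing the increment is likely to pay off. When the ratio is close to 1, the warning can usually be ignored.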

Suggestions to Fix

If the rows parameter of cublasSetMatrix is smaller than the lda parameter and you observe performance issues, consider changing the values of lda and ldb.

If the incx parameter of cublasSetVector equals the incy parameter but is greater than 1, and you observe performance issues, consider changing the values of incx and incy.

For example, this original CUDA* code:

void foo() {
  const int element_num = 128;
  const int h_inc = 128;
  const int d_inc = 128;
  cublasSetVector(element_num, sizeof(float), data, h_inc, d_data, d_inc);
}

results in the following migrated SYCL* code:

void foo() {
  const int element_num = 128;
  const int h_inc = 128;
  const int d_inc = 128;
  /*
  DPCT1018:0: The cublasSetVector was migrated, but due to parameter h_inc
  equals to parameter d_inc but greater than 1, the generated code performance
  may be sub-optimal.
  */
  dpct::matrix_mem_copy((void *)d_data, (void *)data, d_inc, h_inc, 1,
                        element_num, sizeof(float));
}

which is rewritten to:

void foo() {
  const int element_num = 128;

  // Store the data in d_data contiguously and change h_inc and d_inc from 128 to 1.
  const int h_inc = 1;
  const int d_inc = 1;

  // Now there is no padding between elements, so memcpy can be used directly.
  dpct::get_default_queue().memcpy(d_data, data, sizeof(float) * element_num).wait();
}
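The rewritten code is only correct if the elements really are stored next to each other. If the host buffer is genuinely strided, or the matrix columns are padded, one option is to pack the data into contiguous storage first and then copy the packed buffer. The following is a minimal host-side sketch in plain C++, assuming `float` data and column-major matrices; `pack_strided_vector` and `pack_matrix_columns` are hypothetical helpers for illustration, not dpct or cuBLAS APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gather n elements spaced inc apart into contiguous storage (new increment 1).
inline std::vector<float> pack_strided_vector(const float *src, std::size_t n,
                                              std::size_t inc) {
  assert(inc >= 1 && "increment must be positive");
  std::vector<float> packed(n);
  for (std::size_t i = 0; i < n; ++i) {
    packed[i] = src[i * inc];
  }
  return packed;
}

// Copy a column-major rows x cols matrix with leading dimension lda into
// tightly packed storage, so the packed matrix has lda == rows.
inline std::vector<float> pack_matrix_columns(const float *src,
                                              std::size_t rows,
                                              std::size_t cols,
                                              std::size_t lda) {
  assert(rows <= lda && "leading dimension must be at least the row count");
  std::vector<float> packed(rows * cols);
  for (std::size_t c = 0; c < cols; ++c) {
    for (std::size_t r = 0; r < rows; ++r) {
      packed[c * rows + r] = src[c * lda + r];
    }
  }
  return packed;
}
```

The packed buffer can then be transferred with a single contiguous copy of `n` (or `rows * cols`) elements. Any device code that reads the data must use the new increment or leading dimension as well, and the packing pass has its own cost, so this is worth doing only when the padding is large.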