dgmm_batch

Computes a group of dgmm operations.

Description

The dgmm_batch routines perform multiple diagonal matrix-matrix product operations in a single call.

dgmm_batch supports the following precisions.

T

float

double

std::complex<float>

std::complex<double>

dgmm_batch (Buffer Version)

Description

The buffer version of dgmm_batch supports only the strided API.

The strided API operation is defined as:

for i = 0 … batch_size – 1
    A and C are matrices at offset i * stridea in a, i * stridec in c.
    X is a vector at offset i * stridex in x
    C := diag(X) * A or  C = A * diag(X)
end for

where:

A is a matrix,

X is a diagonal matrix stored as a vector

The a and x buffers contain all the input matrices. The stride between matrices is given by the stride parameter. The total number of matrices in a and x buffers is given by the batch_size parameter.

Strided API

Syntax

namespace oneapi::mkl::blas::column_major {
    void dgmm_batch(sycl::queue &queue,
                    onemkl::mkl::side left_right,
                    std::int64_t m,
                    std::int64_t n,
                    sycl::buffer<T,1> &a,
                    std::int64_t lda,
                    std::int64_t stridea,
                    sycl::buffer<T,1> &x,
                    std::int64_t incx,
                    std::int64_t stridex,
                    sycl::buffer<T,1> &c,
                    std::int64_t ldc,
                    std::int64_t stridec,
                    std::int64_t batch_size)
}
namespace oneapi::mkl::blas::row_major {
    void dgmm_batch(sycl::queue &queue,
                    onemkl::mkl::side left_right,
                    std::int64_t m,
                    std::int64_t n,
                    sycl::buffer<T,1> &a,
                    std::int64_t lda,
                    std::int64_t stridea,
                    sycl::buffer<T,1> &x,
                    std::int64_t incx,
                    std::int64_t stridex,
                    sycl::buffer<T,1> &c,
                    std::int64_t ldc,
                    std::int64_t stridec,
                    std::int64_t batch_size)
}

Input Parameters

queue

The queue where the routine should be executed.

left_right

Specifies the position of the diagonal matrix in the product. See oneMKL defined datatypes for more details.

m

Number of rows of matrices A and C. Must be at least zero.

n

Number of columns of matrices A and C. Must be at least zero.

a

Buffer holding the input matrices A with size stridea * batch_size. Must be of at least lda * j + stridea * (batch_size - 1) where j is n if column major layout is used or m if major layout is used.

lda

The leading dimension of the matrices A. It must be positive and at least m if column major layout is used or at least n if row major layout is used.

stridea

Stride between different A matrices.

x

Buffer holding the input matrices X with size stridex * batch_size. Must be of size at least (1 + (len - 1)*abs(incx)) + stridex * (batch_size - 1) where len is n if the diagonal matrix is on the right of the product or m otherwise.

incx

Stride between two consecutive elements of the x vectors.

stridex

Stride between different X vectors, must be at least 0.

c

Buffer holding input/output matrices C with size stridec * batch_size.

ldc

The leading dimension of the matrices C. It must be positive and at least m if column major layout is used to store matrices or at least n if column major layout is used to store matrices.

stridec

Stride between different C matrices. Must be at least ldc * n if column major layout is used or ldc * m if row major layout is used.

batch_size

Specifies the number of diagonal matrix-matrix product operations to perform.

Output Parameters

c

Output overwritten by batch_size diagonal matrix-matrix product operations.

dgmm_batch (USM Version)

Description

The USM version of dgmm_batch supports the group API and strided API.

The group API operation is defined as:

idx = 0
for i = 0 … group_count – 1
    for j = 0 … group_size – 1
        a and c are matrices of size mxn at position idx in a_array and c_array
        x is a vector of size m or n depending on left_right, at position idx in x_array
        if (left_right == oneapi::mkl::side::left)
            c := diag(x) * a
        else
            c := a * diag(x)
        idx := idx + 1
    end for
end for

The strided API operation is defined as

for i = 0 … batch_size – 1
    A and C are matrices at offset i * stridea in a, i * stridec in c.
    X is a vector at offset i * stridex in x
    C := diag(X) * A or  C = A * diag(X)
end for

where:

A is a matrix,

X is a diagonal matrix stored as a vector

The a and x buffers contain all the input matrices. The stride between matrices is given by the stride parameter. The total number of matrices in a and x buffers is given by the batch_size parameter.

For group API, a and x arrays contain the pointers for all the input matrices. The total number of matrices in a and x are given by:

\[total\_batch\_count = \sum_{i=0}^{group\_count-1}group\_size[i]\]

For strided API, a and x arrays contain all the input matrices. The total number of matrices in a and x are given by the batch_size parameter.

Group API

Syntax

namespace oneapi::mkl::blas::column_major {
    sycl::event dgmm_batch(sycl::queue &queue,
                           onemkl::mkl::side *left_right,
                           std::int64_t *m,
                           std::int64_t *n,
                           const T **a,
                           std::int64_t *lda,
                           const T **x,
                           std::int64_t *incx,
                           T **c,
                           std::int64_t *ldc,
                           std::int64_t group_count,
                           std::int64_t *group_size,
                           const std::vector<sycl::event> &dependencies = {})
}
namespace oneapi::mkl::blas::row_major {
    sycl::event dgmm_batch(sycl::queue &queue,
                           onemkl::mkl::side *left_right,
                           std::int64_t *m,
                           std::int64_t *n,
                           const T **a,
                           std::int64_t *lda,
                           const T **x,
                           std::int64_t *incx,
                           T **c,
                           std::int64_t *ldc,
                           std::int64_t group_count,
                           std::int64_t *group_size,
                           const std::vector<sycl::event> &dependencies = {})
}

Input Parameters

queue

The queue where the routine should be executed.

left_right

Specifies the position of the diagonal matrix in the product. See oneMKL defined datatypes for more details.

m

Array of group_count integers. m[i] specifies the number of rows of A for every matrix in group i. All entries must be at least zero.

n

Array of group_count integers. n[i] specifies the number of columns of A for every matrix in group i. All entries must be at least zero.

a

Array of pointers to input matrices A with size total_batch_count. Must be of size at least lda[i] * n[i] if column major layout is used or at least lda[i] * m[i] if row major layout is used. See Matrix Storage for more details.

lda

Array of group_count integers. lda[i] specifies the leading dimension of A for every matrix in group i. All entries must be positive and at least m[i] if column major layout is used or at least n[i] if row major layout is used.

x

Array of pointers to input vectors X with size total_batch_count. Must be of size at least (1 + len[i] – 1)*abs(incx[i])) where len[i] is n[i] if the diagonal matrix is on the right of the product or m[i] otherwise. See Matrix Storage for more details.

incx

Array of group_count integers. incx[i] specifies the stride of x for every vector in group i. All entries must be positive.

c

Array of pointers to input/output matrices C with size total_batch_count. Must be of size at least ldc[i] * n[i] if column major layout is used or at least ldc[i] * m[i] if row major layout is used. See Matrix Storage for more details.

ldc

Array of group_count integers. ldc[i] specifies the leading dimension of C for every matrix in group i. All entries must be positive and ldc[i] must be at least m[i] if column major layout is used to store matrices or at least n[i] if row major layout is used to store matrices.

group_count

Specifies the number of groups. Must be at least 0.

group_size

Array of group_count integers. group_size[i] specifies the number of diagonal matrix-matrix product operations in group i. All entries must be at least 0.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters

c

Output overwritten by batch_size diagonal matrix-matrix product operations.

Return Values

Output event to wait on to ensure computation is complete.

Strided API

Syntax

namespace oneapi::mkl::blas::column_major {
    sycl::event dgmm_batch(sycl::queue &queue,
                           onemkl::mkl::side left_right,
                           std::int64_t m,
                           std::int64_t n,
                           const T *a,
                           std::int64_t lda,
                           std::int64_t stridea,
                           const T *b,
                           std::int64_t incx,
                           std::int64_t stridex,
                           T *c,
                           std::int64_t ldc,
                           std::int64_t stridec,
                           std::int64_t batch_size,
                           const std::vector<sycl::event> &dependencies = {})
}
namespace oneapi::mkl::blas::row_major {
    sycl::event dgmm_batch(sycl::queue &queue,
                           onemkl::mkl::side left_right,
                           std::int64_t m,
                           std::int64_t n,
                           const T *a,
                           std::int64_t lda,
                           std::int64_t stridea,
                           const T *b,
                           std::int64_t incx,
                           std::int64_t stridex,
                           T *c,
                           std::int64_t ldc,
                           std::int64_t stridec,
                           std::int64_t batch_size,
                           const std::vector<sycl::event> &dependencies = {})
}

Input Parameters

queue

The queue where the routine should be executed.

left_right

Specifies the position of the diagonal matrix in the product. See oneMKL defined datatypes for more details.

m

Number of rows of A. Must be at least zero.

n

Number of columns of A. Must be at least zero.

a

Pointer to input matrices A with size stridea * batch_size. Must be of size at least lda * k + stridea * (batch_size - 1) where k is n if column major layout is used or m if row major layout is used.

lda

The leading dimension of the matrices A. It must be positive and at least m. Must be positive and at least m if column major layout is used or at least n if row major layout is used.

stridea

Stride between different A matrices.

x

Pointer to input matrices X with size stridex * batch_size. Must be of size at least (1 + (len - 1)*abs(incx)) + stridex * (batch_size - 1) where len is n if the diagonal matrix is on the right of the product or m otherwise.

incx

Stride between two consecutive elements of the x vector.

stridex

Stride between different X vectors, must be at least 0.

c

Pointer to input/output matrices C with size stridec * batch_size.

ldc

The leading dimension of the matrices C. It must be positive and at least ldc * m if column major layout is used to store matrices or at least ldc * n if column major layout is used to store matrices.

stridec

Stride between different C matrices. Must be at least ldc * n if column major layout is used or ldc * m if row major layout is used.

batch_size

Specifies the number of diagonal matrix-matrix product operations to perform.

Output Parameters

c

Output overwritten by batch_size diagonal matrix-matrix product operations.

Return Values

Output event to wait on to ensure computation is complete.

Parent topic: BLAS-like Extensions