.. _onemkl_blas_gemv_batch:

gemv_batch
==========

Computes a group of ``gemv`` operations.

.. _onemkl_blas_gemv_batch_description:

.. rubric:: Description

The ``gemv_batch`` routines are batched versions of
:ref:`onemkl_blas_gemv`, performing multiple ``gemv`` operations in a
single call. Each ``gemv`` operation performs a scalar-matrix-vector
product and adds the result to a scalar-vector product.

``gemv_batch`` supports the following precisions.

.. list-table::
   :header-rows: 1

   * - T
   * - ``float``
   * - ``double``
   * - ``std::complex<float>``
   * - ``std::complex<double>``

.. _onemkl_blas_gemv_batch_buffer:

gemv_batch (Buffer Version)
---------------------------

.. rubric:: Description

The buffer version of ``gemv_batch`` supports only the strided API.

The strided API operation is defined as:

::

   for i = 0 … batch_size – 1
       A is a matrix at offset i * stridea in a.
       X and Y are vectors at offset i * stridex, i * stridey in x and y.
       Y := alpha * op(A) * X + beta * Y
   end for

where:

op(A) is one of op(A) = A, or op(A) = A\ :sup:`T`, or op(A) = A\ :sup:`H`,

``alpha`` and ``beta`` are scalars,

``A`` is a matrix and ``X`` and ``Y`` are vectors.

The ``a`` buffer contains all the input matrices, and the ``x`` and
``y`` buffers contain all the input vectors. The stride between
consecutive vectors is given by the ``stridex`` and ``stridey``
parameters, and the stride between consecutive matrices by
``stridea``. The total number of vectors in the ``x`` and ``y``
buffers is given by the ``batch_size`` parameter.

**Strided API**

.. rubric:: Syntax

.. code-block:: cpp

   namespace oneapi::mkl::blas::column_major {
       void gemv_batch(sycl::queue &queue,
                       onemkl::transpose trans,
                       std::int64_t m,
                       std::int64_t n,
                       T alpha,
                       sycl::buffer<T,1> &a,
                       std::int64_t lda,
                       std::int64_t stridea,
                       sycl::buffer<T,1> &x,
                       std::int64_t incx,
                       std::int64_t stridex,
                       T beta,
                       sycl::buffer<T,1> &y,
                       std::int64_t incy,
                       std::int64_t stridey,
                       std::int64_t batch_size)
   }

.. code-block:: cpp

   namespace oneapi::mkl::blas::row_major {
       void gemv_batch(sycl::queue &queue,
                       onemkl::transpose trans,
                       std::int64_t m,
                       std::int64_t n,
                       T alpha,
                       sycl::buffer<T,1> &a,
                       std::int64_t lda,
                       std::int64_t stridea,
                       sycl::buffer<T,1> &x,
                       std::int64_t incx,
                       std::int64_t stridex,
                       T beta,
                       sycl::buffer<T,1> &y,
                       std::int64_t incy,
                       std::int64_t stridey,
                       std::int64_t batch_size)
   }

.. container:: section

   .. rubric:: Input Parameters

   queue
      The queue where the routine should be executed.

   trans
      Specifies op(``A``), the transposition operation applied to the
      matrices ``A``. See :ref:`onemkl_datatypes` for more details.

   m
      Number of rows of op(``A``). Must be at least zero.

   n
      Number of columns of op(``A``). Must be at least zero.

   alpha
      Scaling factor for the matrix-vector products.

   a
      Buffer holding the input matrices ``A`` with size ``stridea`` *
      ``batch_size``.

   lda
      The leading dimension of the matrices ``A``. It must be positive
      and at least ``m`` if column major layout is used or at least
      ``n`` if row major layout is used.

   stridea
      Stride between different ``A`` matrices.

   x
      Buffer holding the input vectors ``X`` with size ``stridex`` *
      ``batch_size``.

   incx
      Stride between two consecutive elements of the ``X`` vectors. It
      must be positive.

   stridex
      Stride between two consecutive ``X`` vectors. Must be at least 0.

   beta
      Scaling factor for the vector ``Y``.

   y
      Buffer holding the input/output vectors ``Y`` with size
      ``stridey`` * ``batch_size``.

   incy
      Stride between two consecutive elements of the ``Y`` vectors.

   stridey
      Stride between two consecutive ``Y`` vectors. Must be at least
      (1 + (len-1)*abs(incy)), where ``len`` is ``m`` if the matrix
      ``A`` is not transposed or ``n`` otherwise.

   batch_size
      Specifies the number of matrix-vector operations to perform.

.. container:: section

   .. rubric:: Output Parameters

   y
      Output buffer, overwritten by ``batch_size`` matrix-vector
      product operations of the form ``alpha`` * op(``A``) * ``X`` +
      ``beta`` * ``Y``.

.. _onemkl_blas_gemv_batch_usm:

gemv_batch (USM Version)
------------------------

.. rubric:: Description

The USM version of ``gemv_batch`` supports the group API and the
strided API.

The group API operation is defined as:

::

   idx = 0
   for i = 0 … group_count – 1
       for j = 0 … group_size – 1
           A is an m x n matrix in a[idx]
           X and Y are vectors in x[idx] and y[idx]
           Y := alpha[i] * op(A) * X + beta[i] * Y
           idx = idx + 1
       end for
   end for

The strided API operation is defined as:

::

   for i = 0 … batch_size – 1
       A is a matrix at offset i * stridea in a.
       X and Y are vectors at offset i * stridex, i * stridey in x and y.
       Y := alpha * op(A) * X + beta * Y
   end for

where:

op(A) is one of op(A) = A, or op(A) = A\ :sup:`T`, or op(A) = A\ :sup:`H`,

``alpha`` and ``beta`` are scalars,

``A`` is a matrix and ``X`` and ``Y`` are vectors.

For the group API, the ``x`` and ``y`` arrays contain the pointers to
all the input vectors, and the ``a`` array contains the pointers to all
the input matrices. The total number of vectors in ``x`` and ``y`` and
of matrices in ``a`` is given by:

.. math::

      total\_batch\_count = \sum_{i=0}^{group\_count-1}group\_size[i]

For the strided API, the ``x`` and ``y`` arrays contain all the input
vectors, and the ``a`` array contains all the input matrices. The total
number of vectors in ``x`` and ``y`` and of matrices in ``a`` is given
by the ``batch_size`` parameter.

**Group API**

.. rubric:: Syntax

.. code-block:: cpp

   namespace oneapi::mkl::blas::column_major {
       sycl::event gemv_batch(sycl::queue &queue,
                              onemkl::transpose *trans,
                              std::int64_t *m,
                              std::int64_t *n,
                              T *alpha,
                              const T **a,
                              std::int64_t *lda,
                              const T **x,
                              std::int64_t *incx,
                              T *beta,
                              T **y,
                              std::int64_t *incy,
                              std::int64_t group_count,
                              std::int64_t *group_size,
                              const std::vector<sycl::event> &dependencies = {})
   }

.. code-block:: cpp

   namespace oneapi::mkl::blas::row_major {
       sycl::event gemv_batch(sycl::queue &queue,
                              onemkl::transpose *trans,
                              std::int64_t *m,
                              std::int64_t *n,
                              T *alpha,
                              const T **a,
                              std::int64_t *lda,
                              const T **x,
                              std::int64_t *incx,
                              T *beta,
                              T **y,
                              std::int64_t *incy,
                              std::int64_t group_count,
                              std::int64_t *group_size,
                              const std::vector<sycl::event> &dependencies = {})
   }

.. container:: section

   .. rubric:: Input Parameters

   queue
      The queue where the routine should be executed.

   trans
      Array of ``group_count`` ``onemkl::transpose`` values.
      ``trans[i]`` specifies the form of op(``A``) used in the
      matrix-vector product in group ``i``. See
      :ref:`onemkl_datatypes` for more details.

   m
      Array of ``group_count`` integers. ``m[i]`` specifies the number
      of rows of op(``A``) for every matrix in group ``i``. All entries
      must be at least zero.

   n
      Array of ``group_count`` integers. ``n[i]`` specifies the number
      of columns of op(``A``) for every matrix in group ``i``. All
      entries must be at least zero.

   alpha
      Array of ``group_count`` scalar elements. ``alpha[i]`` specifies
      the scaling factor for every matrix-vector product in group
      ``i``.

   a
      Array of pointers to input matrices ``A`` with size
      ``total_batch_count``. See :ref:`matrix-storage` for more
      details.

   lda
      Array of ``group_count`` integers. ``lda[i]`` specifies the
      leading dimension of ``A`` for every matrix in group ``i``. All
      entries must be positive and at least ``m[i]`` if column major
      layout is used or at least ``n[i]`` if row major layout is used.

   x
      Array of pointers to input vectors ``X`` with size
      ``total_batch_count``. See :ref:`matrix-storage` for more
      details.

   incx
      Array of ``group_count`` integers. ``incx[i]`` specifies the
      stride of ``X`` for every vector in group ``i``. All entries must
      be positive.

   beta
      Array of ``group_count`` scalar elements. ``beta[i]`` specifies
      the scaling factor for vector ``Y`` for every vector in group
      ``i``.

   y
      Array of pointers to input/output vectors ``Y`` with size
      ``total_batch_count``. See :ref:`matrix-storage` for more
      details.

   incy
      Array of ``group_count`` integers. ``incy[i]`` specifies the
      stride of ``Y`` for every vector in group ``i``. All entries must
      be positive.

   group_count
      Specifies the number of groups. Must be at least 0.

   group_size
      Array of ``group_count`` integers. ``group_size[i]`` specifies
      the number of matrix-vector products in group ``i``. All entries
      must be at least 0.

   dependencies
      List of events to wait for before starting computation, if any.
      If omitted, defaults to no dependencies.

.. container:: section

   .. rubric:: Output Parameters

   y
      Each vector in group ``i`` is overwritten by the result of
      ``alpha[i]`` * op(``A``) * ``X`` + ``beta[i]`` * ``Y``.

.. container:: section

   .. rubric:: Return Values

   Output event to wait on to ensure computation is complete.

**Strided API**

.. rubric:: Syntax

.. code-block:: cpp

   namespace oneapi::mkl::blas::column_major {
       sycl::event gemv_batch(sycl::queue &queue,
                              onemkl::transpose trans,
                              std::int64_t m,
                              std::int64_t n,
                              T alpha,
                              const T *a,
                              std::int64_t lda,
                              std::int64_t stridea,
                              const T *x,
                              std::int64_t incx,
                              std::int64_t stridex,
                              T beta,
                              T *y,
                              std::int64_t incy,
                              std::int64_t stridey,
                              std::int64_t batch_size,
                              const std::vector<sycl::event> &dependencies = {})
   }

.. code-block:: cpp

   namespace oneapi::mkl::blas::row_major {
       sycl::event gemv_batch(sycl::queue &queue,
                              onemkl::transpose trans,
                              std::int64_t m,
                              std::int64_t n,
                              T alpha,
                              const T *a,
                              std::int64_t lda,
                              std::int64_t stridea,
                              const T *x,
                              std::int64_t incx,
                              std::int64_t stridex,
                              T beta,
                              T *y,
                              std::int64_t incy,
                              std::int64_t stridey,
                              std::int64_t batch_size,
                              const std::vector<sycl::event> &dependencies = {})
   }

.. container:: section

   .. rubric:: Input Parameters

   queue
      The queue where the routine should be executed.

   trans
      Specifies op(``A``), the transposition operation applied to the
      matrices ``A``. See :ref:`onemkl_datatypes` for more details.

   m
      Number of rows of op(``A``). Must be at least zero.

   n
      Number of columns of op(``A``). Must be at least zero.

   alpha
      Scaling factor for the matrix-vector products.

   a
      Pointer to the input matrices ``A`` with size ``stridea`` *
      ``batch_size``.

   lda
      The leading dimension of the matrices ``A``. It must be positive
      and at least ``m`` if column major layout is used or at least
      ``n`` if row major layout is used.

   stridea
      Stride between different ``A`` matrices.

   x
      Pointer to the input vectors ``X`` with size ``stridex`` *
      ``batch_size``.

   incx
      Stride between two consecutive elements of the ``X`` vectors. It
      must be positive.

   stridex
      Stride between two consecutive ``X`` vectors. Must be at least 0.

   beta
      Scaling factor for the vector ``Y``.

   y
      Pointer to the input/output vectors ``Y`` with size ``stridey`` *
      ``batch_size``.

   incy
      Stride between two consecutive elements of the ``Y`` vectors.

   stridey
      Stride between two consecutive ``Y`` vectors. Must be at least
      (1 + (len-1)*abs(incy)), where ``len`` is ``m`` if the matrix
      ``A`` is not transposed or ``n`` otherwise.

   batch_size
      Specifies the number of matrix-vector operations to perform.

   dependencies
      List of events to wait for before starting computation, if any.
      If omitted, defaults to no dependencies.

.. container:: section

   .. rubric:: Output Parameters

   y
      Output array, overwritten by ``batch_size`` matrix-vector product
      operations of the form ``alpha`` * op(``A``) * ``X`` + ``beta`` *
      ``Y``.

.. container:: section

   .. rubric:: Return Values

   Output event to wait on to ensure computation is complete.

**Parent topic:** :ref:`blas-like-extensions`
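
.. rubric:: Example

The strided definition above can be illustrated with a plain host-side
loop. The sketch below is not the oneMKL implementation and does not call
oneMKL or SYCL. ``gemv_batch_strided_ref`` and
``gemv_batch_strided_demo`` are hypothetical helper names introduced only
for this illustration. It assumes column major layout, a real type ``T``
(so that A\ :sup:`T` and A\ :sup:`H` coincide), and that each stored
matrix ``A`` is ``m`` x ``n``, which is the reading implied by the
``lda`` and ``stridey`` constraints.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reference helper (not part of oneMKL): computes
// Y := alpha * op(A) * X + beta * Y for every batch entry.
// Assumptions: column major storage, real T, stored A is m x n.
template <typename T>
void gemv_batch_strided_ref(bool transposed, std::int64_t m, std::int64_t n,
                            T alpha, const T *a, std::int64_t lda,
                            std::int64_t stridea, const T *x,
                            std::int64_t incx, std::int64_t stridex, T beta,
                            T *y, std::int64_t incy, std::int64_t stridey,
                            std::int64_t batch_size) {
    assert(lda > 0 && lda >= m);   // column major leading dimension
    assert(incx > 0 && incy > 0);  // this sketch handles positive strides only
    const std::int64_t ylen = transposed ? n : m;  // rows of op(A)
    const std::int64_t xlen = transposed ? m : n;  // columns of op(A)
    for (std::int64_t b = 0; b < batch_size; ++b) {
        const T *A = a + b * stridea;  // matrix at offset b * stridea
        const T *X = x + b * stridex;  // vector at offset b * stridex
        T *Y = y + b * stridey;        // vector at offset b * stridey
        for (std::int64_t r = 0; r < ylen; ++r) {
            T acc = T(0);
            for (std::int64_t c = 0; c < xlen; ++c) {
                // Column major: element (i, j) of A lives at A[i + j * lda].
                const T aij = transposed ? A[c + r * lda] : A[r + c * lda];
                acc += aij * X[c * incx];
            }
            Y[r * incy] = alpha * acc + beta * Y[r * incy];
        }
    }
}

// Hypothetical usage: two 2 x 2 problems packed back to back.
std::vector<double> gemv_batch_strided_demo() {
    std::vector<double> a = {1, 2, 3, 4,   // A0 = [[1, 3], [2, 4]]
                             1, 0, 0, 1};  // A1 = identity
    std::vector<double> x = {1, 1, 5, 6};      // X0 = (1, 1), X1 = (5, 6)
    std::vector<double> y = {10, 20, 30, 40};  // Y0 = (10, 20), Y1 = (30, 40)
    gemv_batch_strided_ref<double>(false, 2, 2, 2.0, a.data(), 2, 4,
                                   x.data(), 1, 2, 1.0, y.data(), 1, 2, 2);
    return y;
}
```

With ``alpha`` = 2 and ``beta`` = 1, the demo returns (18, 32, 40, 52):
the first pair comes from 2 * A0 * X0 + Y0 and the second from
2 * A1 * X1 + Y1. A loop of this kind can serve as a correctness check
for results produced by a device call on small inputs.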