The reduction primitive performs a reduction operation on arbitrary data. Each element in the destination is the result of applying the specified reduction algorithm along one or more source tensor dimensions:
\[ \dst(f) = \mathop{reduce\_op}\limits_{r}\src(r), \]
where \(reduce\_op\) can be max, min, sum, mul, mean, Lp-norm, or Lp-norm-power-p; \(f\) is an index in an idle dimension, and \(r\) is an index in a reduction dimension.
Mean:
\[ \dst(f) = \frac{\sum\limits_{r}\src(r)} {R}, \]
where \(R\) is the size of a reduction dimension.
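The definitions above can be illustrated with a minimal sketch in plain C++. It does not use the oneDNN API; `ReduceOp` and `reduce_rows` are hypothetical names introduced only for this illustration. The sketch assumes a row-major tensor of shape F x R whose second dimension is the single reduction dimension.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration only; not part of the oneDNN API.
enum class ReduceOp { Max, Min, Sum, Mul, Mean };

// Reduces a row-major [F x R] tensor along its second dimension:
// dst(f) = reduce_op over r of src(f, r).
std::vector<float> reduce_rows(const std::vector<float> &src,
        std::size_t F, std::size_t R, ReduceOp op) {
    assert(R > 0 && src.size() == F * R);
    std::vector<float> dst(F);
    for (std::size_t f = 0; f < F; ++f) {
        const float *row = src.data() + f * R;
        float acc = row[0]; // start from the first element of the row
        for (std::size_t r = 1; r < R; ++r) {
            switch (op) {
                case ReduceOp::Max: acc = std::max(acc, row[r]); break;
                case ReduceOp::Min: acc = std::min(acc, row[r]); break;
                case ReduceOp::Sum:
                case ReduceOp::Mean: acc += row[r]; break;
                case ReduceOp::Mul: acc *= row[r]; break;
            }
        }
        // Mean divides the accumulated sum by R, the reduction size.
        if (op == ReduceOp::Mean) acc /= static_cast<float>(R);
        dst[f] = acc;
    }
    return dst;
}
```

For example, reducing the 2 x 3 tensor `{1, 2, 3, 4, 5, 6}` with `ReduceOp::Sum` gives one value per idle index \(f\), namely `{6, 15}`, and `ReduceOp::Mean` gives `{2, 5}`.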
Lp-norm:
\[ \dst(f) = \sqrt[p]{\mathop{eps\_op}(\sum\limits_{r}|\src(r)|^p, eps)}, \]
where \(eps\_op\) can be max or sum.
Lp-norm-power-p:
\[ \dst(f) = \mathop{eps\_op}(\sum\limits_{r}|\src(r)|^p, eps), \]
where \(eps\_op\) can be max or sum.
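The two Lp formulas can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not the library's implementation: `EpsOp`, `lp_norm_power_p`, and `lp_norm` are hypothetical names, the input is a single 1D slice along the reduction dimension, and \(eps\_op\) is read as \(\max(\cdot, eps)\) for max and \(\cdot + eps\) for sum.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical names for illustration only; not part of the oneDNN API.
enum class EpsOp { Max, Sum };

// Lp-norm-power-p: eps_op(sum_r |src(r)|^p, eps).
double lp_norm_power_p(const std::vector<double> &src, double p, double eps,
        EpsOp eps_op) {
    assert(p > 0.0);
    double acc = 0.0;
    for (double v : src)
        acc += std::pow(std::fabs(v), p);
    // Assumed reading: max -> max(acc, eps), sum -> acc + eps.
    return eps_op == EpsOp::Max ? std::max(acc, eps) : acc + eps;
}

// Lp-norm: the p-th root of Lp-norm-power-p.
double lp_norm(const std::vector<double> &src, double p, double eps,
        EpsOp eps_op) {
    return std::pow(lp_norm_power_p(src, p, eps, eps_op), 1.0 / p);
}
```

Under this reading, a nonzero \(eps\) keeps the result away from zero when every source element is zero, which is presumably why it is applied before the root is taken.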
When the primitive is executed, its inputs and outputs should be mapped to execution argument indices as specified in the following table.
Primitive input/output | Execution argument index |
---|---|
\(\src\) | DNNL_ARG_SRC |
\(\dst\) | DNNL_ARG_DST |
The source and destination tensors may have f32, bf16, or int8 data types. See the Data Types page for more details.
The reduction primitive works with arbitrary data tensors. There is no special meaning associated with any of the dimensions of a tensor.
Engine | Name | Comments |
---|---|---|
CPU/GPU | Reduction Primitive Example | This C++ API example demonstrates how to create and execute a Reduction primitive. |