Binary¶
General¶
The binary primitive computes the result of a binary elementwise operation between tensors source 0 and source 1 (the variable names follow the standard Naming Conventions):

\[\dst(\overline{x}) = \src_0(\overline{x}) \mathbin{op} \src_1(\overline{x}),\]

where \(op\) is one of addition, subtraction, multiplication, division, greater than or equal to, greater than, less than or equal to, less than, equal to, not equal to, maximum, and minimum.
The binary primitive does not have a notion of forward or backward propagations.
Execution Arguments¶
When executed, the inputs and outputs should be mapped to an execution argument index as specified by the following table.
| Primitive input/output | Execution argument index |
|---|---|
| \(\src_0\) | DNNL_ARG_SRC_0 |
| \(\src_1\) | DNNL_ARG_SRC_1 |
| \(\dst\) | DNNL_ARG_DST |
| \(\text{binary post-op}\) | DNNL_ARG_ATTR_MULTIPLE_POST_OP(binary_post_op_position) \| DNNL_ARG_SRC_1 |
| \(\text{binary scale}_0\) | DNNL_ARG_ATTR_INPUT_SCALES \| DNNL_ARG_SRC_0 |
| \(\text{binary scale}_1\) | DNNL_ARG_ATTR_INPUT_SCALES \| DNNL_ARG_SRC_1 |
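As a minimal sketch of this mapping (assuming the oneDNN v2.x C++ API, where a primitive descriptor is built from a separate operation descriptor; tensor shapes are illustrative), the following creates a binary addition primitive and passes its tensors under the indices from the table above:

```cpp
#include "dnnl.hpp"

int main() {
    using namespace dnnl;

    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // Both sources and the destination share the same shape in this sketch.
    const memory::dims dims = {2, 16, 32, 32}; // N x C x H x W
    auto src0_md = memory::desc(dims, memory::data_type::f32, memory::format_tag::nchw);
    auto src1_md = memory::desc(dims, memory::data_type::f32, memory::format_tag::nchw);
    // format_tag::any lets the primitive pick the destination format from source 0.
    auto dst_md = memory::desc(dims, memory::data_type::f32, memory::format_tag::any);

    // dst = src0 + src1
    auto binary_d = binary::desc(algorithm::binary_add, src0_md, src1_md, dst_md);
    auto binary_pd = binary::primitive_desc(binary_d, eng);

    memory src0_mem(src0_md, eng), src1_mem(src1_md, eng);
    memory dst_mem(binary_pd.dst_desc(), eng);

    // Each tensor is passed under its execution argument index.
    binary(binary_pd).execute(strm, {
            {DNNL_ARG_SRC_0, src0_mem},
            {DNNL_ARG_SRC_1, src1_mem},
            {DNNL_ARG_DST, dst_mem}});
    strm.wait();
    return 0;
}
```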
Implementation Details¶
General Notes¶
The binary primitive requires all source and destination tensors to have the same number of dimensions.
The binary primitive supports implicit broadcast semantics for source 0 and source 1: if a dimension of one tensor has size one, that single value is reused when computing the operation with every point of the other tensor along that dimension. For better performance, it is recommended to broadcast source 1 rather than source 0. In general, the shapes should match the pattern below:

{N,1}x{C,1}x{D,1}x{H,1}x{W,1}:{N,1}x{C,1}x{D,1}x{H,1}x{W,1} -> NxCxDxHxW

This is consistent with the PyTorch broadcast semantics.

The \(\dst\) memory format can either be specified explicitly or derived by passing dnnl::memory::format_tag::any (recommended), in which case the primitive chooses the most appropriate memory format based on the format of the source 0 tensor. The \(\dst\) tensor dimensions must match those of the source 0 and source 1 tensors (except for broadcast dimensions).

The binary primitive supports in-place operations, meaning that the source 0 tensor may be used as the destination, in which case its data will be overwritten. In-place mode requires the \(\dst\) and source 0 data types to be the same; different data types will unavoidably lead to correctness issues.
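The sketch below illustrates both points: source 1 is broadcast per channel, and the primitive runs in place on the source 0 buffer (assuming the oneDNN v2.x C++ API; the shapes and the choice of binary_mul are illustrative, not from the original text):

```cpp
#include "dnnl.hpp"

int main() {
    using namespace dnnl;

    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    const memory::dims src0_dims = {2, 16, 32, 32}; // N x C x H x W
    const memory::dims src1_dims = {1, 16, 1, 1};   // broadcast over N, H, and W

    auto src0_md = memory::desc(src0_dims, memory::data_type::f32, memory::format_tag::nchw);
    auto src1_md = memory::desc(src1_dims, memory::data_type::f32, memory::format_tag::nchw);
    // dst has the dimensions of the non-broadcast source and, here, the same
    // format and data type, so the primitive can run in place on source 0.
    auto dst_md = memory::desc(src0_dims, memory::data_type::f32, memory::format_tag::nchw);

    auto pd = binary::primitive_desc(
            binary::desc(algorithm::binary_mul, src0_md, src1_md, dst_md), eng);

    memory src0_mem(src0_md, eng), src1_mem(src1_md, eng);

    // In-place execution: source 0 doubles as the destination and is overwritten.
    binary(pd).execute(strm, {
            {DNNL_ARG_SRC_0, src0_mem},
            {DNNL_ARG_SRC_1, src1_mem},
            {DNNL_ARG_DST, src0_mem}});
    strm.wait();
    return 0;
}
```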
Post-ops and Attributes¶
The following attributes are supported:
| Type | Operation | Description | Restrictions |
|---|---|---|---|
| Attribute | Scales | Scales the corresponding input tensor by the given scale factor(s). | Only one scale per tensor is supported. Input tensors only. |
| Post-op | Sum | Adds the operation result to the destination tensor instead of overwriting it. | |
| Post-op | Eltwise | Applies an Eltwise operation to the result. | |
| Post-op | Binary | Applies a Binary operation to the result. | General binary post-op restrictions |
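The sketch below fuses an Eltwise ReLU and a per-channel Binary multiply post-op into a binary addition (assuming the oneDNN v2.x C++ API; post-op signatures differ in later releases, and the shapes are illustrative). The extra operand of the fused Binary post-op is supplied at execution time under the composite index shown in the execution arguments table:

```cpp
#include "dnnl.hpp"

int main() {
    using namespace dnnl;

    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    const memory::dims dims = {2, 16, 32, 32};      // N x C x H x W
    const memory::dims per_channel = {1, 16, 1, 1}; // operand of the binary post-op

    auto src0_md = memory::desc(dims, memory::data_type::f32, memory::format_tag::nchw);
    auto src1_md = memory::desc(dims, memory::data_type::f32, memory::format_tag::nchw);
    auto dst_md = memory::desc(dims, memory::data_type::f32, memory::format_tag::nchw);
    auto po_md = memory::desc(per_channel, memory::data_type::f32, memory::format_tag::nchw);

    // Post-op chain: ReLU on the result, then an elementwise multiply by po_md.
    post_ops ops;
    ops.append_eltwise(/*scale=*/1.f, algorithm::eltwise_relu, /*alpha=*/0.f, /*beta=*/0.f);
    ops.append_binary(algorithm::binary_mul, po_md);

    primitive_attr attr;
    attr.set_post_ops(ops);

    auto pd = binary::primitive_desc(
            binary::desc(algorithm::binary_add, src0_md, src1_md, dst_md), attr, eng);

    memory src0_mem(src0_md, eng), src1_mem(src1_md, eng);
    memory dst_mem(dst_md, eng), po_mem(po_md, eng);

    // The binary post-op operand is passed under the composite argument index;
    // position 1 selects the second post-op in the chain (positions are zero-based).
    binary(pd).execute(strm, {
            {DNNL_ARG_SRC_0, src0_mem},
            {DNNL_ARG_SRC_1, src1_mem},
            {DNNL_ARG_DST, dst_mem},
            {DNNL_ARG_ATTR_MULTIPLE_POST_OP(1) | DNNL_ARG_SRC_1, po_mem}});
    strm.wait();
    return 0;
}
```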
Data Types Support¶
The source and destination tensors may have f32, bf16, f16, or s8/u8 data types. The binary primitive supports the following combinations of data types:

| Source 0 / 1 | Destination |
|---|---|
| bf16 | bf16 |
| s8, u8, f16, f32 | s8, u8, f16, f32 |
Warning
There might be hardware- and/or implementation-specific restrictions. Check the Implementation Limitations section below.
Data Representation¶
Sources, Destination¶
The binary primitive works with arbitrary data tensors. There is no special meaning associated with any of the tensor dimensions.
Implementation Limitations¶
Refer to Data Types for limitations related to data types support.
GPU
Implicit broadcast for source 0 is not supported.
Performance Tips¶
Whenever possible, avoid specifying different memory formats for source tensors.
Examples¶
binary_example_cpp - CPU/GPU¶
This C++ API example demonstrates how to create and execute a Binary primitive.
Key optimizations included in this example:
In-place primitive execution;
Primitive attributes with fused post-ops.
bnorm_u8_via_binary_postops_cpp - CPU/GPU¶
This C++ API example demonstrates how to implement batch normalization on u8 data using the binary primitive with fused post-ops.