LayerNormBackward
General
LayerNormBackward computes the backward pass of the LayerNorm operation.
The backward propagation computes \(diff\_src(t, n, c)\), \(diff\_\gamma(c)^*\), and \(diff\_\beta(c)^*\) based on \(diff\_dst(t, n, c)\), \(src(t, n, c)\), \(\mu(t, n)\), \(\sigma^2(t, n)\), \(\gamma(c)^*\), and \(\beta(c)^*\).
The tensors marked with an asterisk are used only when the operation is configured to use \(\gamma(c)\) and \(\beta(c)\).
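The arithmetic behind this op can be sketched in NumPy. This is a reference sketch, not oneDNN code: the function name is illustrative, and it assumes normalization runs over the last (channel) axis, i.e. `begin_norm_axis = -1`.

```python
import numpy as np

def layer_norm_backward(diff_dst, src, mean, variance, gamma=None, epsilon=1e-5):
    """Reference LayerNorm backward over the last (channel) axis.

    Returns (diff_src, diff_gamma, diff_beta); the last two are None
    when gamma is not used (use_affine = False).
    """
    inv_std = 1.0 / np.sqrt(variance + epsilon)           # shape (t, n)
    x_hat = (src - mean[..., None]) * inv_std[..., None]  # normalized src

    if gamma is not None:
        # Affine-parameter gradients, reduced over all non-channel axes.
        reduce_axes = tuple(range(src.ndim - 1))
        diff_gamma = (diff_dst * x_hat).sum(axis=reduce_axes)
        diff_beta = diff_dst.sum(axis=reduce_axes)
        d_xhat = diff_dst * gamma
    else:
        diff_gamma = diff_beta = None
        d_xhat = diff_dst

    # Chain rule through the normalization; means are over the channel axis.
    diff_src = inv_std[..., None] * (
        d_xhat
        - d_xhat.mean(axis=-1, keepdims=True)
        - x_hat * (d_xhat * x_hat).mean(axis=-1, keepdims=True)
    )
    return diff_src, diff_gamma, diff_beta
```

A useful sanity check: with a constant `diff_dst`, `diff_src` is zero, because the normalized output is invariant to a constant shift of the upstream gradient within the channel axis.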
Operation attributes

Attribute Name | Description | Value Type | Supported Values | Required or Optional
---|---|---|---|---
begin_norm_axis | Indicates the axis from which layer normalization starts; normalization is applied from this axis through the last dimension. A negative value counts dimensions from the back. | s64 | [-r, r-1], where r = rank(src). -1 is the default | Optional
use_affine | When set to True, the operation has learnable per-element affine parameters. | bool | true (default), false | Optional
epsilon | A constant added to the variance to improve numerical stability. | f32 | Arbitrary positive f32 value; 1e-5 (default) | Optional
Execution arguments
The inputs and outputs must be provided in the index order shown below when constructing an operation.
Inputs

Index | Argument Name | Required or Optional
---|---|---
0 | src | Required
1 | diff_dst | Required
2 | mean | Required
3 | variance | Required
4 | gamma | Optional
5 | beta | Optional
Note
`gamma` is the scale applied to the normalized value, and `beta` is the bias added to the scaled normalized value. Both are 1-D tensors with the same span as src's channel axis, and both are required if the attribute `use_affine` is set to True.
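The shape contract in the note can be expressed as a small validation routine. This is a hypothetical NumPy helper for illustration, not part of the library:

```python
import numpy as np

def check_affine_params(src, gamma, beta):
    """Validate that gamma/beta are 1-D and span src's channel (last) axis."""
    channels = src.shape[-1]
    for name, t in (("gamma", gamma), ("beta", beta)):
        if t.ndim != 1 or t.shape[0] != channels:
            raise ValueError(
                f"{name} must be a 1-D tensor with {channels} elements, "
                f"got shape {t.shape}"
            )

# A (2, 3, 8) src requires gamma and beta of shape (8,).
src = np.zeros((2, 3, 8))
check_affine_params(src, np.ones(8), np.zeros(8))  # passes silently
```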
Outputs

Index | Argument Name | Required or Optional
---|---|---
0 | diff_src | Required
1 | diff_gamma | Optional
2 | diff_beta | Optional
Supported data types
LayerNormBackward operation supports the following data type combinations.

Src / Diff_dst / Diff_src | Gamma / Beta / Mean / Variance / Diff_gamma / Diff_beta
---|---
f32 | f32
bf16 | f32, bf16
f16 | f32
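For the reduced-precision rows, the data tensors are bf16 or f16 while the statistics and parameter gradients may stay in f32. A NumPy sketch of the f16 row follows; NumPy has no bf16, so float16 stands in, and `use_affine` is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)

# Data tensors in f16, per the last row of the table.
src = rng.standard_normal((2, 4, 8)).astype(np.float16)
diff_dst = rng.standard_normal((2, 4, 8)).astype(np.float16)

# Mean/variance and the parameter gradients are kept in f32.
x32 = src.astype(np.float32)
mean = x32.mean(axis=-1)      # f32, shape (2, 4)
variance = x32.var(axis=-1)   # f32, shape (2, 4)
inv_std = 1.0 / np.sqrt(variance + np.float32(1e-5))

x_hat = (x32 - mean[..., None]) * inv_std[..., None]
d32 = diff_dst.astype(np.float32)
diff_gamma = (d32 * x_hat).sum(axis=(0, 1))  # f32, shape (8,)
diff_beta = d32.sum(axis=(0, 1))             # f32, shape (8,)

# diff_src is produced in the same type as src (f16 here).
diff_src = (inv_std[..., None] * (
    d32
    - d32.mean(axis=-1, keepdims=True)
    - x_hat * (d32 * x_hat).mean(axis=-1, keepdims=True)
)).astype(np.float16)
```

Computing the reductions in f32 and casting only the final `diff_src` back to f16 avoids the accumulation error that full reduced-precision arithmetic would introduce.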