Z-score¶
Z-score normalization is an algorithm that produces data with each feature (column) having zero mean and unit variance.
Details¶
Given a set \(X\) of \(n\) feature vectors \(x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})\) of dimension \(p\), the problem is to compute the matrix \(Y = (y_{ij})\) of dimension \(n \times p\) as follows:
\[y_{ij} = \frac{x_{ij} - m_j}{\Delta}\]
where:
\(m_j\) is the mean of the \(j\)-th component of set \((X)_j\), where \(j = \overline{1, p}\)
the value of \(\Delta\) depends on the computation mode
oneDAL provides two modes for computing the result matrix.
Select the mode by setting the doScale flag (for details, see Algorithm Parameters).
The available modes are:
Centering only. In this case, \(\Delta = 1\) and no scaling is performed. After normalization, the mean of the \(j\)-th component of the result set \((Y)_j\) is zero.
Centering and scaling. In this case, \(\Delta = \sigma_j\), where \(\sigma_j\) is the standard deviation of the \(j\)-th component of set \((X)_j\). After normalization, the mean of the \(j\)-th component of the result set \((Y)_j\) is zero and its variance is one.
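As a small worked illustration of the two modes, consider a single feature \(j\) with three made-up observations (the numbers are not from oneDAL and serve only to show the arithmetic of \(y_{ij} = (x_{ij} - m_j) / \Delta\)):

\[
x_{1j} = 2,\quad x_{2j} = 4,\quad x_{3j} = 6, \qquad m_j = \frac{2 + 4 + 6}{3} = 4
\]

Centering only (\(\Delta = 1\)) gives \((y_{1j}, y_{2j}, y_{3j}) = (-2, 0, 2)\). For centering and scaling, assuming the sample standard deviation with an \(n - 1\) denominator (the text above does not state which estimator is used):

\[
\sigma_j = \sqrt{\frac{(2-4)^2 + (4-4)^2 + (6-4)^2}{3 - 1}} = 2, \qquad (y_{1j}, y_{2j}, y_{3j}) = (-1, 0, 1)
\]

Under the same assumption, the result has zero mean and a variance of \((1 + 0 + 1) / 2 = 1\).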
Note
Some algorithms require normalization parameters (mean and variance) as an input. The implementation of Z-score algorithm in oneDAL does not return these values by default. Enable this option by setting the resultsToCompute flag. For details, see Algorithm Parameters.
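The computation described in this section, together with the optional means and variances, can be sketched in plain C++ without the library. This is a minimal illustration, not the oneDAL implementation or API: the names `Matrix`, `ZScoreResult`, and `zscore` are hypothetical, the `doScale` argument only mirrors the flag described above, and the sketch assumes the sample variance with an \(n - 1\) denominator, which the text does not specify. The zero-variance guard is likewise a choice made for the sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major n x p matrix: X[i][j] is feature j of observation i.
using Matrix = std::vector<std::vector<double>>;

// Hypothetical result bundle for this sketch (not a oneDAL type).
struct ZScoreResult {
    Matrix normalized;              // Y, n x p
    std::vector<double> means;      // m_j for each feature, length p
    std::vector<double> variances;  // sigma_j^2 for each feature, length p
};

// Assumption: sample variance with an (n - 1) denominator.
ZScoreResult zscore(const Matrix& X, bool doScale) {
    assert(X.size() >= 2 && !X[0].empty());
    const std::size_t n = X.size();
    const std::size_t p = X[0].size();

    ZScoreResult r;
    r.normalized.assign(n, std::vector<double>(p, 0.0));
    r.means.assign(p, 0.0);
    r.variances.assign(p, 0.0);

    for (std::size_t j = 0; j < p; ++j) {
        // Mean of feature j.
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) sum += X[i][j];
        const double m = sum / static_cast<double>(n);

        // Sample variance of feature j.
        double sq = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double d = X[i][j] - m;
            sq += d * d;
        }
        const double var = sq / static_cast<double>(n - 1);

        // Delta = sigma_j when scaling, 1 otherwise. A constant column
        // (var == 0) is only centered, to avoid dividing by zero.
        const double delta = (doScale && var > 0.0) ? std::sqrt(var) : 1.0;

        for (std::size_t i = 0; i < n; ++i) {
            r.normalized[i][j] = (X[i][j] - m) / delta;
        }
        r.means[j] = m;
        r.variances[j] = var;
    }
    return r;
}
```

Calling `zscore(X, true)` corresponds to centering and scaling, and `zscore(X, false)` to centering only. The sketch always fills `means` and `variances`, whereas the note above says oneDAL returns them only on request.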
Batch Processing¶
Algorithm Input¶
The Z-score normalization algorithm accepts the input described below.
Pass the Input ID as a parameter to the methods that provide input for your algorithm.
For more details, see Algorithms.
| Input ID | Input |
|---|---|
| | Pointer to the numeric table of size \(n \times p\). Note: This table can be an object of any class derived from |
Algorithm Parameters¶
The Z-score normalization algorithm has the following parameters.
Some of them are required only for specific values of the computation method parameter method:
| Parameter | method | Default Value | Description |
|---|---|---|---|
| | | | The floating-point type that the algorithm uses for intermediate computations. Can be |
| | Not applicable | | Available computation methods: |
| | | SharedPtr<low_order_moments::Batch<algorithmFPType, low_order_moments::defaultDense> > | Pointer to the low order moments algorithm that computes means and standard deviations to be used for Z-score normalization with the |
| | | | If true, the algorithm applies both centering and scaling. Otherwise, the algorithm provides only centering. |
| | | Not applicable | Optional. Pointer to the data collection containing the following key-value pairs for Z-score: Provide one of these values to request a single characteristic or use bitwise OR to request a combination of them. |
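The idea of requesting a single characteristic or combining several with bitwise OR can be illustrated with a generic bit-flag sketch. The identifiers `ResultFlag`, `kMean`, `kVariance`, and `isRequested` are hypothetical and chosen only for this example. They are not oneDAL names, and the actual flag values used by the library are not shown in this section.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flags for illustration only; these are not oneDAL identifiers.
// Each flag occupies its own bit so that flags can be combined with bitwise OR.
enum ResultFlag : std::uint64_t {
    kMean     = 1u << 0,
    kVariance = 1u << 1
};

// Returns true when the bitmask `requested` includes `flag`.
inline bool isRequested(std::uint64_t requested, ResultFlag flag) {
    assert(flag == kMean || flag == kVariance);  // only flags defined above
    return (requested & flag) != 0;
}
```

With these definitions, `kMean` alone requests one characteristic, while `kMean | kVariance` requests both. `isRequested` shows how an implementation could test each bit.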
Algorithm Output¶
The Z-score normalization algorithm calculates the result described below.
Pass the Result ID as a parameter to the methods that access the results of your algorithm.
For more details, see Algorithms.
| Result ID | Result |
|---|---|
| | Pointer to the \(n \times p\) numeric table that stores the result of normalization. Note: By default, the result is an object of the |
| | Optional. Pointer to the \(1 \times p\) numeric table that contains mean values for each feature. If the function result is not requested through the |
| | Optional. Pointer to the \(1 \times p\) numeric table that contains variance values for each feature. If the function result is not requested through the |
Note
By default, each numeric table specified by the collection elements is an object of the HomogenNumericTable class.
You can also define the result as an object of any class derived from NumericTable, except for PackedSymmetricMatrix, PackedTriangularMatrix, and CSRNumericTable.
Examples¶
Batch Processing: