Z-score

Z-score normalization is an algorithm that transforms data so that each feature (column) has zero mean and, when scaling is enabled, unit variance.

Details

Given a set \(X\) of \(n\) feature vectors \(x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})\) of dimension \(p\), the problem is to compute the matrix \(Y = (y_{ij})\) of dimension \(n \times p\) as follows:

\[y_{ij} = \frac {x_{ij} - m_j} {\Delta}\]

where:

  • \(m_j\) is the mean of the \(j\)-th component \((X)_j\) of the set \(X\), where \(j = \overline{1, p}\)

  • the value of \(\Delta\) depends on the computation mode

oneDAL provides two modes for computing the result matrix. You select the mode with the doScale flag (for details, see Algorithm Parameters). The available modes are:

  • Centering only. In this case, \(\Delta = 1\) and no scaling is performed. After normalization, the mean of the \(j\)-th component \((Y)_j\) of the result set is zero.

  • Centering and scaling. In this case, \(\Delta = \sigma_j\), where \(\sigma_j\) is the standard deviation of the \(j\)-th component \((X)_j\) of the set \(X\). After normalization, the \(j\)-th component \((Y)_j\) of the result set has zero mean and unit variance.
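The sketch below shows the two modes in plain C++. It is not the oneDAL API: zscoreNormalize is a hypothetical helper. It stores the data as a row-major \(n \times p\) array and computes each column's mean and standard deviation directly. It also assumes that \(\sigma_j\) comes from the unbiased sample variance (divisor \(n - 1\)). Check the low order moments documentation for the estimator that oneDAL actually uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of oneDAL): Z-score normalization of a
// row-major n x p matrix, where data[i * p + j] holds x_ij.
// doScale == false: centering only (Delta = 1).
// doScale == true:  centering and scaling (Delta = sigma_j).
std::vector<double> zscoreNormalize(const std::vector<double>& data,
                                    std::size_t n, std::size_t p,
                                    bool doScale) {
    assert(n >= 1 && data.size() == n * p && (!doScale || n >= 2));
    std::vector<double> result(n * p);
    for (std::size_t j = 0; j < p; ++j) {
        // m_j: mean of the j-th column
        double mean = 0.0;
        for (std::size_t i = 0; i < n; ++i) mean += data[i * p + j];
        mean /= static_cast<double>(n);

        double delta = 1.0;  // centering only
        if (doScale) {
            // Assumption: sigma_j is the square root of the unbiased
            // sample variance (divisor n - 1).
            double sumSq = 0.0;
            for (std::size_t i = 0; i < n; ++i) {
                const double d = data[i * p + j] - mean;
                sumSq += d * d;
            }
            delta = std::sqrt(sumSq / static_cast<double>(n - 1));
        }
        // y_ij = (x_ij - m_j) / Delta; a constant column with doScale set
        // gives Delta = 0 and is not handled in this sketch.
        for (std::size_t i = 0; i < n; ++i)
            result[i * p + j] = (data[i * p + j] - mean) / delta;
    }
    return result;
}
```

Take the \(3 \times 2\) matrix with rows (1, 10), (2, 20), (3, 30). Centering only yields rows (-1, -10), (0, 0), (1, 10). Centering and scaling yields rows (-1, -1), (0, 0), (1, 1).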

Note

Some algorithms require normalization parameters (mean and variance) as an input. By default, the oneDAL implementation of the Z-score algorithm does not return these values. To obtain them, set the resultsToCompute parameter. For details, see Algorithm Parameters.
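One typical use of the returned parameters is to normalize new observations with the means and variances obtained from the original data. The plain C++ sketch below assumes exactly that scenario. applyStoredZscore is a hypothetical helper, not a oneDAL function. It expects the parameters as two length-\(p\) vectors, which correspond to the \(1 \times p\) means and variances results.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of oneDAL): normalizes a row-major matrix
// with p columns using previously computed per-column means and variances,
// y_ij = (x_ij - mean_j) / sqrt(variance_j).
std::vector<double> applyStoredZscore(const std::vector<double>& data,
                                      std::size_t p,
                                      const std::vector<double>& means,
                                      const std::vector<double>& variances) {
    assert(p > 0 && data.size() % p == 0);
    assert(means.size() == p && variances.size() == p);
    std::vector<double> result(data.size());
    for (std::size_t k = 0; k < data.size(); ++k) {
        const std::size_t j = k % p;  // column index of element k
        result[k] = (data[k] - means[j]) / std::sqrt(variances[j]);
    }
    return result;
}
```

With means (2, 20) and variances (1, 100), the new row (4, 0) maps to (2, -2).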

Batch Processing

Algorithm Input

The Z-score normalization algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for Z-score (Batch Processing)

Input ID

Input

data

Pointer to the numeric table of size \(n \times p\).

Note

This table can be an object of any class derived from NumericTable.

Algorithm Parameters

The Z-score normalization algorithm has the following parameters. Some of them are required only for specific values of the computation method parameter method:

Algorithm Parameters for Z-score (Batch Processing)

algorithmFPType
  • Applicable methods: defaultDense or sumDense
  • Default value: float
  • Description: The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method
  • Applicable methods: not applicable
  • Default value: defaultDense
  • Description: Available computation methods:
    • defaultDense: a performance-oriented method. Means and variances are computed by the low order moments algorithm. For details, see Batch Processing for Moments of Low Order.
    • sumDense: a method that uses the pre-computed sums stored in the basic statistics of the input numeric table. Returns an error if the pre-computed sums are not defined.

moments
  • Applicable methods: defaultDense
  • Default value: SharedPtr<low_order_moments::Batch<algorithmFPType, low_order_moments::defaultDense> >
  • Description: Pointer to the low order moments algorithm that computes the means and standard deviations used for Z-score normalization with the defaultDense method.

doScale
  • Applicable methods: defaultDense or sumDense
  • Default value: true
  • Description: If true, the algorithm applies both centering and scaling. Otherwise, the algorithm performs only centering.

resultsToCompute
  • Applicable methods: defaultDense or sumDense
  • Default value: not applicable
  • Description: Optional. Requests additional results of Z-score normalization:
    • mean: request the means result
    • variance: request the variances result
    Provide one of these values to request a single characteristic, or combine them with bitwise OR to request both.
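With the sumDense method, each mean follows directly from the pre-computed column sum as \(m_j = S_j / n\). The plain C++ sketch below illustrates that arithmetic only and makes three assumptions. First, the sums arrive as a plain vector; the sketch does not model how oneDAL reads the basic statistics of a numeric table. Second, the variances are computed in a second pass over the data. Third, the variance divisor is \(n - 1\). The name meansVariancesFromSums is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of oneDAL): given a row-major n x p matrix
// and pre-computed per-column sums S_j, returns {means, variances}.
std::pair<std::vector<double>, std::vector<double>>
meansVariancesFromSums(const std::vector<double>& data, std::size_t n,
                       std::size_t p, const std::vector<double>& sums) {
    assert(n >= 2 && data.size() == n * p && sums.size() == p);
    std::vector<double> means(p), variances(p, 0.0);
    for (std::size_t j = 0; j < p; ++j)
        means[j] = sums[j] / static_cast<double>(n);  // m_j = S_j / n
    // Second pass: accumulate squared deviations from the known means.
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < p; ++j) {
            const double d = data[i * p + j] - means[j];
            variances[j] += d * d;
        }
    for (std::size_t j = 0; j < p; ++j)
        variances[j] /= static_cast<double>(n - 1);  // assumed n - 1 divisor
    return {means, variances};
}
```

For the rows (1, 10), (2, 20), (3, 30) with sums (6, 60), the sketch returns means (2, 20) and variances (1, 100).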

Algorithm Output

The Z-score normalization algorithm calculates the results described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Algorithm Output for Z-score (Batch Processing)

Result ID

Result

normalizedData

Pointer to the \(n \times p\) numeric table that stores the result of normalization.

Note

By default, the result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

means

Optional.

Pointer to the \(1 \times p\) numeric table that contains mean values for each feature.

If this result is not requested through the resultsToCompute parameter, the pointer is NULL.

variances

Optional.

Pointer to the \(1 \times p\) numeric table that contains variance values for each feature.

If this result is not requested through the resultsToCompute parameter, the pointer is NULL.

Note

By default, each numeric table specified by the collection elements is an object of the HomogenNumericTable class. You can also define the result as an object of any class derived from NumericTable, except for PackedSymmetricMatrix, PackedTriangularMatrix, and CSRNumericTable.
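The resultsToCompute parameter takes single flags or flags combined with bitwise OR. Any result that was not requested is left as a NULL pointer. The plain C++ sketch below models that behavior. The enum values, OptionalStats, and selectResults are hypothetical and do not reproduce the oneDAL identifiers or their numeric values.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical flag values modeled on the resultsToCompute description;
// the actual oneDAL identifiers and values may differ.
enum ResultsToCompute : std::uint64_t {
    none     = 0,
    mean     = 1u << 0,
    variance = 1u << 1
};

// Hypothetical result holder: a member stays null unless it was requested.
struct OptionalStats {
    std::shared_ptr<std::vector<double>> means;
    std::shared_ptr<std::vector<double>> variances;
};

OptionalStats selectResults(std::uint64_t resultsToCompute,
                            const std::vector<double>& colMeans,
                            const std::vector<double>& colVariances) {
    assert(colMeans.size() == colVariances.size());
    OptionalStats out;
    // Each bit that is set requests one result; unset bits leave it null.
    if (resultsToCompute & mean)
        out.means = std::make_shared<std::vector<double>>(colMeans);
    if (resultsToCompute & variance)
        out.variances = std::make_shared<std::vector<double>>(colVariances);
    return out;
}
```

Calling selectResults(mean | variance, ...) fills both members. Calling selectResults(mean, ...) leaves variances null, just as an unrequested variances result is a NULL pointer.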

Examples

Batch Processing: