Quality Metrics for Principal Components Analysis

Given the results of the PCA algorithm (a set \(E = (e_i)\), \(i = \overline{1, p}\), of eigenvalues in decreasing order, the full number of principal components \(p\), and the reduced number of components \(p_r \leq p\)), the problem is to evaluate the explained variance ratios and the noise variance.

The QualityMetricsId for the PCA algorithm is explainedVarianceMetrics.

Details

The metrics are computed only if the input data meets the following requirements:

  • At least the largest eigenvalue \(e_1\) must be non-zero. Otherwise, the algorithm returns an error.

  • The number of eigenvalues \(p\) must be equal to the number of features provided. The algorithm returns an error if \(p\) is less than the number of features.
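The two requirements above can be expressed as a small pre-check. This is a minimal Python sketch, not the library's implementation: the function name check_inputs, the use of ValueError, and treating n_features == 0 as "use p" (borrowed from the nFeatures default described under Algorithm Parameters) are assumptions for illustration.

```python
def check_inputs(eigenvalues, n_features=0):
    """Hypothetical pre-check mirroring the two input requirements.

    eigenvalues: list of p eigenvalues in decreasing order.
    n_features: number of features in the data set given to PCA;
                0 is assumed to mean "same as p".
    Returns p if the input is acceptable, raises ValueError otherwise.
    """
    p = len(eigenvalues)
    if n_features == 0:
        n_features = p  # assumption: zero falls back to p
    # The largest eigenvalue (first element of the list) must be non-zero.
    if p == 0 or eigenvalues[0] == 0:
        raise ValueError("largest eigenvalue must be non-zero")
    # The text only states an error for p < number of features,
    # so p > n_features is not rejected here.
    if p < n_features:
        raise ValueError("fewer eigenvalues than features")
    return p


check_inputs([4.0, 3.0, 2.0, 1.0], n_features=4)  # passes, returns 4
```

A vector whose leading eigenvalue is zero, or one with fewer entries than there are features, raises an error in this sketch.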

The quality metrics algorithm for PCA receives the eigenvalues \(e_k\), \(k = \overline{1, p}\), as its input argument and computes the following quality metrics:

  • Explained variance ratio

  • Noise variance

The library uses the following quality metrics:

Quality Metrics for Principal Components Analysis

Quality Metric | Definition

Explained variance | \(e_k\), \(k = \overline{1, p}\)

Explained variance ratios | \(r_k = \frac {e_k}{\sum _{i = 1}^{p} e_i}\), \(k = \overline{1, p}\)

Noise variance | \(v_\text{noise} = \begin{cases} 0, & p_r = p;\\ \frac{1}{p - p_r} \sum _{i = p_r + 1}^{p} e_i, & p_r < p \end{cases}\)
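The definitions in the table can be checked by hand with a short sketch. The Python function below is hypothetical (explained_variance_metrics is not a library function); it assumes the full, decreasing list of \(p\) eigenvalues and a reduced count \(p_r\) with \(1 \leq p_r \leq p\).

```python
def explained_variance_metrics(eigenvalues, n_components):
    """Hypothetical helper computing r_k and v_noise from the formulas above.

    eigenvalues: full list of p eigenvalues e_1..e_p in decreasing order.
    n_components: reduced number of components p_r, 1 <= p_r <= p.
    """
    p = len(eigenvalues)
    total = sum(eigenvalues)
    # r_k = e_k / (e_1 + ... + e_p) for every k = 1..p
    ratios = [e / total for e in eigenvalues]
    if n_components == p:
        noise = 0.0
    else:
        # Mean of the discarded eigenvalues e_{p_r+1}..e_p
        noise = sum(eigenvalues[n_components:]) / (p - n_components)
    return ratios, noise


ratios, noise = explained_variance_metrics([4.0, 3.0, 2.0, 1.0], 2)
print(ratios, noise)  # [0.4, 0.3, 0.2, 0.1] 1.5
```

For \(E = (4, 3, 2, 1)\) and \(p_r = 2\), each ratio divides by the full sum 10, and the noise variance is the mean of the two discarded eigenvalues, \((2 + 1)/2 = 1.5\).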

Note

Quality metrics for PCA are calculated correctly only if the eigenvalues vector obtained from the PCA algorithm has not been reduced, that is, if the nComponents parameter of the PCA algorithm is zero or equal to the number of features. The formulas rely on the full set of principal components; if the set is reduced, the results are incorrect.
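The effect described in this note can be illustrated numerically. The plain-Python sketch below uses hypothetical helper names and compares the metrics for a full eigenvalue vector against the same vector truncated to its first two entries, standing in for what a reduced PCA run would hand over.

```python
def ratios(eigenvalues):
    # r_k = e_k / (sum of all eigenvalues that were passed in)
    total = sum(eigenvalues)
    return [e / total for e in eigenvalues]


def noise_variance(eigenvalues, p_r):
    # Mean of the eigenvalues beyond the first p_r; zero when none remain.
    p = len(eigenvalues)
    return sum(eigenvalues[p_r:]) / (p - p_r) if p_r < p else 0.0


full = [4.0, 3.0, 2.0, 1.0]  # all p = 4 eigenvalues
reduced = full[:2]           # only the 2 retained eigenvalues

# Full vector: r_1 = 0.4, r_2 = 0.3, noise variance = 1.5
print(ratios(full)[:2], noise_variance(full, 2))
# Reduced vector: the denominator is 7 instead of 10, so r_1 and r_2 are
# inflated (about 0.571 and 0.429), and the discarded variance is lost
# (noise variance comes out as 0.0)
print(ratios(reduced), noise_variance(reduced, 2))
```

With the truncated vector, the ratios of the retained components sum to 1 and the noise variance is zero, which hides the variance carried by the dropped components.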

Batch Processing

Algorithm Input

The Quality Metrics for PCA algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for Quality Metrics for Principal Components Analysis (Batch Processing)

Input ID | Input

eigenvalues | \(p\) eigenvalues (explained variances), numeric table of size \(1 \times p\). You can define it as an object of any class derived from NumericTable except PackedSymmetricMatrix, PackedTriangularMatrix, and CSRNumericTable.

Algorithm Parameters

The quality metric algorithm has the following parameters:

Algorithm Parameters for Quality Metrics for Principal Components Analysis (Batch Processing)

Parameter | Default Value | Description

algorithmFPType | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

nComponents | \(0\) | The number of principal components \(p_r \leq p\) to compute metrics for. If it is zero, the algorithm computes the result for \(p\).

nFeatures | \(0\) | The number of features in the data set used as input to the PCA algorithm. If it is zero, the algorithm computes the result for \(p\).

Note

If \(\text{nFeatures} \neq p\), the algorithm returns irrelevant results.

Algorithm Output

The quality metric for PCA algorithm calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm.

Algorithm Output for Quality Metrics for Principal Components Analysis (Batch Processing)

Result ID | Result

explainedVariances | Pointer to the \(1 \times p_r\) numeric table that contains a reduced eigenvalues array.

explainedVariancesRatios | Pointer to the \(1 \times p_r\) numeric table that contains an array of reduced explained variance ratios.

noiseVariance | Pointer to the \(1 \times 1\) numeric table that contains the noise variance.
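To make the result shapes concrete, the following hypothetical Python mock returns the three results as nested lists standing in for numeric tables: two \(1 \times p_r\) rows and one \(1 \times 1\) table. The function name pca_quality_results and the dictionary keys (which merely reuse the Result ID names as labels) are assumptions; this is not the library's API. It also assumes that nComponents = 0 resolves to \(p_r = p\), as described for that parameter.

```python
def pca_quality_results(eigenvalues, n_components=0):
    """Hypothetical mock of the three results, each as a 1 x n nested list."""
    p = len(eigenvalues)
    p_r = n_components or p  # nComponents == 0 means "use p"
    if not 0 < p_r <= p:
        raise ValueError("nComponents must be 0 or between 1 and p")
    total = sum(eigenvalues)  # ratios always divide by the full sum over p
    tail = eigenvalues[p_r:]  # discarded eigenvalues e_{p_r+1}..e_p
    noise = sum(tail) / (p - p_r) if p_r < p else 0.0
    return {
        "explainedVariances": [eigenvalues[:p_r]],                          # 1 x p_r
        "explainedVariancesRatios": [[e / total for e in eigenvalues[:p_r]]],  # 1 x p_r
        "noiseVariance": [[noise]],                                         # 1 x 1
    }


results = pca_quality_results([4.0, 3.0, 2.0, 1.0], n_components=2)
print(results["explainedVariances"])  # [[4.0, 3.0]]
print(results["noiseVariance"])       # [[1.5]]
```

Keeping the full sum in the ratio denominator while trimming the output rows to \(p_r\) entries matches the definition \(r_k = e_k / \sum_{i=1}^{p} e_i\), which is why the input must still hold all \(p\) eigenvalues.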

Note

By default, each numeric table specified by the collection elements is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable, except for PackedSymmetricMatrix, PackedTriangularMatrix, and CSRNumericTable.

Examples

Batch Processing: