Quality Metrics for Principal Components Analysis
Given the results of the PCA algorithm, that is, the data set \(E = (e_i)\), \(i = \overline{1, p}\), of eigenvalues in decreasing order, the full number of principal components \(p\), and the reduced number of components \(p_r \leq p\), the problem is to evaluate the explained variance ratios and the noise variance.
The QualityMetricsId for the PCA algorithm is explainedVarianceMetrics.
Details
The metrics are computed only if the input data meets the following requirements:
At least the largest eigenvalue \(e_1\) is non-zero. Otherwise, the algorithm returns an error.
The number of eigenvalues \(p\) must equal the number of features provided. The algorithm returns an error if \(p\) is less than the number of features.
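The two requirements above can be expressed as a small pre-check. This is a minimal sketch, not the library's validation code: the function name `check_pca_metric_inputs` is hypothetical, the eigenvalues are assumed to arrive as a plain Python sequence sorted in decreasing order, and errors are modeled as `ValueError`.

```python
from typing import Sequence


def check_pca_metric_inputs(eigenvalues: Sequence[float], n_features: int) -> None:
    """Hypothetical check mirroring the two input requirements listed above.

    Assumes `eigenvalues` is sorted in decreasing order, so the largest
    eigenvalue e_1 is stored at index 0.
    """
    p = len(eigenvalues)
    # Requirement 1: at least the largest eigenvalue must be non-zero.
    if p == 0 or eigenvalues[0] == 0:
        raise ValueError("the largest eigenvalue must be non-zero")
    # Requirement 2: an error is reported when p is less than the feature count.
    if p < n_features:
        raise ValueError(
            f"got {p} eigenvalues but the data set has {n_features} features"
        )


check_pca_metric_inputs([4.0, 3.0, 2.0, 1.0], n_features=4)  # passes silently
```

The sketch only models the error cases the text names; the case \(p > \text{nFeatures}\) is left to the parameter description below.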
The quality metrics algorithm for PCA receives the eigenvalues \(e_k\), \(k = \overline{1, p}\), as input and computes the following quality metrics:
Explained variance ratio
Noise variance
The library uses the following quality metrics:
| Quality Metric | Definition |
|---|---|
| Explained variance | \(e_k\), \(k = \overline{1, p}\) |
| Explained variance ratios | \(r_k = \frac{e_k}{\sum_{i = 1}^{p} e_i}\), \(k = \overline{1, p}\) |
| Noise variance | \(v_\text{noise} = \begin{cases} 0, & p_r = p; \\ \frac{1}{p - p_r} \sum_{i = p_r + 1}^{p} e_i, & p_r < p \end{cases}\) |
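The formulas in the table translate directly into a few lines of code. The following is a minimal sketch under stated assumptions, not the library implementation: the function name `explained_variance_metrics` and its argument names are hypothetical, and the input is a plain list of all \(p\) eigenvalues in decreasing order. Note that every ratio \(r_k\) is normalized by the sum over all \(p\) eigenvalues, while the returned arrays are truncated to \(p_r\) entries, and the noise variance is the mean of the discarded eigenvalues \(e_{p_r + 1}, \dots, e_p\).

```python
from typing import List, Sequence, Tuple


def explained_variance_metrics(
    eigenvalues: Sequence[float], n_components: int = 0
) -> Tuple[List[float], List[float], float]:
    """Sketch of the metrics defined in the table above.

    eigenvalues:  all p eigenvalues e_1..e_p in decreasing order
    n_components: p_r; 0 means "use all p components" (hypothetical convention)
    Returns (explained variances, explained variance ratios, noise variance),
    where the first two lists are truncated to p_r entries.
    """
    p = len(eigenvalues)
    p_r = p if n_components == 0 else n_components
    if not 0 < p_r <= p:
        raise ValueError("need 0 < p_r <= p")

    total = sum(eigenvalues)  # denominator of r_k uses all p eigenvalues
    ratios = [e / total for e in eigenvalues]

    # Noise variance: 0 when nothing is discarded, else mean of e_{p_r+1}..e_p.
    if p_r == p:
        noise = 0.0
    else:
        noise = sum(eigenvalues[p_r:]) / (p - p_r)

    return list(eigenvalues[:p_r]), ratios[:p_r], noise


variances, ratios, noise = explained_variance_metrics([4.0, 3.0, 2.0, 1.0], n_components=2)
```

For the eigenvalues \(4, 3, 2, 1\) with \(p_r = 2\), the sketch yields explained variances \(4, 3\), ratios \(0.4, 0.3\), and a noise variance of \((2 + 1) / 2 = 1.5\).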
Note
Quality metrics for PCA are calculated correctly only if the eigenvalue vector obtained from the PCA algorithm has not been reduced, that is, if the nComponents parameter of the PCA algorithm is zero or equal to the number of features. The formulas rely on the full set of principal components; if the set is reduced, the results are incorrect.
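A small numeric illustration shows why the eigenvalue vector must not be reduced. This sketch assumes hypothetical eigenvalues \(4, 3, 2, 1\) and a hypothetical helper `ratios`: when only the first two eigenvalues are passed in, the denominator \(\sum e_i\) loses \(e_3\) and \(e_4\), so every ratio is inflated and the ratios sum to 1 even though two components were dropped. The discarded eigenvalues needed for the noise variance are also gone.

```python
def ratios(eigenvalues):
    """Explained variance ratios r_k = e_k / sum(e_i) over the vector given."""
    total = sum(eigenvalues)
    return [e / total for e in eigenvalues]


full = [4.0, 3.0, 2.0, 1.0]  # all p = 4 eigenvalues
reduced = full[:2]           # what a reduced eigenvalue vector would contain

print(ratios(full)[0])     # 0.4, the correct r_1
print(ratios(reduced)[0])  # 4/7, about 0.571: inflated because e_3 and e_4 are missing
```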
Batch Processing
Algorithm Input
The Quality Metrics for PCA algorithm accepts the input described below.
Pass the Input ID as a parameter to the methods that provide input for your algorithm.
For more details, see Algorithms.
| Input ID | Input |
|---|---|
| | \(p\) eigenvalues (explained variances), numeric table of size \(1 \times p\). You can define it as an object of any class derived from |
Algorithm Parameters
The quality metric algorithm has the following parameters:
| Parameter | Default Value | Description |
|---|---|---|
| | | The floating-point type that the algorithm uses for intermediate computations. Can be |
| | \(0\) | The number of principal components \(p_r \leq p\) to compute metrics for. If it is zero, the algorithm computes the result for \(p_r = p\). |
| | \(0\) | The number of features in the data set used as input to the PCA algorithm. If it is zero, the algorithm computes the result for \(p\). Note that if \(\text{nFeatures} \neq p\), the results are not meaningful. |
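The zero defaults in the parameter table can be summarized as a small resolution step. This is a sketch only: the helper `resolve_parameters` is hypothetical, and it assumes the two zero-default rows correspond to the nComponents and nFeatures parameters. Because the table says a mismatch between nFeatures and \(p\) produces results that are not meaningful rather than an error, the sketch issues a warning in that case instead of failing.

```python
import warnings
from typing import Tuple


def resolve_parameters(p: int, n_components: int = 0, n_features: int = 0) -> Tuple[int, int]:
    """Hypothetical helper applying the zero defaults from the parameter table.

    p:            number of eigenvalues supplied as input
    n_components: requested p_r; 0 means p_r = p
    n_features:   feature count of the original data; 0 means "use p"
    """
    p_r = p if n_components == 0 else n_components
    if p_r > p:
        raise ValueError(f"p_r = {p_r} exceeds p = {p}")
    features = p if n_features == 0 else n_features
    if features != p:
        # The table says results are not meaningful here; this sketch warns.
        warnings.warn(f"nFeatures = {features} != p = {p}; results are not meaningful")
    return p_r, features


print(resolve_parameters(4, n_components=2))  # (2, 4)
```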
Algorithm Output
The quality metrics algorithm for PCA calculates the results described below.
Pass the Result ID as a parameter to the methods that access the results of your algorithm.
| Result ID | Result |
|---|---|
| | Pointer to the \(1 \times p_r\) numeric table that contains the reduced array of eigenvalues (explained variances). |
| | Pointer to the \(1 \times p_r\) numeric table that contains the reduced array of explained variance ratios. |
| | Pointer to the \(1 \times 1\) numeric table that contains the noise variance. |
Note
By default, each numeric table in the result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable, except for PackedSymmetricMatrix, PackedTriangularMatrix, and CSRNumericTable.
Examples
Batch Processing: