Principal Components Analysis (PCA)#

Principal Component Analysis (PCA) is an algorithm for exploratory data analysis and dimensionality reduction. PCA transforms a set of feature vectors of possibly correlated features to a new set of uncorrelated features, called principal components. Principal components are the directions of the largest variance, that is, the directions where the data is mostly spread out.

Operation

Computational methods

Programming Interface

Training

Covariance

SVD

train(…)

train_input

train_result

Inference

Covariance

SVD

infer(…)

infer_input

infer_result

Mathematical formulation#

Training#

Given a training data set \(X = \{ x_1, \ldots, x_n \}\) with \(n\) feature vectors of dimension \(p\), the problem is to compute \(r\) principal directions (\(p\)-dimensional eigenvectors [Lang87]) of the training date set. The eigenvectors can be grouped into the \(r \times p\) matrix \(T\) that contains one eigenvector in each row.

The principal components can be computed with any of the following two methods:

  1. Covariance (or Correlation)

  2. Singular Value Decomposition (SVD)

Training method: Covariance#

The PCA algorithm can be trained using either the covariance or the correlation matrix. The choice of covariance matrix or correlation matrix is application-dependent. More specifically, if scaling of the features is important for a problem, which is often the case, using the correlation matrix to compute principal components is more appropriate. By default, oneDAL uses the correlation matrix to compute the principal components. It is possible to use the covariance matrix by passing “precomputed” as method and feeding a covariance matrix as input to the PCA algorithm. To compute the covariance matrix, the Covariance algorithm can be used.

The eigenvector associated with the \(k\)-th largest eigenvalue of the covariance (or correlation) matrix is also the \(k\)-th principal component of the training data set. Based on this, the principal components can be computed in three steps:

  1. Computation of the covariance (or correlation) matrix.

  2. Computation of the eigenvectors and eigenvalues of the covariance (or correlation) matrix.

  3. Processing (sorting and storing) the results.

Covariance matrix can be computed in the following way:

  1. Compute the column means \(M = \{ m(1), \ldots , m(p) \}\), where \(m\left(j\right)=\frac{1}{n}\sum _{i}{x}_{ij}\).

  2. Compute the sample covariance matrix \(S = \{ s_{ij} \}\), where \(s_{ij}=\frac{1}{n-1}\sum_{k=1}^{n}(x_{ki}-m(i))(x_{kj}-m(j))\), \(i=\overline{1,p}\), \(j=\overline{1,p}\).

Corelation matrix can be computed from covariance matrix in the following way:

  1. Compute the correlation matrix \(C = \{ c_{ij} \}\), where \(c_{ij}=\frac{s_{ij}}{\sqrt{s_{ii}\cdot s_{jj}}}\), \(i=\overline{1,p}\), \(j=\overline{1,p}\).

The eigenvalues \(\lambda_k\) and eigenvectors \(\upsilon_k\) can be computed by an arbitrary method such as [Ping14].

In the final step, the eigenvalues (\(\lambda_k\)) are sorted in descending order to determine the order of the principal components. Each principal component is stored as a row of the final resulting matrix, \(T = \{ \upsilon_{1}, \cdots, \upsilon_{r} \}\), where \(\upsilon_{i}\) is the \(i\)-th principal component of dimension \(p\). Additionally, the means and variances of the input dataset are returned.

Training method: SVD#

The singular value decomposition (SVD) is a matrix factorization technique that decomposes an observation matrix \(X = \{ x_1, \ldots, x_n \}\) of \(p\)-dimensional feature vectors into three matrices as \(X = U \Sigma V^*\). Here:

  1. The columns of \(U\) are the left-singular vectors.

  2. The columns of \(V\) are the right-singular vectors.

  3. \(V^*\) is the conjugate transpose of the matrix \(V\).

  4. The diagonal entries of \(\Sigma\) are the singular values (\(\sigma\)) of \(X\).

The right-singular vectors are the principal components of \(X\). The steps of computing principal components using the SVD technique are:

  1. Mean centering the input data.

  2. Decomposing the mean-centered input data to compute the singular values and the singular vectors.

  3. Processing (sorting and storing) the results.

First step is to mean center the input data \(X^{c} = \{ x^{c}_{ij} \}\), where \(x^{c}_{ij} = x_{ij} - m(j)\), \(i=\overline{1,n}\), \(j=\overline{1,p}\), \(m(j) = \frac{1}{n}\sum _{i}{x}_{ij}\).

Singular values \(\sigma_k\), left-singular vectors \(U_k\), and right-singular vectors \(V_k\) of matrix \(X^{c}\) can be computed with an arbitrary method as described in [Demmel90].

The final step is to find a permutation matrix \(Q_{p \times p}\) such that the diagonal entries of \(\Sigma Q\) are sorted in descending order, i.e. \(\sigma_{k} \geq \sigma_{k+1}\), for all \(k < p\) assuming \(n > p\). The rows of the resulting matrix \(T = V^{*} Q\) are the principal components of \(X\). The rows of \(T\) are also the eigenvectors of the covariance matrix of \(X\). Additionally, the means and variances of the initial dataset are returned.

Sign-flip technique#

The eigenvectors (or the right-singular vectors) are not uniquely defined because the negative of any eigenvector is also an eigenvector of the input matrix. The signs of the eigenvectors or the singular vectors often depend on the solver used. A sign-flip technique, such as the one proposed in [Bro07], helps remove the ambiguity. The sign-flip function modifies the matrix \(T\) the following way:

\[\hat{T}_i = T_i \cdot \mathrm{sgn}(\max_{1 \leq j \leq p } |{T}_{ij}|), \quad 1 \leq i \leq r,\]

where \(T_i\) is \(i\)-th row, \(T_{ij}\) is the element in the \(i\)-th row and \(j\)-th column, \(\mathrm{sgn}(\cdot)\) is the signum function,

\[\begin{split}\mathrm{sgn}(x) = \begin{cases} -1, & x < 0, \\ 0, & x = 0, \\ 1, & x > 0. \end{cases}\end{split}\]

Inference#

Given the inference data set \(X^{'} = \{ x^{'}_1, \ldots, x^{'}_m \}\) with \(m\) feature vectors of dimension \(p\), and the \(r \times p\) transformation matrix \(T\) produced at the training stage, the problem is to transform \(X^{'}\) to the \(m \times r\) matrix \(X^{''} = \{ x^{''}_1, \ldots, x^{''}_m \}\), where \(x^{''}_{i}\) is an \(r\)-dimensional transformed observation.

Each individual observation \(x^{'}_{i}\) can be transformed by applying the following linear transformation [Lang87] defined by the matrix \(T\),

(1)#\[x^{''}_{i} = x^{'}_{i} T^{*}, \quad 1 \leq i \leq m, \quad T^{*} = transpose(T).\]

Inference methods: Covariance and SVD#

Covariance and SVD inference methods compute \(x_{j}''\) according to (1).

Programming Interface#

Refer to API Reference: Principal Components Analysis.

Distributed mode#

The algorithm supports distributed execution in SMPD mode (only on GPU).

Usage Example#

Training#

pca::model<> run_training(const table& data) {
   const auto pca_desc = pca::descriptor<float>{}
      .set_component_count(5)
      .set_deterministic(true);

   const auto result = train(pca_desc, data);

   print_table("means", result.get_means());
   print_table("variances", result.get_variances());
   print_table("eigenvalues", result.get_eigenvalues());
   print_table("eigenvectors", result.get_eigenvectors());

   return result.get_model();
}

Inference#

table run_inference(const pca::model<>& model,
                  const table& new_data) {
   const auto pca_desc = pca::descriptor<float>{}
      .set_component_count(model.get_component_count());

   const auto result = infer(pca_desc, model, new_data);

   print_table("labels", result.get_transformed_data());
}

Examples#

Batch Processing: