Principal Components Analysis (PCA)

Principal Components Analysis (PCA) is an algorithm for exploratory data analysis and dimensionality reduction. PCA transforms a set of feature vectors with possibly correlated features into a new set of uncorrelated features, called principal components. Principal components are the directions of largest variance, that is, the directions along which the data is most spread out.
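To make "direction of largest variance" concrete, here is a minimal standalone sketch for two-dimensional data. It does not use the library; the names principal_direction and first_component_2d are hypothetical, and the sample covariance with an n - 1 denominator is an assumption made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: unit direction (x, y) and the variance along it.
struct principal_direction {
    double x;
    double y;
    double variance;
};

// Returns the first principal component of 2-D points given as two
// coordinate vectors of equal length (assumed to hold at least 2 points).
principal_direction first_component_2d(const std::vector<double>& xs,
                                       const std::vector<double>& ys) {
    const double n = static_cast<double>(xs.size());

    // Feature means.
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        mx += xs[i];
        my += ys[i];
    }
    mx /= n;
    my /= n;

    // Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double dx = xs[i] - mx;
        const double dy = ys[i] - my;
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }
    sxx /= (n - 1.0);
    sxy /= (n - 1.0);
    syy /= (n - 1.0);

    // Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    const double half_trace = 0.5 * (sxx + syy);
    const double half_diff = 0.5 * (sxx - syy);
    const double lambda = half_trace + std::sqrt(half_diff * half_diff + sxy * sxy);

    // Matching eigenvector: (sxy, lambda - sxx), or a coordinate axis
    // when the features are already uncorrelated.
    double vx = 0.0, vy = 0.0;
    if (sxy != 0.0) {
        vx = sxy;
        vy = lambda - sxx;
    } else if (sxx >= syy) {
        vx = 1.0;
    } else {
        vy = 1.0;
    }
    const double norm = std::hypot(vx, vy);
    return {vx / norm, vy / norm, lambda};
}
```

For points lying on the line y = x, the returned direction has equal x and y components, which matches the intuition that the data is spread along the diagonal.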

Operation          Computational methods  Programming Interface
Training           Covariance, SVD        train(…), train_input, train_result
Inference          Covariance, SVD        infer(…), infer_input, infer_result
Partial Training   Covariance, SVD        partial_train(…), partial_train_input, partial_train_result
Finalize Training  Covariance, SVD        finalize_train(…), partial_train_result, train_result

Mathematical formulation

Refer to Developer Guide: Principal Components Analysis.

Programming Interface

All types and functions in this section are declared in the oneapi::dal::pca namespace and are available by including the oneapi/dal/algo/pca.hpp header file.

Enum classes

enum class normalization
normalization::none

No normalization is required, or the data is not normalized.

normalization::mean_center

Only mean centering is required, or the data is already mean-centered.

normalization::zscore

Z-score normalization is required, or the data is already z-score normalized.
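As a rough illustration of what the mean_center and zscore modes refer to, the sketch below applies each transformation to a single feature column. The helper names are hypothetical and not part of the library, and the n - 1 denominator in the standard deviation is an assumption; this section does not state which denominator the library uses.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical helper: subtracts the column mean from every value.
// Assumes a non-empty column.
std::vector<double> mean_center(std::vector<double> column) {
    double mean = 0.0;
    for (double v : column) {
        mean += v;
    }
    mean /= static_cast<double>(column.size());
    for (double& v : column) {
        v -= mean;
    }
    return column;
}

// Hypothetical helper: mean-centers the column, then divides by the sample
// standard deviation. Assumes at least two values and non-zero variance.
std::vector<double> zscore(std::vector<double> column) {
    column = mean_center(std::move(column));
    double sum_sq = 0.0;
    for (double v : column) {
        sum_sq += v * v;
    }
    const double sd = std::sqrt(sum_sq / static_cast<double>(column.size() - 1));
    for (double& v : column) {
        v /= sd;
    }
    return column;
}
```

With normalization::none, neither step would be applied and the column would be used as given.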

Descriptor

template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default>
class descriptor
Template Parameters
  • Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

  • Method – Tag-type that specifies the implementation of the algorithm. Can be method::cov or method::svd.

  • Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.

Constructors

descriptor(std::int64_t component_count = 0)

Creates a new instance of the class with the given component_count property value.

Public Methods

bool whiten() const
auto &set_whiten(bool value)

Properties

bool deterministic

Specifies whether the algorithm applies the sign-flip technique. If it is true, the directions of the eigenvectors must be deterministic. Default value: true.

Getter & Setter
bool get_deterministic() const
auto & set_deterministic(bool value)
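An eigenvector is only defined up to its sign, so the same data can yield v or -v depending on the solver. A sign-flip step removes that ambiguity. The sketch below uses one common convention: flip the vector so that its largest-magnitude component is positive. The helper name sign_flip is hypothetical, and the convention is an assumption; this section does not specify which rule the library applies.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: makes the direction of an eigenvector deterministic
// by forcing its largest-magnitude component to be non-negative.
void sign_flip(std::vector<double>& eigenvector) {
    if (eigenvector.empty()) {
        return;
    }

    // Find the index of the component with the largest absolute value.
    std::size_t arg_max = 0;
    for (std::size_t i = 1; i < eigenvector.size(); ++i) {
        if (std::abs(eigenvector[i]) > std::abs(eigenvector[arg_max])) {
            arg_max = i;
        }
    }

    // Negate the whole vector if that component is negative.
    if (eigenvector[arg_max] < 0.0) {
        for (double& x : eigenvector) {
            x = -x;
        }
    }
}
```

Under this convention, (0.6, -0.8) becomes (-0.6, 0.8), while (0.6, 0.8) is left unchanged.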
normalization normalization_mode

Default value: normalization::zscore.

Getter & Setter
normalization get_normalization_mode() const
auto & set_normalization_mode(normalization value)
normalization data_normalization

Default value: normalization::none.

Getter & Setter
normalization get_data_normalization() const
auto & set_data_normalization(normalization value)
result_option_id result_options

Choose which results should be computed and returned.

Getter & Setter
result_option_id get_result_options() const
auto & set_result_options(const result_option_id &value)
std::int64_t component_count

The number of principal components \(r\). If it is zero, the algorithm computes the eigenvectors for all features, \(r = p\). Default value: 0.

Getter & Setter
std::int64_t get_component_count() const
auto & set_component_count(std::int64_t value)
Invariants

Method tags

struct cov

Tag-type that denotes Covariance computational method.

struct precomputed
struct svd

Tag-type that denotes SVD computational method.

using by_default = cov

Alias tag-type for Covariance computational method.

Task tags

struct dim_reduction

Tag-type that parameterizes entities used for solving dimensionality reduction problem.

using by_default = dim_reduction

Alias tag-type for dimensionality reduction task.

Model

template<typename Task = task::by_default>
class model
Template Parameters

Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.

Constructors

model()

Creates a new instance of the class with the default property values.

Properties

const table &variances

Variances. Default value: table{}.

Getter & Setter
const table & get_variances() const
auto & set_variances(const table &value)
const table &eigenvalues

Eigenvalues. Default value: table{}.

Getter & Setter
const table & get_eigenvalues() const
auto & set_eigenvalues(const table &value)
const table &eigenvectors

An \(r \times p\) table with the eigenvectors. Each row contains one eigenvector. Default value: table{}.

Getter & Setter
const table & get_eigenvectors() const
auto & set_eigenvectors(const table &value)
const table &means

Means. Default value: table{}.

Getter & Setter
const table & get_means() const
auto & set_means(const table &value)

Training train(...)

Input

template<typename Task = task::by_default>
class train_input
Template Parameters

Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.

Constructors

train_input()
train_input(const table &data)

Creates a new instance of the class with the given data property value.

Properties

const table &data

An \(n \times p\) table with the training data, where each row stores one feature vector. Default value: table{}.

Getter & Setter
const table & get_data() const
auto & set_data(const table &data)

Result and Finalize Result

template<typename Task = task::by_default>
class train_result
Template Parameters

Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.

Constructors

train_result()

Creates a new instance of the class with the default property values.

Properties

const table &eigenvectors

An \(r \times p\) table with the eigenvectors. Each row contains one eigenvector. Default value: table{}.

Getter & Setter
const table & get_eigenvectors() const
auto & set_eigenvectors(const table &value)
Invariants
eigenvectors == model.eigenvectors
const table &means

A \(1 \times r\) table that contains the mean values for the first r features. Default value: table{}.

Getter & Setter
const table & get_means() const
auto & set_means(const table &value)
const table &singular_values

A \(1 \times r\) table that contains the singular values for the first r features. Default value: table{}.

Getter & Setter
const table & get_singular_values() const
auto & set_singular_values(const table &value)
const table &eigenvalues

A \(1 \times r\) table that contains the eigenvalues for the first r features. Default value: table{}.

Getter & Setter
const table & get_eigenvalues() const
auto & set_eigenvalues(const table &value)
const model<Task> &model

The trained PCA model. Default value: model<Task>{}.

Getter & Setter
const model< Task > & get_model() const
auto & set_model(const model< Task > &value)
const table &variances

A \(1 \times r\) table that contains the variances for the first r features. Default value: table{}.

Getter & Setter
const table & get_variances() const
auto & set_variances(const table &value)
const result_option_id &result_options

Result options that indicate which of the properties are available. Default value: default_result_options<Task>.

Getter & Setter
const result_option_id & get_result_options() const
auto & set_result_options(const result_option_id &value)
const table &explained_variances_ratio

A \(1 \times r\) table that contains the explained variance ratios for the first r features. Default value: table{}.

Getter & Setter
const table & get_explained_variances_ratio() const
auto & set_explained_variances_ratio(const table &value)
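The eigenvalues, singular values, and explained variance ratios are closely related quantities. The sketch below shows the usual relationships under two assumptions that this section does not confirm for the library: the ratio is each eigenvalue divided by the total variance over all p features, and the covariance matrix is computed from centered data with an n - 1 denominator. Both helper names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: fraction of the total variance captured by each
// component. total_variance is assumed to be the sum of all p eigenvalues,
// which may exceed the sum of the r eigenvalues passed in.
std::vector<double> explained_variance_ratio(const std::vector<double>& eigenvalues,
                                             double total_variance) {
    std::vector<double> ratio;
    ratio.reserve(eigenvalues.size());
    for (double lambda : eigenvalues) {
        ratio.push_back(lambda / total_variance);
    }
    return ratio;
}

// Hypothetical helper: singular values of the centered n x p data matrix,
// assuming covariance = X^T X / (n - 1), so that s_i = sqrt((n - 1) * lambda_i).
std::vector<double> singular_values_from_eigenvalues(const std::vector<double>& eigenvalues,
                                                     std::size_t row_count) {
    std::vector<double> singular_values;
    singular_values.reserve(eigenvalues.size());
    const double scale = static_cast<double>(row_count) - 1.0;
    for (double lambda : eigenvalues) {
        singular_values.push_back(std::sqrt(scale * lambda));
    }
    return singular_values;
}
```

For eigenvalues 3 and 1 with a total variance of 4, the ratios are 0.75 and 0.25.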

Operation

template<typename Descriptor>
pca::train_result train(const Descriptor &desc, const pca::train_input &input)
Parameters
  • desc – PCA algorithm descriptor pca::descriptor

  • input – Input data for the training operation

Preconditions
input.data.has_data == true
input.data.column_count >= desc.component_count
Postconditions
result.means.row_count == 1
result.means.column_count == desc.component_count
result.variances.row_count == 1
result.variances.column_count == desc.component_count
result.variances[i] >= 0.0
result.eigenvalues.row_count == 1
result.eigenvalues.column_count == desc.component_count
result.model.eigenvectors.row_count == desc.component_count
result.model.eigenvectors.column_count == input.data.column_count

Partial Training

Partial Input

template<typename Task = task::by_default>
class partial_train_input

Constructors

partial_train_input()
partial_train_input(const table &data)
partial_train_input(const partial_train_result<Task> &prev, const table &data)

Properties

const table &data
Getter & Setter
const table & get_data() const
auto & set_data(const table &value)
const partial_train_result<Task> &prev
Getter & Setter
const partial_train_result< Task > & get_prev() const
auto & set_prev(const partial_train_result< Task > &value)

Partial Result and Finalize Input

template<typename Task = task::by_default>
class partial_train_result

Constructors

partial_train_result()

Public Methods

std::int64_t get_auxiliary_table_count() const

Properties

const table &partial_sum

Sums. Default value: table{}.

Getter & Setter
const table & get_partial_sum() const
auto & set_partial_sum(const table &value)
const table &auxiliary_table
Getter & Setter
const table & get_auxiliary_table(const std::int64_t) const
auto & set_auxiliary_table(const table &value)
const table &partial_crossproduct

The crossproduct matrix. Default value: table{}.

Getter & Setter
const table & get_partial_crossproduct() const
auto & set_partial_crossproduct(const table &value)
const table &partial_n_rows

The number of observations (rows) processed so far. Default value: table{}.

Getter & Setter
const table & get_partial_n_rows() const
auto & set_partial_n_rows(const table &value)

Finalize Training

Inference infer(...)

Input

template<typename Task = task::by_default>
class infer_input
Template Parameters

Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.

Constructors

infer_input(const model<Task> &trained_model, const table &data)

Creates a new instance of the class with the given model and data property values.

Properties

const model<Task> &model

The trained PCA model. Default value: model<Task>{}.

Getter & Setter
const model< Task > & get_model() const
auto & set_model(const model< Task > &value)
const table &data

The dataset for inference \(X'\). Default value: table{}.

Getter & Setter
const table & get_data() const
auto & set_data(const table &value)

Result

template<typename Task = task::by_default>
class infer_result
Template Parameters

Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.

Constructors

infer_result()

Creates a new instance of the class with the default property values.

Properties

const table &transformed_data

An \(n \times r\) table that contains data projected to the r principal components. Default value: table{}.

Getter & Setter
const table & get_transformed_data() const
auto & set_transformed_data(const table &value)
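Conceptually, the projection multiplies the input rows by the transposed eigenvector table. The standalone sketch below shows this for an \(r \times p\) eigenvector table stored one eigenvector per row. The names are hypothetical, and subtracting the training means before projecting is an assumption; whether the library centers or scales the data at inference depends on settings not described in this section.

```cpp
#include <cstddef>
#include <vector>

using matrix = std::vector<std::vector<double>>;

// Hypothetical helper: projects n x p data onto r principal components.
// eigenvectors is r x p (one eigenvector per row); means has length p.
// Returns an n x r matrix with out[i][k] = sum_j (data[i][j] - means[j]) * eigenvectors[k][j].
matrix project(const matrix& data,
               const std::vector<double>& means,
               const matrix& eigenvectors) {
    const std::size_t n = data.size();
    const std::size_t r = eigenvectors.size();
    const std::size_t p = means.size();

    matrix out(n, std::vector<double>(r, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < r; ++k) {
            for (std::size_t j = 0; j < p; ++j) {
                out[i][k] += (data[i][j] - means[j]) * eigenvectors[k][j];
            }
        }
    }
    return out;
}
```

The output has as many rows as the input and one column per component, which matches the \(n \times r\) shape of transformed_data.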

Operation

template<typename Descriptor>
pca::infer_result infer(const Descriptor &desc, const pca::infer_input &input)
Parameters
  • desc – PCA algorithm descriptor pca::descriptor

  • input – Input data for the inference operation

Preconditions
input.data.has_data == true
input.model.eigenvectors.row_count == desc.component_count
input.model.eigenvectors.column_count == input.data.column_count
Postconditions
result.transformed_data.row_count == input.data.row_count
result.transformed_data.column_count == desc.component_count

Usage Example

Training

pca::model<> run_training(const table& data) {
   const auto pca_desc = pca::descriptor<float>{}
      .set_component_count(5)
      .set_deterministic(true);

   const auto result = train(pca_desc, data);

   print_table("means", result.get_means());
   print_table("variances", result.get_variances());
   print_table("eigenvalues", result.get_eigenvalues());
   print_table("eigenvectors", result.get_eigenvectors());

   return result.get_model();
}

Inference

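// Assumes `namespace pca = oneapi::dal::pca;`, that `table` refers to
// oneapi::dal::table, and that print_table is a user-provided helper.
table run_inference(const pca::model<>& model,
                    const table& new_data) {
   // Request as many components as the trained model holds.
   const auto pca_desc = pca::descriptor<float>{}
      .set_component_count(model.get_component_count());

   const auto result = infer(pca_desc, model, new_data);

   print_table("transformed data", result.get_transformed_data());

   // The function is declared to return a table, so return the projection.
   return result.get_transformed_data();
}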
