Principal Components Analysis (PCA)¶
Principal Component Analysis (PCA) is an algorithm for exploratory data analysis and dimensionality reduction. PCA transforms feature vectors whose features may be correlated into a new set of uncorrelated features, called principal components. Principal components are the directions of largest variance, that is, the directions along which the data is most spread out.
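Operation | Computational methods | Programming Interface
Training | Covariance, SVD | train(...), train_input, train_result
Inference | Covariance, SVD | infer(...), infer_input, infer_result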
Mathematical formulation¶
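The standard covariance-based formulation of PCA can be written with the symbols \(p\) (number of features) and \(r\) (number of components) used in the property descriptions. This is the textbook definition rather than a transcription of oneDAL's own formulation; in particular, the \(n - 1\) divisor is an assumed convention.

```latex
\begin{aligned}
&\text{Data: } x_1, \ldots, x_n \in \mathbb{R}^p, \qquad
  \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \\
&\text{Covariance: } C = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top} \in \mathbb{R}^{p \times p} \\
&\text{Eigenpairs: } C v_j = \lambda_j v_j, \quad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0, \quad \lVert v_j \rVert = 1 \\
&\text{Components: } V_r = \begin{pmatrix} v_1^{\top} \\ \vdots \\ v_r^{\top} \end{pmatrix} \in \mathbb{R}^{r \times p} \\
&\text{Transform: } y_i = V_r (x_i - \bar{x}) \in \mathbb{R}^{r} \\
&\text{SVD route: } X_c = U \Sigma W^{\top} \;\Rightarrow\; \lambda_j = \frac{\sigma_j^2}{n-1}
\end{aligned}
```

Here \(X_c\) stacks the centered rows \((x_i - \bar{x})^{\top}\), and the columns of \(W\) are the same \(v_j\), which is why a covariance-based and an SVD-based implementation can return the same components up to sign. If each feature is first scaled to unit variance (z-score normalization), \(C\) becomes the correlation matrix.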
Programming Interface¶
All types and functions in this section are declared in the oneapi::dal::pca namespace and are available via inclusion of the oneapi/dal/algo/pca.hpp header file.
Enum classes¶
-
enum class normalization¶
- normalization::none
No normalization is applied, or the input data is not normalized.
- normalization::mean_center
Only mean centering is applied, or the input data is already mean-centered.
- normalization::zscore
Z-score normalization (mean centering followed by scaling each feature to unit variance) is applied, or the input data is already z-score normalized.
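The three modes can be illustrated with a small standalone sketch in plain C++. It assumes that normalization_mode names the transformation the algorithm applies to the data; the Normalization enum and the normalize function are hypothetical stand-ins rather than oneDAL types, and the sample-variance divisor n - 1 is an assumed convention.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins: rows are observations, columns are features.
using Matrix = std::vector<std::vector<double>>;
enum class Normalization { none, mean_center, zscore };

// Returns a normalized copy of `data`, processing one feature (column) at a time.
Matrix normalize(const Matrix& data, Normalization mode) {
    Matrix out = data;
    if (mode == Normalization::none) return out;
    assert(data.size() >= 2);
    const std::size_t n = data.size();
    const std::size_t p = data[0].size();
    for (std::size_t j = 0; j < p; ++j) {
        double mean = 0.0;
        for (std::size_t i = 0; i < n; ++i) mean += data[i][j];
        mean /= static_cast<double>(n);

        double var = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double d = data[i][j] - mean;
            var += d * d;
        }
        var /= static_cast<double>(n - 1);  // sample variance (assumed divisor)

        // mean_center only subtracts the mean; zscore also divides by the
        // standard deviation (constant features are left unscaled).
        const double scale =
            (mode == Normalization::zscore && var > 0.0) ? std::sqrt(var) : 1.0;
        for (std::size_t i = 0; i < n; ++i) out[i][j] = (data[i][j] - mean) / scale;
    }
    return out;
}
```

For example, the rows {1, 10}, {3, 20}, {5, 30} become {-2, -10}, {0, 0}, {2, 10} under mean_center and {-1, -1}, {0, 0}, {1, 1} under zscore.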
Descriptor¶
-
template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default>
class descriptor¶ - Template Parameters
Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of the algorithm. Can be method::cov or method::svd.
Task – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.
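The method::cov tag selects an implementation based on the covariance matrix. The plain C++ sketch below illustrates that idea only: it centers the data, forms the sample covariance matrix, diagonalizes it with the cyclic Jacobi eigenvalue method (chosen here for brevity, not because oneDAL uses it), and keeps the r leading eigenvectors as rows, matching the \(r \times p\) layout of the eigenvectors table. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical types: rows of a Matrix are observations, columns are features.
using Matrix = std::vector<std::vector<double>>;

struct PcaResult {
    std::vector<double> means;        // 1 x p feature means
    std::vector<double> eigenvalues;  // 1 x r, in descending order
    Matrix eigenvectors;              // r x p, one eigenvector per row
};

// Sample covariance matrix of `x` (divides by n - 1, an assumed convention).
Matrix covariance(const Matrix& x, std::vector<double>& means) {
    assert(x.size() >= 2);
    const std::size_t n = x.size();
    const std::size_t p = x[0].size();
    means.assign(p, 0.0);
    for (const auto& row : x)
        for (std::size_t j = 0; j < p; ++j) means[j] += row[j];
    for (double& m : means) m /= static_cast<double>(n);
    Matrix cov(p, std::vector<double>(p, 0.0));
    for (const auto& row : x)
        for (std::size_t j = 0; j < p; ++j)
            for (std::size_t k = 0; k < p; ++k)
                cov[j][k] += (row[j] - means[j]) * (row[k] - means[k]);
    for (auto& cov_row : cov)
        for (double& c : cov_row) c /= static_cast<double>(n - 1);
    return cov;
}

// Cyclic Jacobi method for a symmetric matrix `a`. On return the diagonal of
// `a` holds the eigenvalues and column k of `v` holds the matching eigenvector.
void jacobi_eigen(Matrix& a, Matrix& v) {
    const std::size_t p = a.size();
    v.assign(p, std::vector<double>(p, 0.0));
    for (std::size_t i = 0; i < p; ++i) v[i][i] = 1.0;
    for (int sweep = 0; sweep < 100; ++sweep) {
        double off = 0.0;  // total magnitude of the off-diagonal part
        for (std::size_t i = 0; i < p; ++i)
            for (std::size_t j = i + 1; j < p; ++j) off += std::abs(a[i][j]);
        if (off < 1e-12) return;
        for (std::size_t i = 0; i < p; ++i) {
            for (std::size_t j = i + 1; j < p; ++j) {
                const double aij = a[i][j];
                if (std::abs(aij) < 1e-15) continue;
                const double aii = a[i][i];
                const double ajj = a[j][j];
                // Plane rotation chosen so that the new a[i][j] is zero.
                const double theta = (ajj - aii) / (2.0 * aij);
                const double t = (theta >= 0.0 ? 1.0 : -1.0) /
                                 (std::abs(theta) + std::sqrt(theta * theta + 1.0));
                const double c = 1.0 / std::sqrt(t * t + 1.0);
                const double s = t * c;
                for (std::size_t k = 0; k < p; ++k) {
                    if (k != i && k != j) {
                        const double aki = a[k][i];
                        const double akj = a[k][j];
                        a[k][i] = a[i][k] = c * aki - s * akj;
                        a[k][j] = a[j][k] = s * aki + c * akj;
                    }
                    const double vki = v[k][i];
                    const double vkj = v[k][j];
                    v[k][i] = c * vki - s * vkj;
                    v[k][j] = s * vki + c * vkj;
                }
                a[i][i] = aii - t * aij;
                a[j][j] = ajj + t * aij;
                a[i][j] = a[j][i] = 0.0;
            }
        }
    }
}

// Covariance-method PCA sketch; r == 0 keeps all p components.
PcaResult pca_cov(const Matrix& x, std::size_t r) {
    PcaResult res;
    Matrix a = covariance(x, res.means);
    Matrix v;
    jacobi_eigen(a, v);
    const std::size_t p = a.size();
    assert(r <= p);
    if (r == 0) r = p;
    // Order eigenpairs by decreasing eigenvalue and keep the first r.
    std::vector<std::size_t> order(p);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&a](std::size_t l, std::size_t m) { return a[l][l] > a[m][m]; });
    for (std::size_t k = 0; k < r; ++k) {
        const std::size_t col = order[k];
        res.eigenvalues.push_back(a[col][col]);
        std::vector<double> vec(p);
        for (std::size_t j = 0; j < p; ++j) vec[j] = v[j][col];
        res.eigenvectors.push_back(vec);
    }
    return res;
}
```

With r = 1 and the perfectly correlated rows {1, 1}, {2, 2}, {3, 3}, this sketch returns the single eigenvalue 2 and an eigenvector proportional to (1, 1). An SVD-based implementation would instead factor the centered data matrix; its right singular vectors are the same eigenvectors up to sign.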
Constructors
-
descriptor(std::int64_t component_count = 0)¶
Creates a new instance of the class with the given component_count property value.
Public Methods
-
bool whiten() const¶
-
auto &set_whiten(bool value)¶
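Assuming whiten follows the common meaning of PCA whitening (as in, for example, scikit-learn's whiten option), each projected component is divided by the square root of its eigenvalue so that it has unit variance. The helper below is a hypothetical plain C++ sketch of that operation, not the oneDAL implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration of whitening: scales each projected component
// (column j of `scores`) by 1 / sqrt(eigenvalues[j]).
std::vector<std::vector<double>> whiten_scores(
        std::vector<std::vector<double>> scores,
        const std::vector<double>& eigenvalues) {
    for (auto& row : scores) {
        assert(row.size() == eigenvalues.size());
        for (std::size_t j = 0; j < row.size(); ++j) {
            assert(eigenvalues[j] > 0.0);  // zero-variance components cannot be whitened
            row[j] /= std::sqrt(eigenvalues[j]);
        }
    }
    return scores;
}
```

Because whitening divides by \(\sqrt{\lambda_j}\), components whose eigenvalues are zero or close to zero must be excluded or regularized; the sketch simply asserts that every eigenvalue is positive.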
Properties
-
bool deterministic¶
Specifies whether the algorithm applies the sign-flip technique. If it is true, the directions of the eigenvectors are deterministic. Default value: true.
- Getter & Setter
bool get_deterministic() const
auto & set_deterministic(bool value)
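Eigenvectors are defined only up to sign, since \(v\) and \(-v\) satisfy the same eigen-equation, so two runs or two methods can return opposite signs. A sign-flip technique fixes one representative. The sketch below uses one common convention (make the entry with the largest absolute value positive); the exact rule oneDAL applies is not stated here, and flip_signs is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sign-flip rule: negate an eigenvector if its entry with the
// largest absolute value is negative. Only the sign changes, not the direction's axis.
void flip_signs(std::vector<std::vector<double>>& eigenvectors) {
    for (auto& vec : eigenvectors) {
        assert(!vec.empty());
        std::size_t arg = 0;
        for (std::size_t j = 1; j < vec.size(); ++j)
            if (std::abs(vec[j]) > std::abs(vec[arg])) arg = j;
        if (vec[arg] < 0.0)
            for (double& x : vec) x = -x;
    }
}
```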
-
normalization normalization_mode¶
The normalization that the algorithm applies to the input data. Default value: normalization::zscore.
- Getter & Setter
normalization get_normalization_mode() const
auto & set_normalization_mode(normalization value)
-
normalization data_normalization¶
The normalization that has already been applied to the input data. Default value: normalization::none.
- Getter & Setter
normalization get_data_normalization() const
auto & set_data_normalization(normalization value)
-
result_option_id result_options¶
Specifies which results are computed and returned.
- Getter & Setter
result_option_id get_result_options() const
auto & set_result_options(const result_option_id &value)
-
std::int64_t component_count¶
The number of principal components \(r\). If it is zero, the algorithm computes the eigenvectors for all features, \(r = p\). Default value: 0.
- Getter & Setter
std::int64_t get_component_count() const
auto & set_component_count(std::int64_t value)
- Invariants
- component_count >= 0
Model¶
-
template<typename Task = task::by_default>
class model¶ - Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.
Constructors
-
model()¶
Creates a new instance of the class with the default property values.
Properties
-
const table &variances¶
Variances. Default value: table{}.
- Getter & Setter
const table & get_variances() const
auto & set_variances(const table &value)
-
const table &eigenvalues¶
Eigenvalues. Default value: table{}.
- Getter & Setter
const table & get_eigenvalues() const
auto & set_eigenvalues(const table &value)
Training train(...)¶
Input¶
-
template<typename Task = task::by_default>
class train_input¶ - Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.
Constructors
-
train_input()¶
-
train_input(const table &data)¶
Creates a new instance of the class with the given data property value.
Properties
Result and Finalize Result¶
-
template<typename Task = task::by_default>
class train_result¶ - Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.
Constructors
-
train_result()¶
Creates a new instance of the class with the default property values.
Properties
-
const table &eigenvectors¶
An \(r \times p\) table with the eigenvectors. Each row contains one eigenvector. Default value: table{}.
- Getter & Setter
const table & get_eigenvectors() const
auto & set_eigenvectors(const table &value)
- Invariants
- eigenvectors == model.eigenvectors
-
const table &means¶
A \(1 \times r\) table that contains the mean values for the first r features. Default value: table{}.
- Getter & Setter
const table & get_means() const
auto & set_means(const table &value)
-
const table &singular_values¶
A \(1 \times r\) table that contains the singular values for the first r principal components. Default value: table{}.
- Getter & Setter
const table & get_singular_values() const
auto & set_singular_values(const table &value)
-
const table &eigenvalues¶
A \(1 \times r\) table that contains the eigenvalues for the first r principal components. Default value: table{}.
- Getter & Setter
const table & get_eigenvalues() const
auto & set_eigenvalues(const table &value)
-
const model<Task> &model¶
The trained PCA model. Default value: model<Task>{}.
- Getter & Setter
const model< Task > & get_model() const
auto & set_model(const model< Task > &value)
-
const table &variances¶
A \(1 \times r\) table that contains the variances for the first r features. Default value: table{}.
- Getter & Setter
const table & get_variances() const
auto & set_variances(const table &value)
-
const result_option_id &result_options¶
Result options that indicate the availability of the properties. Default value: default_result_options<Task>.
- Getter & Setter
const result_option_id & get_result_options() const
auto & set_result_options(const result_option_id &value)
Operation¶
-
template<typename Descriptor>
pca::train_result train(const Descriptor &desc, const pca::train_input &input)¶ - Parameters
desc – PCA algorithm descriptor pca::descriptor
input – Input data for the training operation
- Preconditions
- Postconditions
- result.means.row_count == 1
- result.means.column_count == desc.component_count
- result.variances.row_count == 1
- result.variances.column_count == desc.component_count
- result.variances[i] >= 0.0
- result.eigenvalues.row_count == 1
- result.eigenvalues.column_count == desc.component_count
- result.model.eigenvectors.row_count == desc.component_count
- result.model.eigenvectors.column_count == input.data.column_count
Partial Training¶
Partial Input¶
-
template<typename Task = task::by_default>
class partial_train_input¶ Constructors
-
partial_train_input()¶
-
partial_train_input(const partial_train_result<Task> &prev, const table &data)¶
Properties
-
const table &data¶
- Getter & Setter
const table & get_data() const
auto & set_data(const table &value)
-
const partial_train_result<Task> &prev¶
- Getter & Setter
const partial_train_result< Task > & get_prev() const
auto & set_prev(const partial_train_result< Task > &value)
Partial Result and Finalize Input¶
-
template<typename Task = task::by_default>
class partial_train_result¶ Constructors
-
partial_train_result()¶
Public Methods
-
std::int64_t get_auxiliary_table_count() const¶
Properties
-
const table &partial_sum¶
Sums. Default value: table{}.
- Getter & Setter
const table & get_partial_sum() const
auto & set_partial_sum(const table &value)
-
const table &auxiliary_table¶
- Getter & Setter
const table & get_auxiliary_table(const std::int64_t) const
auto & set_auxiliary_table(const table &value)
Finalize Training¶
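Partial training lets the data arrive in blocks. One standard way to make this work for a covariance-based method, sketched below in plain C++, is to carry the row count, the per-feature sums, and the cross-product matrix from block to block and to form the covariance only in the finalize step. The layout of oneDAL's partial_sum and auxiliary tables is not specified on this page, so PartialState, partial_update, and finalize_covariance are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical partial state: row count, per-feature sums, and raw
// cross-products sum_i x_ij * x_ik accumulated over all blocks seen so far.
struct PartialState {
    std::size_t n = 0;
    std::vector<double> sums;
    Matrix crossproduct;
};

// Partial step: folds one block of rows into the running state.
PartialState partial_update(PartialState prev, const Matrix& block) {
    for (const auto& row : block) {
        if (prev.sums.empty()) {
            prev.sums.assign(row.size(), 0.0);
            prev.crossproduct.assign(row.size(), std::vector<double>(row.size(), 0.0));
        }
        assert(row.size() == prev.sums.size());
        for (std::size_t j = 0; j < row.size(); ++j) {
            prev.sums[j] += row[j];
            for (std::size_t k = 0; k < row.size(); ++k)
                prev.crossproduct[j][k] += row[j] * row[k];
        }
        ++prev.n;
    }
    return prev;
}

// Finalize step: C[j][k] = (S[j][k] - sums[j] * sums[k] / n) / (n - 1),
// which equals the sample covariance of all rows seen.
Matrix finalize_covariance(const PartialState& s) {
    assert(s.n >= 2);
    const std::size_t p = s.sums.size();
    const double n = static_cast<double>(s.n);
    Matrix cov(p, std::vector<double>(p, 0.0));
    for (std::size_t j = 0; j < p; ++j)
        for (std::size_t k = 0; k < p; ++k)
            cov[j][k] = (s.crossproduct[j][k] - s.sums[j] * s.sums[k] / n) / (n - 1.0);
    return cov;
}
```

The running sums make each partial step cheap and, up to rounding, independent of how the rows are split into blocks. The subtraction in the finalize step can lose precision when the feature means are large relative to their spread; merging per-block means and centered cross-products avoids that.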
Inference infer(...)¶
Input¶
-
template<typename Task = task::by_default>
class infer_input¶ - Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.
Constructors
-
infer_input(const model<Task> &trained_model, const table &data)¶
Creates a new instance of the class with the given model and data property values.
Properties
Result¶
-
template<typename Task = task::by_default>
class infer_result¶ - Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.
Constructors
-
infer_result()¶
Creates a new instance of the class with the default property values.
Properties
Operation¶
-
template<typename Descriptor>
pca::infer_result infer(const Descriptor &desc, const pca::infer_input &input)¶ - Parameters
desc – PCA algorithm descriptor pca::descriptor
input – Input data for the inference operation
- Preconditions
- Postconditions
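Conceptually, inference projects each new observation onto the trained components. The plain C++ sketch below assumes that only mean centering is applied before projection (with z-score normalization, each centered feature would also be divided by its standard deviation) and that the eigenvectors are stored one per row, as in the \(r \times p\) eigenvectors table; project is a hypothetical helper, not a oneDAL function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Projects each row x of `data` onto the principal components:
// y[k] = sum_j (x[j] - means[j]) * eigenvectors[k][j], giving an n x r result.
Matrix project(const Matrix& data, const std::vector<double>& means,
               const Matrix& eigenvectors) {
    Matrix out;
    out.reserve(data.size());
    for (const auto& x : data) {
        assert(x.size() == means.size());
        std::vector<double> y(eigenvectors.size(), 0.0);
        for (std::size_t k = 0; k < eigenvectors.size(); ++k) {
            assert(eigenvectors[k].size() == x.size());
            for (std::size_t j = 0; j < x.size(); ++j)
                y[k] += (x[j] - means[j]) * eigenvectors[k][j];
        }
        out.push_back(y);
    }
    return out;
}
```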
Usage Example¶
Training¶
#include "oneapi/dal/algo/pca.hpp"
using namespace oneapi::dal;

// print_table(name, table) is a user-supplied helper, not part of oneDAL
pca::model<> run_training(const table& data) {
    const auto pca_desc = pca::descriptor<float>{}
        .set_component_count(5)
        .set_deterministic(true);

    const auto result = train(pca_desc, data);

    print_table("means", result.get_means());
    print_table("variances", result.get_variances());
    print_table("eigenvalues", result.get_eigenvalues());
    print_table("eigenvectors", result.get_eigenvectors());

    return result.get_model();
}
Inference¶
table run_inference(const pca::model<>& model,
                    const table& new_data) {
    // The model stores an r x p eigenvector table, so r is its row count
    const auto pca_desc = pca::descriptor<float>{}
        .set_component_count(model.get_eigenvectors().get_row_count());

    const auto result = infer(pca_desc, model, new_data);

    print_table("transformed data", result.get_transformed_data());
    return result.get_transformed_data();
}