K-Means¶
The K-Means algorithm solves clustering problem by partitioning \(n\) feature vectors into \(k\) clusters minimizing some criterion. Each cluster is characterized by a representative point, called a centroid.
Operation |
Computational methods |
Programming Interface |
||
Mathematical formulation¶
Refer to Developer Guide: K-Means.
Programming Interface¶
All types and functions in this section are declared in the
oneapi::dal::kmeans
namespace and be available via inclusion of the
oneapi/dal/algo/kmeans.hpp
header file.
Descriptor¶
-
template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default>
class descriptor¶ - Template Parameters
Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of algorithm. Can be method::lloyd_dense.
Task – Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Constructors
-
descriptor(std::int64_t cluster_count = 2)¶
Creates a new instance of the class with the given
cluster_count
.
Properties
-
double accuracy_threshold¶
The threshold \(\varepsilon\) for the stop condition. Default value: 0.0.
- Getter & Setter
double get_accuracy_threshold() const
auto & set_accuracy_threshold(double value)
- Invariants
- accuracy_threshold >= 0.0
-
std::int64_t max_iteration_count¶
The maximum number of iterations
T
. Default value: 100.- Getter & Setter
std::int64_t get_max_iteration_count() const
auto & set_max_iteration_count(std::int64_t value)
- Invariants
- max_iteration_count >= 0
-
result_option_id result_options¶
Choose which results should be computed and returned.
- Getter & Setter
result_option_id get_result_options() const
auto & set_result_options(const result_option_id &value)
-
std::int64_t cluster_count¶
The number of clusters k. Default value: 2.
- Getter & Setter
std::int64_t get_cluster_count() const
auto & set_cluster_count(std::int64_t value)
- Invariants
- cluster_count > 0
Model¶
-
template<typename Task = task::by_default>
class model¶ - Template Parameters
Task – Tag-type that specifies type of the problem to solve. Can be task::clustering.
Constructors
-
model()¶
Creates a new instance of the class with the default property values.
Public Methods
-
std::int64_t get_cluster_count() const¶
Number of clusters k in the trained model.
Properties
Training train(...)¶
Input¶
-
template<typename Task = task::by_default>
class train_input¶ - Template Parameters
Task – Tag-type that specifies type of the problem to solve. Can be task::clustering.
Constructors
-
train_input(const table &data, const table &initial_centroids)¶
Creates a new instance of the class with the given
data
andinitial_centroids
.
Properties
Result¶
-
template<typename Task = task::by_default>
class train_result¶ - Template Parameters
Task – Tag-type that specifies type of the problem to solve. Can be task::clustering.
Constructors
-
train_result()¶
Creates a new instance of the class with the default property values.
Properties
-
const table &responses¶
An \(n \times 1\) table with the responses \(y_i\) assigned to the samples \(x_i\) in the input data, \(1 \leq 1 \leq n\). Default value: table{}.
- Getter & Setter
const table & get_responses() const
auto & set_responses(const table &value)
-
const table &labels¶
An \(n \times 1\) table with the labels \(y_i\) assigned to the samples \(x_i\) in the input data, \(1 \leq 1 \leq n\). Default value: table{}.
- Getter & Setter
const table & get_labels() const
auto & set_labels(const table &value)
-
const model<Task> &model¶
The trained K-means model. Default value: model<Task>{}.
- Getter & Setter
const model< Task > & get_model() const
auto & set_model(const model< Task > &value)
-
double objective_function_value¶
The value of the objective function \(\Phi_X(C)\), where \(C\) is model.centroids.
- Getter & Setter
double get_objective_function_value() const
auto & set_objective_function_value(double value)
- Invariants
- objective_function_value >= 0.0
-
std::int64_t iteration_count¶
The number of iterations performed by the algorithm. Default value: 0.
- Getter & Setter
std::int64_t get_iteration_count() const
auto & set_iteration_count(std::int64_t value)
- Invariants
- iteration_count >= 0
Operation¶
-
template<typename Descriptor>
kmeans::train_result train(const Descriptor &desc, const kmeans::train_input &input)¶ - Parameters
desc – K-Means algorithm descriptor kmeans::descriptor
input – Input data for the training operation
- Preconditions
- Postconditions
- result.labels.row_count == input.data.row_countresult.labels.column_count == 1result.labels[i] >= 0result.labels[i] < desc.cluster_countresult.iteration_count <= desc.max_iteration_countresult.model.centroids.row_count == desc.cluster_countresult.model.centroids.column_count == input.data.column_count
Inference infer(...)¶
Input¶
-
template<typename Task = task::by_default>
class infer_input¶ - Template Parameters
Task – Tag-type that specifies type of the problem to solve. Can be task::clustering.
Constructors
-
infer_input(const model<Task> &trained_model, const table &data)¶
Creates a new instance of the class with the given
model
anddata
.
Properties
Result¶
-
template<typename Task = task::by_default>
class infer_result¶ - Template Parameters
Task – Tag-type that specifies type of the problem to solve. Can be task::clustering.
Constructors
-
infer_result()¶
Creates a new instance of the class with the default property values.
Properties
-
const table &labels¶
An \(n \times 1\) table with assignments labels to feature vectors in the input data. Default value: table{}.
- Getter & Setter
const table & get_labels() const
auto & set_labels(const table &value)
-
double objective_function_value¶
The value of the objective function \(\Phi_X(C)\), where \(C\) is defined by the corresponding infer_input::model::centroids. Default value: 0.0.
- Getter & Setter
double get_objective_function_value() const
auto & set_objective_function_value(double value)
- Invariants
- objective_function_value >= 0.0
-
const table &responses¶
An \(n \times 1\) table with assignments responses to feature vectors in the input data. Default value: table{}.
- Getter & Setter
const table & get_responses() const
auto & set_responses(const table &value)
-
const result_option_id &result_options¶
Result options that indicates availability of the properties. Default value: default_result_options<Task>.
- Getter & Setter
const result_option_id & get_result_options() const
auto & set_result_options(const result_option_id &value)
Operation¶
-
template<typename Descriptor>
kmeans::infer_result infer(const Descriptor &desc, const kmeans::infer_input &input)¶ - Parameters
desc – K-Means algorithm descriptor kmeans::descriptor
input – Input data for the inference operation
- Preconditions
- Postconditions
Usage Example¶
Training¶
kmeans::model<> run_training(const table& data,
const table& initial_centroids) {
const auto kmeans_desc = kmeans::descriptor<float>{}
.set_cluster_count(10)
.set_max_iteration_count(50)
.set_accuracy_threshold(1e-4);
const auto result = train(kmeans_desc, data, initial_centroids);
print_table("labels", result.get_labels());
print_table("centroids", result.get_model().get_centroids());
print_value("objective", result.get_objective_function_value());
return result.get_model();
}
Inference¶
table run_inference(const kmeans::model<>& model,
const table& new_data) {
const auto kmeans_desc = kmeans::descriptor<float>{}
.set_cluster_count(model.get_cluster_count());
const auto result = infer(kmeans_desc, model, new_data);
print_table("labels", result.get_labels());
}
Examples¶
Batch Processing:
Batch Processing: