K-Means initialization

The K-Means initialization algorithm receives \(n\) feature vectors as input and chooses \(k\) initial centroids. After initialization, the K-Means algorithm uses these centroids as the starting point for partitioning the input data into \(k\) clusters.

Operation: Computing
Computational methods: Dense, Random dense, K-Means++, K-Means++ parallel
Programming Interface: compute(…), compute_input(…), compute_result(…)

Mathematical formulation

Refer to Developer Guide: K-Means Initialization.
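For orientation only, the following is a minimal sketch of textbook K-Means++ seeding: the first centroid is drawn uniformly from the rows, and each subsequent centroid is drawn with probability proportional to the squared distance to the nearest centroid chosen so far. It is not the oneDAL implementation. The names point, squared_distance, and kmeans_plus_plus_init are hypothetical, and the greedy refinement controlled by local_trials_count is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

using point = std::vector<double>;  // hypothetical row type: one feature vector

// Squared Euclidean distance between two points of equal dimension.
double squared_distance(const point& a, const point& b) {
    double sum = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) {
        const double d = a[j] - b[j];
        sum += d * d;
    }
    return sum;
}

// Textbook K-Means++ seeding (illustrative; not the oneDAL implementation).
// Assumes 0 < k <= data.size() and that data holds at least k distinct rows.
std::vector<point> kmeans_plus_plus_init(const std::vector<point>& data,
                                         std::size_t k,
                                         std::uint32_t seed) {
    assert(k > 0 && k <= data.size());
    std::mt19937 rng(seed);

    std::vector<point> centroids;
    centroids.reserve(k);

    // First centroid: uniform choice among the rows.
    std::uniform_int_distribution<std::size_t> uniform(0, data.size() - 1);
    centroids.push_back(data[uniform(rng)]);

    // closest[i] = squared distance from row i to its nearest chosen centroid.
    std::vector<double> closest(data.size(), std::numeric_limits<double>::max());

    while (centroids.size() < k) {
        // Only the most recently added centroid can shrink the distances.
        for (std::size_t i = 0; i < data.size(); ++i) {
            const double d = squared_distance(data[i], centroids.back());
            if (d < closest[i]) closest[i] = d;
        }
        // Next centroid: row i is drawn with probability proportional to closest[i].
        std::discrete_distribution<std::size_t> weighted(closest.begin(), closest.end());
        centroids.push_back(data[weighted(rng)]);
    }
    return centroids;
}
```

Rows that were already chosen have weight zero, so with distinct input rows the sketch never selects the same row twice.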

Programming Interface

All types and functions in this section are declared in the oneapi::dal::kmeans_init namespace and are available via inclusion of the oneapi/dal/algo/kmeans_init.hpp header file.

Descriptor

template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default>
class descriptor
Template Parameters
  • Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

  • Method – Tag-type that specifies an implementation of the K-Means Initialization algorithm.

  • Task – Tag-type that specifies the type of the problem to solve. Can be task::init.

Constructors

descriptor(std::int64_t cluster_count = 2)

Creates a new instance of the class with the given cluster_count.

Properties

auto &local_trials_count

Number of attempts to find the best sample in terms of potential value. If the value is equal to -1, the number of trials is 2 + int(log(cluster_count)). Default value: -1.

Getter & Setter
template <typename M = Method, typename None = detail::v1::enable_if_plus_plus_dense<M>> auto & get_local_trials_count() const
template <typename M = Method, typename None = detail::v1::enable_if_plus_plus_dense<M>> auto & set_local_trials_count(std::int64_t value=-1)
Invariants
local_trials_count > 0 or local_trials_count = -1
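
The default rule for the number of trials can be sketched as a small standalone function. The helper effective_local_trials is hypothetical and not part of the library. It assumes that log denotes the natural logarithm and that cluster_count is positive.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of oneDAL): resolves the number of trials
// that a given local_trials_count setting stands for.
std::int64_t effective_local_trials(std::int64_t local_trials_count,
                                    std::int64_t cluster_count) {
    // Assumption: cluster_count must be positive.
    assert(cluster_count > 0);
    // Invariant from the property description: positive, or the sentinel -1.
    assert(local_trials_count > 0 || local_trials_count == -1);

    if (local_trials_count == -1) {
        // Default rule: 2 + int(log(cluster_count)), with log assumed natural.
        const double ln_k = std::log(static_cast<double>(cluster_count));
        return 2 + static_cast<std::int64_t>(ln_k);
    }
    return local_trials_count;
}
```

Under these assumptions, cluster_count = 10 with the default setting resolves to 2 + int(2.30…) = 4 trials, while an explicit positive value is used unchanged.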
std::int64_t cluster_count

The number of clusters k. Default value: 2.

Getter & Setter
std::int64_t get_cluster_count() const
auto & set_cluster_count(std::int64_t value)
Invariants
cluster_count > 0

auto &seed
Getter & Setter
template <typename M = Method, typename None = detail::v1::enable_if_not_default_dense<M>> auto & get_seed() const
template <typename M = Method, typename None = detail::v1::enable_if_not_default_dense<M>> auto & set_seed(std::int64_t value)
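
The getter and setter signatures above use a defaulted template parameter to make a member available only for certain methods. The following self-contained sketch shows how that pattern typically works. Everything in the sfinae_sketch namespace is hypothetical and only mirrors the shape of the signatures. It is not the library's code, and the default seed of 0 is arbitrary.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

namespace sfinae_sketch {  // hypothetical names; not oneDAL code

namespace method {
struct dense {};
struct random_dense {};
}  // namespace method

// Hypothetical analogue of detail::v1::enable_if_not_default_dense:
// a valid type only when M is not the dense method.
template <typename M>
using enable_if_not_default_dense =
    typename std::enable_if<!std::is_same<M, method::dense>::value>::type;

template <typename Method = method::dense>
class descriptor {
public:
    // These members drop out of overload resolution when Method is dense.
    template <typename M = Method, typename None = enable_if_not_default_dense<M>>
    std::int64_t get_seed() const {
        return seed_;
    }

    template <typename M = Method, typename None = enable_if_not_default_dense<M>>
    descriptor& set_seed(std::int64_t value) {
        seed_ = value;
        return *this;  // returning *this allows setter chaining
    }

private:
    std::int64_t seed_ = 0;  // arbitrary default chosen for this sketch
};

// Detection trait: true when d.set_seed(value) is a valid expression.
template <typename D, typename = void>
struct has_set_seed : std::false_type {};

template <typename D>
struct has_set_seed<D, decltype(void(std::declval<D&>().set_seed(std::int64_t{ 0 })))>
        : std::true_type {};

}  // namespace sfinae_sketch
```

In this sketch, calling set_seed on a descriptor parameterized with the dense method is a compile-time error, which matches the intent suggested by the enable_if_not_default_dense name.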

Method tags

struct dense

Tag-type that denotes dense computational method.

struct parallel_plus_dense

Tag-type that denotes parallel_plus_dense computational method.

struct plus_plus_dense

Tag-type that denotes plus_plus_dense computational method.

struct random_dense

Tag-type that denotes random_dense computational method.

using by_default = dense
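
As an illustration of how empty tag types such as these can select an implementation at compile time, here is a minimal tag-dispatch sketch. The tag_sketch namespace, the describe overloads, and the method_name member are hypothetical and exist only for this example. The library's internal dispatch may look different.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

namespace tag_sketch {  // hypothetical names that mirror the tags above

namespace method {
struct dense {};
struct random_dense {};
struct plus_plus_dense {};
struct parallel_plus_dense {};
using by_default = dense;
}  // namespace method

// One overload per tag. The tag object carries no data; its type alone
// decides which overload the compiler picks.
inline std::string describe(method::dense) { return "dense"; }
inline std::string describe(method::random_dense) { return "random_dense"; }
inline std::string describe(method::plus_plus_dense) { return "plus_plus_dense"; }
inline std::string describe(method::parallel_plus_dense) { return "parallel_plus_dense"; }

template <typename Method = method::by_default>
struct descriptor {
    // Dispatches on the Method tag without any run-time branching.
    std::string method_name() const {
        return describe(Method{});
    }
};

// by_default is an alias, so it names exactly the same type as dense.
static_assert(std::is_same<method::by_default, method::dense>::value,
              "by_default must alias dense");

}  // namespace tag_sketch
```

Because the alias and the tag are the same type, a descriptor declared without an explicit method resolves to the dense overload.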

Task tags

struct init

Tag-type that parameterizes entities used for obtaining the initial K-Means centroids.

using by_default = init

Alias tag-type for the initialization task.

Computing compute(...)

Input

template<typename Task = task::by_default>
class compute_input
Template Parameters

Task – Tag-type that specifies the type of the problem to solve. Can be task::init.

Constructors

compute_input(const table &data)

Creates a new instance of the class with the given data.

Properties

const table &data

An \(n \times p\) table with the data to be clustered, where each row stores one feature vector. Default value: table{}.

Getter & Setter
const table & get_data() const
auto & set_data(const table &data)

Result

template<typename Task = task::by_default>
class compute_result
Template Parameters

Task – Tag-type that specifies the type of the problem to solve. Can be task::init.

Constructors

compute_result()

Creates a new instance of the class with the default property values.

Properties

const table &centroids

A \(k \times p\) table with the initial centroids. Each row of the table stores one centroid. Default value: table{}.

Getter & Setter
const table & get_centroids() const
auto & set_centroids(const table &value)

Operation

template<typename Descriptor>
kmeans_init::compute_result compute(const Descriptor &desc, const kmeans_init::compute_input &input)
Parameters
  • desc – K-Means initialization algorithm descriptor kmeans_init::descriptor

  • input – Input data for the computing operation

Preconditions
input.data.has_data == true
input.data.row_count >= desc.cluster_count
Postconditions
result.centroids.has_data == true
result.centroids.row_count == desc.cluster_count
result.centroids.column_count == input.data.column_count
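
The postconditions amount to a shape check on the returned centroids. A minimal sketch of that check follows. The shape struct is a hypothetical stand-in for the row_count, column_count, and has_data properties of a table, and satisfies_postconditions is not a library function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the table properties used in the conditions.
struct shape {
    std::int64_t row_count;
    std::int64_t column_count;

    // Assumption for this sketch: a table has data when both sizes are positive.
    bool has_data() const {
        return row_count > 0 && column_count > 0;
    }
};

// Returns true when the centroid shape is consistent with the three
// postconditions listed above.
bool satisfies_postconditions(const shape& data,
                              const shape& centroids,
                              std::int64_t cluster_count) {
    return centroids.has_data() &&
           centroids.row_count == cluster_count &&
           centroids.column_count == data.column_count;
}
```

For example, a 100 × 4 input with cluster_count = 10 is expected to produce a 10 × 4 table of centroids.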

Usage Example

Computing

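#include "oneapi/dal/algo/kmeans_init.hpp"

using namespace oneapi::dal;

// print_table is a helper assumed to be defined elsewhere; it is not declared in this section.
table run_compute(const table& data) {
   // Select the dense method and request 10 initial centroids.
   const auto kmeans_desc = kmeans_init::descriptor<float,
                                                   kmeans_init::method::dense>{}
      .set_cluster_count(10);

   // Run the initialization on the input table.
   const auto result = compute(kmeans_desc, data);

   print_table("centroids", result.get_centroids());

   return result.get_centroids();
}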

Examples

Batch Processing: