# K-Means initialization#

The K-Means initialization algorithm receives $$n$$ feature vectors as input and chooses $$k$$ initial centroids. After initialization, K-Means algorithm uses the initialization result to partition input data into $$k$$ clusters.

 Operation Computational methods Programming Interface Computing Dense Random dense K-Means++ K-Means++ parallel compute(…) compute_input(…) compute_result(…)

## Mathematical formulation#

### Computing#

Given the training set $$X = \{ x_1, \ldots, x_n \}$$ of $$p$$-dimensional feature vectors and a positive integer $$k$$, the problem is to find a set $$C = \{ c_1, \ldots, c_k \}$$ of $$p$$-dimensional initial centroids.

### Computing method: dense#

The method chooses first $$k$$ feature vectors from the training set $$X$$.

### Computing method: random_dense#

The method chooses random $$k$$ feature vectors from the training set $$X$$.

### Computing method: plus_plus_dense (only on CPU)#

The method is designed as follows: the first centroid $$c_0$$ is selected randomly and $$C = \{ c_1 \}$$. Then the following step is repeated until $$C$$ reaches the necessary size.

$c_{i+1} = \mathrm{arg}\max_{x \in X} d^2(x, C), C = C \cup \{ c_{i+1} \}$

### Computing method: parallel_plus_dense (only on CPU)#

The method is the same as K-Means++, but the data is divided into equal parts and the algorithm runs on each of them.

## Usage Example#

### Computing#

table run_compute(const table& data) {
const auto kmeans_desc = kmeans_init::descriptor<float,
kmeans_init::method::dense>{}
.set_cluster_count(10)

const auto result = compute(kmeans_desc, data);

print_table("centroids", result.get_centroids());

return result.get_centroids();
}


## Examples#

Batch Processing: