Batch Processing
Input
Centroid initialization for K-Means clustering accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm.
| Input ID | Input |
|---|---|
| | Pointer to the \(n \times p\) numeric table with the data to be clustered. |
Note
The input can be an object of any class derived from NumericTable.
Parameters
The following table lists the parameters of centroid initialization for K-Means clustering. Which parameters apply depends on the initialization method, which is set by the method parameter.
| Parameter | method | Default Value | Description |
|---|---|---|---|
| | any | | The floating-point type that the algorithm uses for intermediate computations. Can be |
| | Not applicable | | Available initialization methods for K-Means clustering: For CPU: For GPU: |
| | any | Not applicable | The number of clusters. Required. |
| | | \(1\) | The number of trials to generate all clusters but the first initial cluster. For details, see [Arthur2007], section 5. |
| | | \(0.5\) | A fraction of nClusters sampled in each of the nRounds rounds of parallel K-Means++. L = nClusters * oversamplingFactor points are sampled in a round. For details, see [Bahmani2012], section 3.3. |
| | | \(5\) | The number of rounds for parallel K-Means++. L * nRounds must be greater than nClusters. For details, see [Bahmani2012], section 3.3. |
| | any | SharedPtr<engines::mt19937::Batch>() | Pointer to the random number generator engine that is used internally for random number generation. |
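The constraint on the parallel K-Means++ parameters can be checked before running the algorithm. The following is a minimal standalone sketch, not part of the library API: the `ParallelPlusParams` struct and the `pointsPerRound` and `isValid` helpers are hypothetical names introduced for illustration, and the sketch assumes that L is treated as a real number without rounding.

```cpp
#include <cstddef>

// Hypothetical container for the parallel K-Means++ parameters described
// in the table; it is not a library type.
struct ParallelPlusParams {
    std::size_t nClusters;      // number of clusters (required)
    double oversamplingFactor;  // documented default: 0.5
    std::size_t nRounds;        // documented default: 5
};

// L = nClusters * oversamplingFactor: points sampled in one round.
// Assumption: L is kept as a real number, with no rounding applied.
inline double pointsPerRound(const ParallelPlusParams& p) {
    return static_cast<double>(p.nClusters) * p.oversamplingFactor;
}

// Documented requirement: L * nRounds must be greater than nClusters.
inline bool isValid(const ParallelPlusParams& p) {
    return pointsPerRound(p) * static_cast<double>(p.nRounds) >
           static_cast<double>(p.nClusters);
}
```

With the default values (oversamplingFactor = 0.5, nRounds = 5) and nClusters = 10, L is 5 and L * nRounds is 25, which satisfies the requirement. Reducing nRounds to 2 gives exactly 10, which does not, because the inequality is strict.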
Output
Centroid initialization for K-Means clustering calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm.
| Result ID | Result |
|---|---|
| | Pointer to the \(nClusters \times p\) numeric table with the cluster centroids. |
Note
By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
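To make the shapes concrete, the sketch below shows how an \(n \times p\) input can be turned into an \(nClusters \times p\) table of initial centroids with K-Means++ seeding. It is a standalone illustration, not the library's implementation: `initPlusPlus` and `sqDist` are hypothetical helpers, tables are plain row-major `std::vector<double>` buffers rather than NumericTable objects, `std::mt19937` stands in for the engine parameter, and the greedy use of several trials per centroid is an assumed reading of the trials parameter based on [Arthur2007]. It also assumes nTrials is at least 1 and that the data contains at least nClusters distinct rows.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Squared Euclidean distance between row i of row-major `data` and point c.
static double sqDist(const std::vector<double>& data, std::size_t i,
                     const double* c, std::size_t p) {
    double s = 0.0;
    for (std::size_t j = 0; j < p; ++j) {
        const double d = data[i * p + j] - c[j];
        s += d * d;
    }
    return s;
}

// Hypothetical helper: returns nClusters x p centroids (row-major) chosen
// from the rows of the n x p table `data`.
std::vector<double> initPlusPlus(const std::vector<double>& data,
                                 std::size_t n, std::size_t p,
                                 std::size_t nClusters, std::size_t nTrials,
                                 std::mt19937& rng) {
    std::vector<double> centroids;
    centroids.reserve(nClusters * p);

    // First centroid: a row picked uniformly at random.
    std::uniform_int_distribution<std::size_t> pick(0, n - 1);
    const double* first = data.data() + pick(rng) * p;
    centroids.insert(centroids.end(), first, first + p);

    // minDist[i]: squared distance from row i to its nearest chosen centroid.
    std::vector<double> minDist(n);
    for (std::size_t i = 0; i < n; ++i) minDist[i] = sqDist(data, i, first, p);

    for (std::size_t k = 1; k < nClusters; ++k) {
        // Sample candidates with probability proportional to minDist.
        std::discrete_distribution<std::size_t> d2(minDist.begin(), minDist.end());
        std::size_t best = 0;
        double bestCost = std::numeric_limits<double>::infinity();
        for (std::size_t t = 0; t < nTrials; ++t) {
            const std::size_t cand = d2(rng);
            const double* c = data.data() + cand * p;
            // Total cost if this candidate were added as a centroid.
            double cost = 0.0;
            for (std::size_t i = 0; i < n; ++i)
                cost += std::min(minDist[i], sqDist(data, i, c, p));
            if (cost < bestCost) { bestCost = cost; best = cand; }
        }
        // Keep the candidate with the lowest cost and update the distances.
        const double* chosen = data.data() + best * p;
        centroids.insert(centroids.end(), chosen, chosen + p);
        for (std::size_t i = 0; i < n; ++i)
            minDist[i] = std::min(minDist[i], sqDist(data, i, chosen, p));
    }
    return centroids;
}
```

Because a row that is already a centroid has zero distance to itself, it has zero sampling weight and is not selected again, so the returned rows are distinct as long as the data rows are distinct.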