Distributed Processing#

This mode assumes that the data set is split into nBlocks blocks across computation nodes.

To compute the DBSCAN algorithm in the distributed processing mode, use the general schema described in Algorithms with the following steps:

Step 1 - on Local Nodes#

In this step, the DBSCAN algorithm has the following parameters:

Algorithm Parameters for DBSCAN (Distributed Processing)#

Parameter

Default Value

Description

algorithmFPType

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

defaultDense

Available methods for computation of DBSCAN algorithm:

  • defaultDense – uses brute-force for neighborhood computation

blockIndex

Not applicable

Unique identifier of block initially passed for computation on the local node.

nBlocks

Not applicable

The number of blocks initially passed for computation on all nodes.

In this step, the DBSCAN algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for DBSCAN (Distributed Processing, Step 1)#

Input ID

Input

step1Data

Pointer to the \(n \times p\) numeric table with the observations to be clustered.

Note

The input can be an object of any class derived from NumericTable.

Algorithm Output#

In this step, the DBSCAN algorithm calculates the partial results described below. Pass the Partial Result ID as a parameter to the methods that access the partial result of your algorithm. For more details, see Algorithms.

Partial Results for DBSCAN (Distributed Processing, Step 1)#

Partial Result ID

Result

partialOrder

Pointer to the \(n \times 2\) numeric table containing information about observations: identifier of initial block and index in initial block. This information will be required to reconstruct initial blocks after transferring observations among nodes.

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
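The shape of partialOrder can be illustrated with a minimal sketch. It is not the library API: the function name build_partial_order and the use of plain Python lists in place of numeric tables are assumptions made for illustration, and the column order (block identifier first, row index second) follows the description above.

```python
def build_partial_order(block_index, n_rows):
    """Return an n x 2 table: (identifier of initial block, index in that block).

    Hypothetical helper: plain lists stand in for a NumericTable.
    """
    return [[block_index, i] for i in range(n_rows)]


# A node that initially holds block 2 with three observations.
order = build_partial_order(block_index=2, n_rows=3)
print(order)  # [[2, 0], [2, 1], [2, 2]]
```

Keeping this pair with every observation is what allows the initial blocks to be reconstructed after observations move between nodes.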

Step 2 - on Local Nodes#

In this step, the DBSCAN algorithm has the following parameters:

Algorithm Parameters for DBSCAN (Distributed Processing)#

Parameter

Default Value

Description

algorithmFPType

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

defaultDense

Available methods for computation of DBSCAN algorithm:

  • defaultDense – uses brute-force for neighborhood computation

blockIndex

Not applicable

Unique identifier of block initially passed for computation on the local node.

nBlocks

Not applicable

The number of blocks initially passed for computation on all nodes.

In this step, the DBSCAN algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for DBSCAN (Distributed Processing, Step 2)#

Input ID

Input

partialData

Pointer to the collection of numeric tables with \(p\) columns and arbitrary number of rows, containing observations to be clustered.

Note

The input can be an object of any class derived from DataCollection. The numeric tables in the collection can be an object of any class derived from NumericTable.

Algorithm Output#

In this step, the DBSCAN algorithm calculates the partial results described below. Pass the Partial Result ID as a parameter to the methods that access the partial result of your algorithm. For more details, see Algorithms.

Partial Results for DBSCAN (Distributed Processing, Step 2)#

Partial Result ID

Result

boundingBox

Pointer to the \(2 \times p\) numeric table containing bounding box of input observations: first row contains minimum value of each feature, second row contains maximum value of each feature.

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
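As an illustration of the boundingBox layout, the following sketch computes a \(2 \times p\) table of per-feature minima and maxima over a collection of tables. The function name bounding_box and the list-of-rows representation are assumptions for this example, not part of the library.

```python
def bounding_box(tables):
    """Return [mins, maxs] over all rows of all tables in the collection.

    Hypothetical helper: each table is a list of rows with p features.
    """
    rows = [row for table in tables for row in table]
    if not rows:
        raise ValueError("the collection contains no observations")
    p = len(rows[0])
    mins = [min(r[j] for r in rows) for j in range(p)]  # first row of the result
    maxs = [max(r[j] for r in rows) for j in range(p)]  # second row of the result
    return [mins, maxs]


collection = [[[1.0, 5.0], [3.0, 2.0]], [[0.0, 4.0]]]
print(bounding_box(collection))  # [[0.0, 2.0], [3.0, 5.0]]
```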

Step 3 - on Local Nodes#

In this step, the DBSCAN algorithm has the following parameters:

Algorithm Parameters for DBSCAN (Distributed Processing)#

Parameter

Default Value

Description

algorithmFPType

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

defaultDense

Available methods for computation of DBSCAN algorithm:

  • defaultDense – uses brute-force for neighborhood computation

leftBlocks

Not applicable

The number of blocks that will process observations with value of selected split feature smaller than selected split value.

rightBlocks

Not applicable

The number of blocks that will process observations with value of selected split feature greater than selected split value.

In this step, the DBSCAN algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for DBSCAN (Distributed Processing, Step 3)#

Input ID

Input

partialData

Pointer to the collection of numeric tables with \(p\) columns and arbitrary number of rows, containing observations to be clustered.

Note

The input can be an object of any class derived from DataCollection. The numeric tables in the collection can be an object of any class derived from NumericTable.

step3PartialBoundingBoxes

Pointer to the collection of the \(2 \times p\) numeric tables containing bounding boxes computed on step 2 and collected from all nodes participating in current iteration of geometric repartitioning process.

Note

The numeric tables in collection can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Algorithm Output#

In this step, the DBSCAN algorithm calculates the partial results described below. Pass the Partial Result ID as a parameter to the methods that access the partial result of your algorithm. For more details, see Algorithms.

Partial Results for DBSCAN (Distributed Processing, Step 3)#

Partial Result ID

Result

split

Pointer to the \(1 \times 2\) numeric table containing information about split for current iteration of geometric repartitioning.

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
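How the library chooses the split is not described here. The sketch below shows one plausible way to derive a split from the collected bounding boxes, purely as an illustration: it merges the boxes, picks the widest feature, and places the split value in proportion to leftBlocks and rightBlocks. The function names, the selection rule, and the (value, feature) layout of the \(1 \times 2\) result are all assumptions.

```python
def merge_boxes(boxes):
    """Union of 2 x p bounding boxes, each given as [mins, maxs]."""
    p = len(boxes[0][0])
    mins = [min(b[0][j] for b in boxes) for j in range(p)]
    maxs = [max(b[1][j] for b in boxes) for j in range(p)]
    return [mins, maxs]


def choose_split(boxes, left_blocks, right_blocks):
    """Return [split_value, split_feature] (assumed layout, hypothetical rule)."""
    mins, maxs = merge_boxes(boxes)
    widths = [hi - lo for lo, hi in zip(mins, maxs)]
    feature = widths.index(max(widths))  # widest feature
    fraction = left_blocks / (left_blocks + right_blocks)
    value = mins[feature] + widths[feature] * fraction
    return [value, feature]


boxes = [[[0.0, 0.0], [4.0, 1.0]], [[2.0, -1.0], [10.0, 2.0]]]
print(choose_split(boxes, left_blocks=1, right_blocks=3))  # [2.5, 0]
```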

Step 4 - on Local Nodes#

In this step, the DBSCAN algorithm has the following parameters:

Algorithm Parameters for DBSCAN (Distributed Processing)#

Parameter

Default Value

Description

algorithmFPType

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

defaultDense

Available methods for computation of DBSCAN algorithm:

  • defaultDense – uses brute-force for neighborhood computation

leftBlocks

Not applicable

The number of blocks that will process observations with value of selected split feature smaller than selected split value.

rightBlocks

Not applicable

The number of blocks that will process observations with value of selected split feature greater than selected split value.

In this step, the DBSCAN algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for DBSCAN (Distributed Processing, Step 4)#

Input ID

Input

partialData

Pointer to the collection of numeric tables with \(p\) columns and arbitrary number of rows, containing observations to be clustered.

Note

The input can be an object of any class derived from DataCollection. The numeric tables in the collection can be an object of any class derived from NumericTable.

step4PartialOrders

Pointer to the collection of numeric tables with \(2\) columns and arbitrary number of rows containing information about observations: identifier of initial block and index in initial block.

Note

The input can be an object of any class derived from DataCollection. The numeric tables in the collection can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

step4PartialSplits

Pointer to the collection of \(1 \times 2\) numeric tables containing information about the split computed on step 3 and collected from all nodes participating in current iteration of geometric repartitioning process.

Note

The input can be an object of any class derived from DataCollection. The numeric tables in the collection can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Algorithm Output#

In this step, the DBSCAN algorithm calculates the partial results described below. Pass the Partial Result ID as a parameter to the methods that access the partial result of your algorithm. For more details, see Algorithms.

Partial Results for DBSCAN (Distributed Processing, Step 4)#

Partial Result ID

Result

partitionedData

Pointer to the collection of (leftBlocks + rightBlocks) numeric tables with \(p\) columns and arbitrary number of rows containing observations for processing on nodes participating in current iteration of geometric repartitioning.

  • First leftBlocks numeric tables in collection have the value of selected split feature smaller than selected split value.

  • Next rightBlocks numeric tables in collection have the value of selected split feature larger than selected split value.

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
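The ordering of partitionedData can be sketched as follows. This is a simplified illustration rather than the library's implementation: the function name partition, the round-robin distribution within each side, and the decision to send observations equal to the split value to the right side are assumptions.

```python
def partition(rows, split_value, split_feature, left_blocks, right_blocks):
    """Return left_blocks + right_blocks tables: left tables first, then right."""
    left = [r for r in rows if r[split_feature] < split_value]
    # Assumption: ties (equal to the split value) go to the right side.
    right = [r for r in rows if r[split_feature] >= split_value]

    def deal(side, n_tables):
        # Hypothetical distribution rule: round-robin over the target tables.
        tables = [[] for _ in range(n_tables)]
        for i, row in enumerate(side):
            tables[i % n_tables].append(row)
        return tables

    return deal(left, left_blocks) + deal(right, right_blocks)


rows = [[1.0], [2.0], [3.0], [4.0], [5.0]]
print(partition(rows, 3.0, 0, left_blocks=1, right_blocks=2))
# [[[1.0], [2.0]], [[3.0], [5.0]], [[4.0]]]
```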

Step 5 - on Local Nodes#

In this step, the DBSCAN algorithm has the following parameters:

Algorithm Parameters for DBSCAN (Distributed Processing, Step 5)#

Parameter

Default Value

Description

algorithmFPType

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

defaultDense

Available methods for computation of DBSCAN algorithm:

  • defaultDense – uses brute-force for neighborhood computation

blockIndex

Not applicable

Unique identifier of block initially passed for computation on the local node.

nBlocks

Not applicable

The number of blocks initially passed for computation on all nodes.

epsilon

Not applicable

The maximum distance between observations lying in the same neighborhood.

In this step, the DBSCAN algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for DBSCAN (Distributed Processing, Step 5)#

Input ID

Input

partialData

Pointer to the collection of numeric tables with \(p\) columns and arbitrary number of rows, containing observations to be clustered.

Note

The input can be an object of any class derived from DataCollection. The numeric tables in the collection can be an object of any class derived from NumericTable.

step5PartialBoundingBoxes

Pointer to the collection of \(2 \times p\) numeric tables containing bounding boxes computed on step 2 and collected from all nodes. Numeric tables in the collection should be ordered by the identifiers of initial block of nodes.

Note

The input can be an object of any class derived from DataCollection. The numeric tables in the collection can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Algorithm Output#

In this step, the DBSCAN algorithm calculates the partial results described below. Pass the Partial Result ID as a parameter to the methods that access the partial result of your algorithm. For more details, see Algorithms.

Partial Results for DBSCAN (Distributed Processing, Step 5)#

Partial Result ID

Result

partitionedHaloData

Pointer to the collection of nBlocks numeric tables with \(p\) columns and arbitrary number of rows containing observations from current node that should be used as halo observations on each node.

Numeric tables in the collection are ordered by the identifiers of initial block of nodes.

partitionedHaloDataIndices

Pointer to the collection of nBlocks numeric tables with \(1\) column and arbitrary number of rows containing indices of observations from current node that should be used as halo observations on each node.

Numeric tables in the collection are ordered by the identifiers of initial block of nodes.

Note

By default, this result is an object of the DataCollection class. The numeric tables in the collection can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
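One way to picture the two results of this step is the sketch below. It selects, for every other block, the local observations that lie within epsilon of that block's bounding box, and returns both the observations and their local indices, ordered by block identifier. The function names, the Euclidean distance, and the empty table for the node's own block are assumptions for illustration.

```python
import math


def distance_to_box(row, box):
    """Euclidean distance from a point to a 2 x p bounding box [mins, maxs]."""
    mins, maxs = box
    sq = 0.0
    for x, lo, hi in zip(row, mins, maxs):
        d = max(lo - x, 0.0, x - hi)  # 0 when x lies inside [lo, hi]
        sq += d * d
    return math.sqrt(sq)


def halo_for_blocks(rows, boxes, block_index, epsilon):
    """Return (halo data, halo indices), one table per block, in block order."""
    halo_data, halo_indices = [], []
    for b, box in enumerate(boxes):
        if b == block_index:
            picked = []  # assumption: no halo is produced for the node itself
        else:
            picked = [i for i, r in enumerate(rows)
                      if distance_to_box(r, box) <= epsilon]
        halo_data.append([rows[i] for i in picked])
        halo_indices.append([[i] for i in picked])  # tables with 1 column
    return halo_data, halo_indices
```

For example, with local rows `[[0.0, 0.0], [0.9, 0.0], [0.5, 0.5]]` on block 0, a second block whose box is `[[1.2, 0.0], [2.0, 1.0]]`, and epsilon 0.5, only the row at index 1 is selected as halo for block 1.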

Step 6 - on Local Nodes#

In this step, the DBSCAN algorithm has the following parameters:

Algorithm Parameters for DBSCAN (Distributed Processing, Step 6)#

Parameter

Default Value

Description

algorithmFPType

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

defaultDense

Available methods for computation of DBSCAN algorithm:

  • defaultDense – uses brute-force for neighborhood computation

blockIndex

Not applicable

Unique identifier of block initially passed for computation on the local node.

nBlocks

Not applicable

The number of blocks initially passed for computation on all nodes.

epsilon

Not applicable

The maximum distance between observations lying in the same neighborhood.

minObservations

Not applicable

The minimum number of observations in a neighborhood required for an observation to be considered a core observation.

memorySavingMode

false

If the flag is set to false, all neighborhoods are computed and stored prior to clustering. This requires up to \(O(|\text{sum of sizes of neighborhoods}|)\) of additional memory, which in the worst case can be \(O(|\text{number of observations}|^2)\). However, in general, performance may be better.
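The roles of epsilon, minObservations, and memorySavingMode can be sketched with a brute-force neighborhood search. This is an illustration under stated assumptions, not the library's code: the function names are hypothetical, the distance is Euclidean, an observation is counted as part of its own neighborhood, and setting the flag to true is assumed to mean that neighborhoods are computed on demand instead of being stored.

```python
import math


def neighborhood(rows, i, epsilon):
    """Indices of all observations within epsilon of observation i (brute force)."""
    return [j for j, r in enumerate(rows) if math.dist(rows[i], r) <= epsilon]


def core_flags(rows, epsilon, min_observations, memory_saving_mode=False):
    """Return a list of booleans: True where the observation is a core."""
    if memory_saving_mode:
        # Assumed behavior: compute each neighborhood when needed, keep only its size.
        return [len(neighborhood(rows, i, epsilon)) >= min_observations
                for i in range(len(rows))]
    # All neighborhoods stored up front: memory grows with the sum of their sizes.
    stored = [neighborhood(rows, i, epsilon) for i in range(len(rows))]
    return [len(n) >= min_observations for n in stored]


rows = [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [5.0, 5.0]]
print(core_flags(rows, epsilon=1.0, min_observations=3))
# [False, True, False, False]
```

Both branches give the same answer; they differ only in how much memory is held at once.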

In this step, the DBSCAN algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for DBSCAN (Distributed Processing, Step 6)#

Input ID

Input

partialData

Pointer to the collection of numeric tables with \(p\) columns and arbitrary number of rows, containing observations to be clustered.

Note

The input can be an object of any class derived from DataCollection. The numeric tables in the collection can be an object of any class derived from NumericTable.

haloData

Pointer to the collection of numeric tables with \(p\) columns and arbitrary number of rows, containing halo observations for current node computed on step 5.

Note

The input can be an object of any class derived from DataCollection. The numeric tables in the collection can be an object of any class derived from NumericTable.

haloDataIndices

Pointer to the collection of numeric tables with \(1\) column and arbitrary number of rows, containing indices for halo observations for current node computed on step 5.

The size of this collection must be equal to the size of the collection passed for the haloData Input ID.

Note

The input can be an object of any class derived from DataCollection. The numeric tables in the collection can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

haloDataBlocks

Pointer to the collection of \(1 \times 1\) numeric tables containing identifiers of initial block for halo observations for current node computed on step 5.

The size of this collection must be equal to the size of the collection passed for the haloData Input ID.

Note

The input can be an object of any class derived from DataCollection. The numeric tables in the collection can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Algorithm Output#

In this step, the DBSCAN algorithm calculates the partial results described below. Pass the Partial Result ID as a parameter to the methods that access the partial result of your algorithm. For more details, see Algorithms.

Partial Results for DBSCAN (Distributed Processing, Step 6)#

Partial Result ID

Result

step6ClusterStructure

Pointer to the numeric table with \(4\) columns and arbitrary number of rows containing information about current clustering state of observations processed on the local node.

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

step6FinishedFlag

Pointer to \(1 \times 1\) numeric table containing the flag indicating that the clustering process is finished for current node.

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

step6NClusters

Pointer to \(1 \times 1\) numeric table containing the current number of clusters found on the local node.

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

step6Queries

Pointer to the collection of nBlocks numeric tables with \(3\) columns and arbitrary number of rows containing clustering queries that should be processed on each node. Numeric tables in the collection are ordered by the identifiers of initial block of nodes.

Note

By default, this result is an object of the DataCollection class. The numeric tables in the collection can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Step 7 - on Master Node#

In this step, the DBSCAN algorithm has the following parameters:

Algorithm Parameters for DBSCAN (Distributed Processing, Step 7)#

Parameter

Default Value

Description

algorithmFPType

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

defaultDense

Available methods for computation of DBSCAN algorithm:

  • defaultDense – uses brute-force for neighborhood computation

In this step, the DBSCAN algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for DBSCAN (Distributed Processing, Step 7)#

Input ID

Input

partialFinishedFlags

Pointer to the collection of \(1 \times 1\) numeric tables, collected from all nodes, each containing the flag indicating that the clustering process is finished on that node.

Note

The input can be an object of any class derived from DataCollection. The numeric tables in the collection can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Algorithm Output#

In this step, the DBSCAN algorithm calculates the results and partial results described below. Pass the Result ID as a parameter to the methods that access the result and partial result of your algorithm. For more details, see Algorithms.

Partial Results for DBSCAN (Distributed Processing, Step 7)#

Partial Result ID

Result

finishedFlag

Pointer to \(1 \times 1\) numeric table containing the flag indicating that the clustering process is finished on all nodes.

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
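The reduction performed on the master node can be pictured as a logical AND over the per-node flags. The sketch below is an illustration only: the function name all_finished, the nested-list stand-in for \(1 \times 1\) numeric tables, and the convention that a nonzero value means "finished" are assumptions.

```python
def all_finished(partial_finished_flags):
    """Combine per-node 1 x 1 flag tables into a single 1 x 1 flag table.

    Assumption: a nonzero flag means the node has finished clustering.
    """
    finished = all(table[0][0] != 0 for table in partial_finished_flags)
    return [[1 if finished else 0]]


print(all_finished([[[1]], [[1]], [[1]]]))  # [[1]]
print(all_finished([[[1]], [[0]], [[1]]]))  # [[0]]
```

A driver loop would typically repeat the query-exchange steps until this combined flag reports that every node has finished.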

Step 8 - on Local Nodes#

In this step, the DBSCAN algorithm has the following parameters:

Algorithm Parameters for DBSCAN (Distributed Processing)#

Parameter

Default Value

Description

algorithmFPType

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

defaultDense

Available methods for computation of DBSCAN algorithm:

  • defaultDense – uses brute-force for neighborhood computation

blockIndex

Not applicable

Unique identifier of block initially passed for computation on the local node.

nBlocks

Not applicable

The number of blocks initially passed for computation on all nodes.

In this step, the DBSCAN algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for DBSCAN (Distributed Processing, Step 8)#

Input ID

Input

step8InputClusterStructure

Pointer to the numeric table with \(4\) columns and arbitrary number of rows containing information about current clustering state of observations processed on the local node.

Note

The input can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

step8InputNClusters

Pointer to \(1 \times 1\) numeric table containing the current number of clusters found on the local node.

Note

The input can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

step8PartialQueries

Pointer to the collection of numeric tables with \(3\) columns and arbitrary number of rows containing clustering queries that should be processed on the local node collected from all nodes.

Note

The input can be an object of any class derived from DataCollection. The numeric tables in the collection can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Algorithm Output#

In this step, the DBSCAN algorithm calculates the partial results described below. Pass the Partial Result ID as a parameter to the methods that access the partial result of your algorithm. For more details, see Algorithms.

Partial Results for DBSCAN (Distributed Processing, Step 8)#

Partial Result ID

Result

step8ClusterStructure

Pointer to the numeric table with \(4\) columns and arbitrary number of rows containing information about current clustering state of observations processed on the local node.

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

step8FinishedFlag

Pointer to \(1 \times 1\) numeric table containing the flag indicating that the clustering process is finished for current node.

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

step8NClusters

Pointer to \(1 \times 1\) numeric table containing the current number of clusters found on the local node.

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

step8Queries

Pointer to the collection of nBlocks numeric tables with \(3\) columns and arbitrary number of rows containing clustering queries that should be processed on each node. Numeric tables in the collection are ordered by the identifiers of initial block of nodes.

Note

By default, this result is an object of the DataCollection class. The numeric tables in the collection can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Step 9 - on Master Node#

In this step, the DBSCAN algorithm has the following parameters:

Algorithm Parameters for DBSCAN (Distributed Processing, Step 9)#

Parameter

Default Value

Description

algorithmFPType

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

defaultDense

Available methods for computation of DBSCAN algorithm:

  • defaultDense – uses brute-force for neighborhood computation

In this step, the DBSCAN algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for DBSCAN (Distributed Processing, Step 9)#

Input ID

Input

partialNClusters

Pointer to the collection of \(1 \times 1\) numeric tables containing the number of clusters found on each node.

Note

The input can be an object of any class derived from DataCollection. The numeric tables in the collection can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Algorithm Output#

In this step, the DBSCAN algorithm calculates the results and partial results described below. Pass the Result ID as a parameter to the methods that access the result and partial result of your algorithm. For more details, see Algorithms.

Algorithm Output for DBSCAN (Distributed Processing, Step 9)#

Result ID

Result

step9NClusters

Pointer to \(1 \times 1\) numeric table containing the number of clusters found on all nodes.

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Partial Results for DBSCAN (Distributed Processing, Step 9)#

Partial Result ID

Result

clusterOffsets

Pointer to the collection of \(1 \times 1\) numeric tables containing offsets for cluster numeration for each node. Numeric tables with offsets are given in the same order as in the collection for partialNClusters Input ID.

Note

By default, this result is an object of the DataCollection class. The numeric tables in the collection can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
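The relationship between partialNClusters, step9NClusters, and clusterOffsets can be sketched as a running sum. This is an assumption about how the offsets relate to the counts, not a description of the library's internals: each node's offset is taken to be the number of clusters found on the nodes that precede it in the input collection. The function name and the nested-list stand-in for \(1 \times 1\) numeric tables are hypothetical.

```python
def cluster_offsets(partial_n_clusters):
    """Return (total number of clusters, per-node offsets) as 1 x 1 tables.

    Assumption: offsets are an exclusive prefix sum of the per-node counts,
    in the same order as the input collection.
    """
    offsets, total = [], 0
    for table in partial_n_clusters:
        offsets.append([[total]])  # clusters found on all preceding nodes
        total += table[0][0]
    return [[total]], offsets


n_clusters, offsets = cluster_offsets([[[3]], [[0]], [[2]]])
print(n_clusters)  # [[5]]
print(offsets)     # [[[0]], [[3]], [[3]]]
```

With such offsets, each node can shift its local cluster numbers so that identifiers are unique across all nodes.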

Step 10 - on Local Nodes#

In this step, the DBSCAN algorithm has the following parameters:

Algorithm Parameters for DBSCAN (Distributed Processing)#

Parameter

Default Value

Description

algorithmFPType

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

defaultDense

Available methods for computation of DBSCAN algorithm:

  • defaultDense – uses brute-force for neighborhood computation

blockIndex

Not applicable

Unique identifier of block initially passed for computation on the local node.

nBlocks

Not applicable

The number of blocks initially passed for computation on all nodes.

In this step, the DBSCAN algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for DBSCAN (Distributed Processing, Step 10)#

Input ID

Input

step10InputClusterStructure

Pointer to the numeric table with \(4\) columns and arbitrary number of rows containing information about current clustering state of observations processed on the local node.

Note

The input can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

step10ClusterOffset

Pointer to \(1 \times 1\) numeric table containing the offset for cluster numeration on the local node computed on step 9.

Note

The input can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Algorithm Output#

In this step, the DBSCAN algorithm calculates the partial results described below. Pass the Partial Result ID as a parameter to the methods that access the partial result of your algorithm. For more details, see Algorithms.

Partial Results for DBSCAN (Distributed Processing, Step 10)#

Partial Result ID

Result

step10ClusterStructure

Pointer to the numeric table with \(4\) columns and arbitrary number of rows containing information about current clustering state of observations processed on the local node.

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

step10FinishedFlag

Pointer to \(1 \times 1\) numeric table containing the flag indicating that the clusters numeration process is finished for current node.

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

step10Queries

Pointer to the collection of nBlocks numeric tables with \(4\) columns and arbitrary number of rows containing clusters numeration queries that should be processed on each node. Numeric tables in the collection are ordered by the identifiers of initial block of nodes.

Note

By default, this result is an object of the DataCollection class. The numeric tables in the collection can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Step 11 - on Local Nodes#

In this step, the DBSCAN algorithm has the following parameters:

Algorithm Parameters for DBSCAN (Distributed Processing)#

Parameter

Default Value

Description

algorithmFPType

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

defaultDense

Available methods for computation of DBSCAN algorithm:

  • defaultDense – uses brute-force for neighborhood computation

blockIndex

Not applicable

Unique identifier of block initially passed for computation on the local node.

nBlocks

Not applicable

The number of blocks initially passed for computation on all nodes.

In this step, the DBSCAN algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for DBSCAN (Distributed Processing, Step 11)#

Input ID

Input

step11InputClusterStructure

Pointer to the numeric table with \(4\) columns and arbitrary number of rows containing information about current clustering state of observations processed on the local node.

Note

The input can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

step11PartialQueries

Pointer to the collection of numeric tables with \(4\) columns and arbitrary number of rows containing clusters numeration queries that should be processed on the local node collected from all nodes.

Note

The input can be an object of any class derived from DataCollection. The numeric tables in the collection can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Algorithm Output#

In this step, the DBSCAN algorithm calculates the partial results described below. Pass the Partial Result ID as a parameter to the methods that access the partial result of your algorithm. For more details, see Algorithms.

Partial Results for DBSCAN (Distributed Processing, Step 11)#

Partial Result ID

Result

step11ClusterStructure

Pointer to the numeric table with \(4\) columns and an arbitrary number of rows containing information about the current clustering state of the observations processed on the local node.

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

step11FinishedFlag

Pointer to the \(1 \times 1\) numeric table containing the flag that indicates whether the cluster numbering process is finished for the current node.

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

step11Queries

Pointer to the collection of nBlocks numeric tables with \(4\) columns and an arbitrary number of rows containing the cluster numbering queries that should be processed on each node.

The numeric tables in the collection are ordered by the identifiers of the nodes' initial blocks.

Note

By default, this result is an object of the DataCollection class. The numeric tables in the collection can be objects of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
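The relationship between the step11Queries output (one table per destination block, ordered by block identifier) and the step11PartialQueries input (tables addressed to the local node, gathered from all nodes) amounts to an all-to-all exchange. The following is a minimal sketch of that routing in plain Python. It does not use the library API: the names exchange_queries and all_finished are hypothetical, tables are modeled as lists of 4-element rows, and the convention that a non-zero finished flag means "done" is an assumption.

```python
from typing import List

Row = List[int]      # one query with 4 columns (column meaning is not modeled here)
Table = List[Row]    # stand-in for a numeric table


def exchange_queries(outgoing: List[List[Table]]) -> List[List[Table]]:
    """Route per-destination query tables to their destination nodes.

    outgoing[src][dst] is the table that node `src` produced for block `dst`.
    The result incoming[dst] lists the tables addressed to `dst`,
    ordered by source block identifier.
    """
    n_blocks = len(outgoing)
    for src, tables in enumerate(outgoing):
        if len(tables) != n_blocks:
            raise ValueError(
                f"node {src} produced {len(tables)} tables, expected {n_blocks}"
            )
    return [
        [outgoing[src][dst] for src in range(n_blocks)]
        for dst in range(n_blocks)
    ]


def all_finished(flags: List[int]) -> bool:
    """Assumed convention: a non-zero flag means the node has finished."""
    return all(flag != 0 for flag in flags)


# Two nodes: node 0 has one query for node 1, node 1 has nothing to send.
outgoing = [
    [[], [[0, 1, 2, 3]]],
    [[], []],
]
incoming = exchange_queries(outgoing)
```

In a real deployment the exchange would be performed by the communication layer (for example, an all-to-all collective), and the per-node flags would be combined before deciding whether another round is needed.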

Step 12 - on Local Nodes#

In this step, the DBSCAN algorithm has the following parameters:

Algorithm Parameters for DBSCAN (Distributed Processing)#

Parameter

Default Value

Description

algorithmFPType

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

defaultDense

Available methods for computation of DBSCAN algorithm:

  • defaultDense – uses brute-force for neighborhood computation

blockIndex

Not applicable

Unique identifier of the block initially passed for computation on the local node.

nBlocks

Not applicable

The number of blocks initially passed for computation on all nodes.

In this step, the DBSCAN algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for DBSCAN (Distributed Processing, Step 12)#

Input ID

Input

step12InputClusterStructure

Pointer to the numeric table with \(4\) columns and an arbitrary number of rows containing information about the current clustering state of the observations processed on the local node.

Note

The input can be an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

step12PartialOrders

Pointer to the collection of \(n \times 2\) numeric tables containing information about the observations: the identifier of the initial block and the index in the initial block. This information is required to reconstruct the initial blocks after observations have been transferred among nodes.

Note

The input can be an object of any class derived from DataCollection. The numeric tables in the collection can be objects of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Algorithm Output#

In this step, the DBSCAN algorithm calculates the partial results described below. Pass the Partial Result ID as a parameter to the methods that access the partial result of your algorithm. For more details, see Algorithms.

Partial Results for DBSCAN (Distributed Processing, Step 12)#

Partial Result ID

Result

assignmentQueries

Pointer to the collection of nBlocks numeric tables with \(2\) columns and an arbitrary number of rows containing the cluster assignment queries that should be processed on each node.

The numeric tables in the collection are ordered by the identifiers of the nodes' initial blocks.

Note

By default, this result is an object of the DataCollection class. The numeric tables in the collection can be objects of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
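To illustrate how the (initial block identifier, index in initial block) pairs from step12PartialOrders can be used to send each observation's cluster back to the node that originally held it, here is a minimal sketch in plain Python. It is not the library implementation. The function name build_assignment_queries is hypothetical, and the 2-column query layout (index in initial block, cluster identifier) is an assumption made for illustration; the documentation above only states that the tables have 2 columns.

```python
from typing import List, Tuple

OrderRow = Tuple[int, int]   # (initial block identifier, index in initial block)
QueryRow = Tuple[int, int]   # assumed layout: (index in initial block, cluster id)


def build_assignment_queries(
    order_tables: List[List[OrderRow]],
    cluster_lists: List[List[int]],
    n_blocks: int,
) -> List[List[QueryRow]]:
    """Group cluster ids by the block that originally owned each observation.

    order_tables[k][i] describes the origin of the i-th observation in table k,
    and cluster_lists[k][i] is the cluster id computed for that observation.
    Returns one query table per block, ordered by block identifier.
    """
    queries: List[List[QueryRow]] = [[] for _ in range(n_blocks)]
    for order, clusters in zip(order_tables, cluster_lists):
        if len(order) != len(clusters):
            raise ValueError("order table and cluster list differ in length")
        for (block_id, index_in_block), cluster in zip(order, clusters):
            queries[block_id].append((index_in_block, cluster))
    return queries


# Observations currently on this node came from blocks 0 and 1.
order_tables = [[(0, 2), (1, 0)], [(1, 1)]]
cluster_lists = [[5, -1], [5]]
queries = build_assignment_queries(order_tables, cluster_lists, n_blocks=2)
```

Each resulting table is addressed to one block, which matches the ordering by initial block identifier described for assignmentQueries.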

Step 13 - on Local Nodes#

In this step, the DBSCAN algorithm has the following parameters:

Algorithm Parameters for DBSCAN (Distributed Processing, Step 13)#

Parameter

Default Value

Description

algorithmFPType

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

defaultDense

Available methods for computation of DBSCAN algorithm:

  • defaultDense – uses brute-force for neighborhood computation

In this step, the DBSCAN algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for DBSCAN (Distributed Processing, Step 13)#

Input ID

Input

partialAssignmentQueries

Pointer to the collection of numeric tables with \(2\) columns and an arbitrary number of rows containing the cluster assignment queries, collected from all nodes, that should be processed on the local node.

Note

The input can be an object of any class derived from DataCollection. The numeric tables in the collection can be objects of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Algorithm Output#

In this step, the DBSCAN algorithm calculates the results and partial results described below. Pass the Result ID or Partial Result ID as a parameter to the methods that access the result or partial result of your algorithm. For more details, see Algorithms.

Algorithm Output for DBSCAN (Distributed Processing, Step 13)#

Result ID

Result

step13Assignments

Pointer to the \(n \times 1\) numeric table with the assignments of cluster indices to the observations processed in Step 1 on the local node. Noise observations have an assignment equal to \(-1\).

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Partial Results for DBSCAN (Distributed Processing, Step 13)#

Partial Result ID

Result

step13AssignmentsQueries

Pointer to the numeric table with \(2\) columns and an arbitrary number of rows containing the cluster assignment queries that should be processed on the local node.

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
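As an illustration of how the queries gathered in partialAssignmentQueries could be turned into the \(n \times 1\) assignment result, the following plain-Python sketch applies each query to a per-observation array. It is a sketch under stated assumptions, not the library implementation: the function name apply_assignment_queries is hypothetical, each query row is assumed to be (index in the local initial block, cluster identifier), and observations that no query mentions are assumed to keep the noise value of -1.

```python
from typing import List, Tuple

QueryRow = Tuple[int, int]   # assumed layout: (index in initial block, cluster id)

NOISE = -1                   # noise observations are reported as -1


def apply_assignment_queries(
    n_observations: int,
    partial_queries: List[List[QueryRow]],
) -> List[int]:
    """Build the per-observation assignments for the local initial block.

    partial_queries holds one query table per source node. The result is a
    flat list of length n_observations, standing in for an n x 1 table.
    """
    assignments = [NOISE] * n_observations
    for table in partial_queries:
        for index, cluster in table:
            if not 0 <= index < n_observations:
                raise IndexError(f"query refers to unknown observation {index}")
            assignments[index] = cluster
    return assignments


# Four local observations; queries arrive from two nodes.
partial_queries = [[(2, 0)], [(0, 1), (3, 0)]]
assignments = apply_assignment_queries(4, partial_queries)
```

Here observation 1 is not mentioned by any query and therefore stays at -1, which the sketch treats as noise.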