Distributed Processing

You can use the Naïve Bayes classifier algorithm in the distributed processing mode only at the training stage.

This computation mode assumes that the data set is split into nblocks blocks across computation nodes.
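As a rough illustration of the partitioning assumption, the sketch below splits a row-oriented data set into nblocks contiguous blocks of near-equal size. The helper name split_into_blocks and the list-of-rows layout are hypothetical choices made for this example. They are not part of the library API, and a real deployment would normally load each block directly on its own node.

```python
def split_into_blocks(rows, nblocks):
    """Split a list of rows into nblocks contiguous blocks of near-equal size.

    Hypothetical helper for illustration only; not a library function.
    """
    if nblocks < 1:
        raise ValueError("nblocks must be positive")
    base, extra = divmod(len(rows), nblocks)
    blocks, start = [], 0
    for i in range(nblocks):
        # The first `extra` blocks receive one additional row each.
        size = base + (1 if i < extra else 0)
        blocks.append(rows[start:start + size])
        start += size
    return blocks


rows = [[1, 0], [0, 2], [3, 1], [0, 0], [2, 2]]
blocks = split_into_blocks(rows, 2)
```

Each block then plays the role of the current data block that a single local node processes in Step 1.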

Training

Algorithm Parameters

At the training stage, the Naïve Bayes classifier in the distributed processing mode has the following parameters:

Training Parameters for Naïve Bayes Classifier (Distributed Processing)

Parameter

Default Value

Description

computeStep

Not applicable

The parameter required to initialize the algorithm. Can be:

  • step1Local - the first step, performed on local nodes

  • step2Master - the second step, performed on a master node

algorithmFPType

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

defaultDense

Available computation methods for the Naïve Bayes classifier:

  • defaultDense - default performance-oriented method

  • fastCSR - performance-oriented method for CSR numeric tables

nClasses

Not applicable

The number of classes. A required parameter.

priorClassEstimates

\(1/\text{nClasses}\)

Vector of size nClasses that contains prior class estimates. The default value applies to each vector element.

alpha

\(1\)

Vector of size \(p\), where \(p\) is the number of features, that contains the imagined occurrences of features (smoothing pseudo-counts). The default value applies to each vector element.
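For orientation, the role of alpha can be written out under the assumption that the model is multinomial; this section does not state the model form, so treat the formula as a sketch rather than the library's exact definition. The symbols \(N_{cj}\) (the total count of feature \(j\) over the training samples of class \(c\)) and \(\theta_{cj}\) (the estimated probability of feature \(j\) in class \(c\)) are notation introduced here for illustration:

\[
\theta_{cj} = \frac{N_{cj} + \alpha_j}{\sum_{k=1}^{p} \left( N_{ck} + \alpha_k \right)}
\]

With the default \(\alpha_j = 1\) for every \(j\), this reduces to Laplace (add-one) smoothing, which keeps a feature that never occurs in a class from receiving zero probability.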

Use the two-step computation schema for Naïve Bayes classifier training in the distributed processing mode, as described below:

Step 1 - on Local Nodes

Training with Naïve Bayes Classifier: Distributed Processing, Step 1 - on Local Nodes

In this step, Naïve Bayes classifier training accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Training Input for Naïve Bayes Classifier (Distributed Processing, Step 1)

Input ID

Input

data

Pointer to the \(n_i \times p\) numeric table that represents the current data block.

labels

Pointer to the \(n_i \times 1\) numeric table with class labels associated with the current data block.

Note

These tables can be objects of any class derived from NumericTable.

In this step, Naïve Bayes classifier training calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Training Output for Naïve Bayes Classifier (Distributed Processing, Step 1)

Result ID

Result

partialModel

Pointer to the partial Naïve Bayes classifier model that corresponds to the \(i\)-th data block.

The result can only be an object of the Model class.
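To make the role of the partial model concrete, here is a minimal pure-Python sketch of what a local node could compute from its block, assuming a multinomial model in which a partial model is simply a set of raw counts. The function train_step1_local and the dictionary layout are hypothetical. They do not reproduce the library's Model class or its API.

```python
def train_step1_local(data, labels, n_classes):
    """Compute a partial model (raw counts) for one data block.

    data   -- n_i rows, each holding p non-negative feature counts
    labels -- n_i class indices in range(n_classes)
    Hypothetical illustration only; not the library API.
    """
    p = len(data[0]) if data else 0
    class_counts = [0] * n_classes
    feature_counts = [[0] * p for _ in range(n_classes)]
    for row, label in zip(data, labels):
        class_counts[label] += 1
        for j, value in enumerate(row):
            # Accumulate per-class totals for every feature.
            feature_counts[label][j] += value
    return {"class_counts": class_counts, "feature_counts": feature_counts}


block_data = [[2, 0, 1], [0, 3, 0], [1, 1, 0]]
block_labels = [0, 1, 0]
partial_model = train_step1_local(block_data, block_labels, n_classes=2)
```

Because the partial model holds only sums, its size depends on nClasses and \(p\) but not on \(n_i\), which is what makes it cheap to send to the master node.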

Step 2 - on Master Node

Training with Naïve Bayes Classifier: Distributed Processing, Step 2 - on Master Node

In this step, Naïve Bayes classifier training accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Training Input for Naïve Bayes Classifier (Distributed Processing, Step 2)

Input ID

Input

partialModels

A collection of partial models computed on local nodes in Step 1.

The collection contains objects of the Model class.

In this step, Naïve Bayes classifier training calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Training Output for Naïve Bayes Classifier (Distributed Processing, Step 2)

Result ID

Result

model

Pointer to the Naïve Bayes classifier model being trained.

The result can only be an object of the Model class.
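The master step can be pictured as a merge of per-block counts followed by smoothing. The sketch below assumes a multinomial model and represents each partial model as a dictionary of raw counts. The function train_step2_master, the dictionary keys, and the scalar alpha are hypothetical simplifications (the parameter table describes alpha as a vector of size \(p\)), and none of this reproduces the library's Model class.

```python
import math


def train_step2_master(partial_models, alpha=1.0):
    """Merge partial count models and derive smoothed log-probabilities.

    Hypothetical illustration only; not the library API.
    """
    n_classes = len(partial_models[0]["class_counts"])
    p = len(partial_models[0]["feature_counts"][0])
    class_counts = [0] * n_classes
    feature_counts = [[0] * p for _ in range(n_classes)]
    # Counts are additive, so merging is an element-wise sum over blocks.
    for pm in partial_models:
        for c in range(n_classes):
            class_counts[c] += pm["class_counts"][c]
            for j in range(p):
                feature_counts[c][j] += pm["feature_counts"][c][j]
    # Additive smoothing: the same alpha is added to every feature count.
    log_theta = []
    for c in range(n_classes):
        total = sum(feature_counts[c]) + alpha * p
        log_theta.append(
            [math.log((feature_counts[c][j] + alpha) / total) for j in range(p)]
        )
    return {
        "class_counts": class_counts,
        "feature_counts": feature_counts,
        "log_theta": log_theta,
    }


# Two partial models, as two local nodes might produce them in Step 1.
partial_models = [
    {"class_counts": [2, 1], "feature_counts": [[3, 1, 1], [0, 3, 0]]},
    {"class_counts": [1, 2], "feature_counts": [[0, 2, 0], [4, 0, 1]]},
]
model = train_step2_master(partial_models)
```

Since the merge is a plain sum, the result does not depend on the order in which partial models arrive or on how the rows were divided among the nodes. Prior class estimates are left out of the sketch because they are supplied separately through priorClassEstimates.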