Computation Modes

The library algorithms support the following computation modes: batch processing, online processing, and distributed processing.

You can select the computation mode when you initialize the algorithm.

For the computation parameters of a specific algorithm in each computation mode, its supported input types, and its output results, refer to the description of that algorithm.

Batch processing

All oneDAL algorithms support at least the batch processing computation mode. In batch processing mode, the entire data set is available at once, and only the compute() method of a particular algorithm class is used.
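As an illustration only, the batch pattern can be sketched with a toy per-feature mean. `BatchMean`, `setInput()`, and `getResult()` are hypothetical names chosen for this sketch, not the oneDAL API. The point is that the whole data set is supplied up front and a single `compute()` call produces the final result.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types for this sketch; not the oneDAL API.
using Row = std::vector<double>;
using Table = std::vector<Row>;  // rows are observations, columns are features

// Toy batch "algorithm": per-feature mean over the whole data set.
class BatchMean {
public:
    // The entire data set is provided before computing.
    void setInput(Table data) { data_ = std::move(data); }

    // One call processes all observations and yields the final result.
    void compute() {
        assert(!data_.empty() && "batch mode needs the whole data set up front");
        const std::size_t p = data_[0].size();
        result_.assign(p, 0.0);
        for (const Row& row : data_) {
            assert(row.size() == p);
            for (std::size_t j = 0; j < p; ++j) result_[j] += row[j];
        }
        for (double& v : result_) v /= static_cast<double>(data_.size());
    }

    const Row& getResult() const { return result_; }

private:
    Table data_;
    Row result_;
};
```

For example, after `setInput({{1.0, 2.0}, {3.0, 4.0}, {5.0, 9.0}})` and `compute()`, `getResult()` holds the means `{3.0, 5.0}`.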

Online processing

Some oneDAL algorithms can process data sets in blocks. In online processing mode, the compute() and finalizeCompute() methods of a particular algorithm class are used. This computation mode assumes that the data arrives in blocks \(i = 1, 2, 3, \ldots, \text{nblocks}\). Call the compute() method each time a new block of input becomes available. When the last block of data arrives, call the finalizeCompute() method to produce the final results. If the input data arrives asynchronously, you can use the getStatus() method of the data source to check whether a new block of data is available for loading.

The following diagram illustrates the computation schema for online processing:

Note

While different data blocks may have different numbers of observations \(n_i\), they must all have the same number of features \(p\).
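The online pattern can be sketched with the same toy statistic. In this hypothetical `OnlineMean` class (not the oneDAL API), `compute()` folds each block into partial results, which are per-feature sums and an observation count. `finalizeCompute()` then converts those partial results into means. Blocks may have different numbers of rows \(n_i\). A block whose rows do not have exactly \(p\) features is rejected before any state changes.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical types for this sketch; not the oneDAL API.
using Row = std::vector<double>;
using Table = std::vector<Row>;  // rows are observations, columns are features

class OnlineMean {
public:
    explicit OnlineMean(std::size_t nFeatures) : p_(nFeatures), sums_(nFeatures, 0.0) {
        assert(nFeatures > 0 && "at least one feature is required");
    }

    // Call once per block. Blocks may differ in row count (n_i),
    // but every row must have exactly p features.
    void compute(const Table& block) {
        // Validate the whole block first so that a bad block leaves
        // the partial results untouched.
        for (const Row& row : block) {
            if (row.size() != p_) {
                throw std::invalid_argument("row has the wrong number of features");
            }
        }
        for (const Row& row : block) {
            for (std::size_t j = 0; j < p_; ++j) sums_[j] += row[j];
        }
        n_ += block.size();
    }

    // Call once, after the last block, to turn partial results into means.
    void finalizeCompute() {
        if (n_ == 0) throw std::logic_error("no observations were processed");
        result_.resize(p_);
        for (std::size_t j = 0; j < p_; ++j) {
            result_[j] = sums_[j] / static_cast<double>(n_);
        }
    }

    const Row& getResult() const { return result_; }

private:
    std::size_t p_;      // number of features, fixed for all blocks
    std::size_t n_ = 0;  // observations seen so far (sum of n_i)
    Row sums_;           // per-feature running sums
    Row result_;         // final means, valid after finalizeCompute()
};
```

Because only the sums and a count are kept between blocks, memory use depends on \(p\) rather than on the total number of observations.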

Distributed processing

Some oneDAL algorithms can process data sets distributed across several devices. In distributed processing mode, the compute() and finalizeCompute() methods of a particular algorithm class are used. This computation mode assumes that the data set is split into \(\text{nblocks}\) blocks across computation nodes.

Computation is done in several steps. Define the computation step for an algorithm by passing the computeStep value to the constructor when you initialize the algorithm. On each computation node, use the compute() method to compute partial results. On the master node, use the input.add() method to add pointers to the partial results computed on each computation node. When the last partial result arrives, call the compute() method followed by finalizeCompute() to produce the final results. If the input data arrives asynchronously, you can use the getStatus() method of the data source to check whether a new block of data is available for loading.
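The following is a toy version of the two-step flow, again computing per-feature means. `LocalStep`, `MasterStep`, and `PartialMean` are hypothetical names, not the oneDAL API. Each node runs `LocalStep::compute()` on its own block. The master then collects the partial results with `add()`, which stands in for `input.add()`, merges them with `compute()`, and produces the means with `finalizeCompute()`. How partial results travel from the nodes to the master is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for this sketch; not the oneDAL API.
using Row = std::vector<double>;
using Table = std::vector<Row>;  // rows are observations, columns are features

// Partial result computed on one node and sent to the master node.
struct PartialMean {
    Row sums;           // per-feature sums over the node's local block
    std::size_t n = 0;  // number of observations in the local block
};

// Step 1: runs on every computation node against its local block only.
class LocalStep {
public:
    void compute(const Table& localBlock) {
        assert(!localBlock.empty());
        partial_.sums.assign(localBlock[0].size(), 0.0);
        for (const Row& row : localBlock) {
            assert(row.size() == partial_.sums.size());
            for (std::size_t j = 0; j < row.size(); ++j) partial_.sums[j] += row[j];
        }
        partial_.n = localBlock.size();
    }

    const PartialMean& getPartialResult() const { return partial_; }

private:
    PartialMean partial_;
};

// Step 2: runs on the master node.
class MasterStep {
public:
    // Stands in for input.add(): collect one node's partial result.
    void add(const PartialMean& partial) { partials_.push_back(partial); }

    // Merge all partial results; sums and counts simply add up.
    void compute() {
        assert(!partials_.empty());
        merged_.sums.assign(partials_[0].sums.size(), 0.0);
        merged_.n = 0;
        for (const PartialMean& part : partials_) {
            assert(part.sums.size() == merged_.sums.size());
            for (std::size_t j = 0; j < part.sums.size(); ++j) merged_.sums[j] += part.sums[j];
            merged_.n += part.n;
        }
    }

    // Turn the merged partial result into the final means.
    void finalizeCompute() {
        assert(merged_.n > 0);
        result_.resize(merged_.sums.size());
        for (std::size_t j = 0; j < result_.size(); ++j) {
            result_[j] = merged_.sums[j] / static_cast<double>(merged_.n);
        }
    }

    const Row& getResult() const { return result_; }

private:
    std::vector<PartialMean> partials_;
    PartialMean merged_;
    Row result_;
};
```

Suppose node 1 holds `{{1.0, 2.0}}` and node 2 holds `{{3.0, 4.0}, {5.0, 9.0}}`. The master then combines the sums `{9.0, 15.0}` over 3 observations and returns `{3.0, 5.0}`, which is the same result as processing the whole data set in batch mode.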

The computation schema is algorithm-specific. The following diagram illustrates a typical computation schema for distributed processing:

For the algorithm-specific computation schema, refer to the Distributed Processing section in the description of the corresponding algorithm.

Distributed algorithms in oneDAL are abstracted from the underlying cross-device communication technology, so the library can be used in a variety of multi-device computing and data transfer scenarios. These include, but are not limited to, MPI*-based cluster environments, Hadoop*- or Spark*-based cluster environments, and low-level data exchange protocols.
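Since the communication layer is left to the application, one common approach is to turn each partial result into a byte buffer that any channel can carry, such as an MPI message, a socket, or a file in a distributed file system. The sketch below uses a hypothetical `PartialMean` struct and hand-written `serialize()`/`deserialize()` functions. It is not the oneDAL serialization API, and it assumes that sender and receiver share the same endianness and floating-point format.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical partial result for this sketch; not the oneDAL API.
struct PartialMean {
    std::vector<double> sums;  // per-feature sums on one node
    std::uint64_t n = 0;       // number of observations on that node
};

using Bytes = std::vector<unsigned char>;

// Append the raw bytes of a trivially copyable value to the buffer.
template <typename T>
void writeRaw(Bytes& out, const T& value) {
    static_assert(std::is_trivially_copyable<T>::value, "raw copy needs a trivially copyable type");
    const auto* first = reinterpret_cast<const unsigned char*>(&value);
    out.insert(out.end(), first, first + sizeof(T));
}

// Read a trivially copyable value at offset pos and advance pos.
template <typename T>
T readRaw(const Bytes& in, std::size_t& pos) {
    static_assert(std::is_trivially_copyable<T>::value, "raw copy needs a trivially copyable type");
    if (pos + sizeof(T) > in.size()) throw std::out_of_range("truncated buffer");
    T value;
    std::memcpy(&value, in.data() + pos, sizeof(T));
    pos += sizeof(T);
    return value;
}

// Buffer layout: [n][number of features][sums...]
Bytes serialize(const PartialMean& part) {
    Bytes out;
    writeRaw(out, part.n);
    writeRaw(out, static_cast<std::uint64_t>(part.sums.size()));
    for (double s : part.sums) writeRaw(out, s);
    return out;
}

PartialMean deserialize(const Bytes& in) {
    std::size_t pos = 0;
    PartialMean part;
    part.n = readRaw<std::uint64_t>(in, pos);
    const std::uint64_t p = readRaw<std::uint64_t>(in, pos);
    for (std::uint64_t j = 0; j < p; ++j) part.sums.push_back(readRaw<double>(in, pos));
    if (pos != in.size()) throw std::invalid_argument("unexpected trailing bytes");
    return part;
}
```

Round-tripping a partial result with sums `{8.0, 13.0}` and `n = 2` produces a 32-byte buffer on platforms with 8-byte `double`. That buffer decodes back to the same values. A truncated buffer throws `std::out_of_range` instead of reading past the end.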