AdaBoost Multiclass Classifier

AdaBoost (short for “Adaptive Boosting”) is a popular boosting classification algorithm. The AdaBoost algorithm performs well on a variety of data sets except some noisy data ([Friedman98], [Zhu2005]). The library supports two methods for the algorithms:

  • SAMME, or Stagewise Additive Modeling using a Multi-class Exponential loss function [Zhu2005]

  • SAMME.R, which is a modification of SAMME method for Real-valued returned probabilities from weak learner

Details

Given \(n\) feature vectors \(x_1 = (x_{11}, \ldots, x_{1p}), \ldots x_n = (x_{n1}, \ldots, x_{np})\) of size \(p\), a vector of class labels \(y = (y_1, \ldots, y_n)\) where \(y_i \in K = \{-1, 1\}\) in case of binary classification and \(y_i \in K = \{ 0, \ldots, C-1 \}\), where \(C\) is a number of classes, describes the class \(t\) the feature vector \(x_i\) belongs to, and \(h_t\) is a weak learner algorithm, the problem is to build an AdaBoost classifier.

Training Stage

The following scheme shows the major steps of the SAMME algorithm:

  1. Initialize weights \(D_1(i) = \frac{1}{n}\) for \(i = 1, \ldots, n\)

  2. For \(t = 1, \ldots, T\):

    • Train the weak learner \(h_t(i)\) using weights \(D_t\).

    • Choose a confidence value \(\alpha_t = \log \frac{1 - \mathrm{err}_t}{\mathrm{err}_t} + \log(C-1)\), where \(\mathrm{err}_t = \frac{\sum_{i=1}{n} D_t(i) I(y_i \neq h_t(i))}{\sum_{i=1}{n} D_t(i)}\)

    • Update \(D_{t+1}(i) = \frac{D_t{i} \mathrm{exp}(-\alpha_t I(y_i \neq h_t(i)))}{Z_t}\), where \(Z_t\) is a normalization factor.

  3. Output the final hypothesis:

    \[H(x) = \underset{k} {\mathrm{argmax}} \sum_{t=1}^{T} \alpha_t I(h_t{x} = k)\]

Note

SAMME algorithm in case of binary classification is equal to the AdaBoost algorithm from [Friedman98].

The following scheme shows the major steps of the SAMME.R algorithm:

  1. Initialize weights \(D_1(i) = \frac{1}{n}\) for \(i = 1, \ldots, n\)

  2. For \(t = 1, \ldots, T\):

    • Train the weak learner \(h_t(i)\) using weights \(D_t\).

    • Receive the weighed class probability estimates from weak learner:

      \[p_k^t(x) = \mathrm{Prob}_w \{ c = k | x\}, k = 0, \ldots, C-1\]
    • For \(k = 0, \ldots, C-1\), set \(s_k^t(x)\):

      \[s_k^t(x) = (C-1) \left( \log p_k^t(x) - \frac{1}{C} \sum_{k=0}^{C-1} \log p_k^t(x) \right)\]
    • For \(i = 1, \ldots, n\), update \(D_{t+1}(i)\):

      \[D_{t+1}(i) = \frac{1}{Z_t} \mathrm{exp} \left(- \frac{C-1}{C} z_i^T \log p^t (x) \right)\]

      where \(Z_t\) is a normalization factor, \(z_i = (z_{i1}, \ldots, z_{iC})\), \(z_{ik} = \begin{cases} 1, & k = y_i \\ - \frac{1}{K-1}, & k \neq y_i \end{cases}\)

  3. Output the final hypothesis:

    \[H(x) = \underset{k} {\mathrm{argmax}} \sum_{t=1}^{T} s_k^t(x)\]

Prediction Stage

Given the AdaBoost classifier and \(r\) feature vectors \(x_1, \ldots, x_r\), the problem is to calculate the final class \(H(x)\):

\[H(x) = \underset{k} {\mathrm{argmax}} \sum_{t=1}^{T} \alpha_t I(h_t{x} = k)\]

Given the AdaBoost classifier and \(r\) feature vectors \(x_1, \ldots, x_r\), the problem is to calculate the final class \(H(x)\):

\[H(x) = \underset{k} {\mathrm{argmax}} \sum_{t=1}^{T} s_k^t(x)\]

where \(s_k^t(x)\) is as defined above in Training Stage.

Batch Processing

AdaBoost classifier follows the general workflow described in Classification Usage Model.

Training

For a description of the input and output, refer to Classification Usage Model. At the training stage, an AdaBoost classifier has the following parameters:

Parameter

Default Value

Description

algorithmFPType

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

defaultDense

Available methods for computation of the AdaBoost algorithm:

  • samme - uses the classifier that returns labels as weak learner

  • sammeR - uses the classifier that returns probabilities of belonging to class as weak learner

  • defaultDense is equal to samme method

weakLearnerTraining

Pointer to an object of the classification stump training class

Pointer to the training algorithm of the weak learner. By default, a classification stump weak learner is used.

weakLearnerPrediction

Pointer to an object of the classification stump prediction class

Pointer to the prediction algorithm of the weak learner. By default, a classification stump weak learner is used.

accuracyThreshold

\(0.01\)

AdaBoost training accuracy.

maxIterations

\(100\)

The maximal number of iterations for the algorithm.

learningRate

\(1.0\)

Multiplier for each classifier to shrink its contribution.

nClasses

\(2\)

The number of classes.

resultsToCompute

\(0\)

The 64-bit integer flag that specifies which extra characteristics of AdaBoost to compute. Current version of the library only provides the following option: computeWeakLearnersErrors

Output

In addition to classifier output, AdaBoostcalculates the result described below. Pass the Result ID as a parameter to the methods that access the result of your algorithm. For more details, see Algorithms.

Result ID

Result

weakLearnersErrors

A numeric table \(1 \times \mathrm{maxIterations}\) containing weak learner’s classification errors computed when the computeWeakLearnersErrors option is on.

Note

By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable.

Prediction

For a description of the input and output, refer to Classification Usage Model. At the prediction stage, an AdaBoost classifier has the following parameters:

Parameter

Default Value

Description

algorithmFPType

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

defaultDense

Performance-oriented computation method, the only method supported by the AdaBoost classifier at the prediction stage.

weakLearnerPrediction

Pointer to an object of the classification stump prediction class

Pointer to the prediction algorithm of the weak learner. By default, a classification stump weak learner is used.

nClasses

\(2\)

The number of classes.