AdaBoost Multiclass Classifier
AdaBoost (short for “Adaptive Boosting”) is a popular boosting classification algorithm. The AdaBoost algorithm performs well on a variety of data sets, except for some noisy data ([Friedman98], [Zhu2005]). The library supports two methods for the algorithm:
SAMME, or Stagewise Additive Modeling using a Multi-class Exponential loss function [Zhu2005]
SAMME.R, a modification of the SAMME method for real-valued class probability estimates returned by the weak learner
Details
Given \(n\) feature vectors \(x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})\) of size \(p\), a vector of class labels \(y = (y_1, \ldots, y_n)\), where \(y_i \in K = \{-1, 1\}\) in the case of binary classification and \(y_i \in K = \{ 0, \ldots, C-1 \}\), where \(C\) is the number of classes, describes the class the feature vector \(x_i\) belongs to, and \(h_t\) is a weak learner algorithm, the problem is to build an AdaBoost classifier.
Training Stage
The following scheme shows the major steps of the SAMME algorithm:
Initialize weights \(D_1(i) = \frac{1}{n}\) for \(i = 1, \ldots, n\)
For \(t = 1, \ldots, T\):
Train the weak learner \(h_t\) using weights \(D_t\).
Choose a confidence value \(\alpha_t = \log \frac{1 - \mathrm{err}_t}{\mathrm{err}_t} + \log(C-1)\), where \(\mathrm{err}_t = \frac{\sum_{i=1}^{n} D_t(i) I(y_i \neq h_t(x_i))}{\sum_{i=1}^{n} D_t(i)}\)
Update \(D_{t+1}(i) = \frac{D_t(i) \, \mathrm{exp}(\alpha_t I(y_i \neq h_t(x_i)))}{Z_t}\), where \(Z_t\) is a normalization factor.
Output the final hypothesis:
\[H(x) = \underset{k}{\mathrm{argmax}} \sum_{t=1}^{T} \alpha_t I(h_t(x) = k)\]
Note
In the case of binary classification, the SAMME algorithm is equivalent to the original AdaBoost algorithm from [Friedman98].
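The scheme above can be condensed into a short sketch. The following NumPy code is an illustration of the SAMME loop only, not the library's API; the helper name `samme_train` and the use of a depth-1 scikit-learn tree as the classification stump are assumptions for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # depth-1 tree acts as a classification stump

def samme_train(X, y, n_classes, T=100):
    """Illustrative SAMME training loop (a sketch, not the library's API)."""
    n = X.shape[0]
    D = np.full(n, 1.0 / n)                      # D_1(i) = 1/n
    learners, alphas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        miss = h.predict(X) != y                 # I(y_i != h_t(x_i))
        err = D @ miss / D.sum()                 # weighted error err_t
        if err >= 1.0 - 1.0 / n_classes:         # weak learner no better than random guessing
            break
        alpha = np.log((1.0 - err) / err) + np.log(n_classes - 1.0)
        learners.append(h)
        alphas.append(alpha)
        D *= np.exp(alpha * miss)                # up-weight misclassified observations
        D /= D.sum()                             # normalization factor Z_t
    return learners, alphas
```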
The following scheme shows the major steps of the SAMME.R algorithm:
Initialize weights \(D_1(i) = \frac{1}{n}\) for \(i = 1, \ldots, n\)
For \(t = 1, \ldots, T\):
Train the weak learner \(h_t\) using weights \(D_t\).
Receive the weighted class probability estimates from the weak learner:
\[p_k^t(x) = \mathrm{Prob}_w \{ c = k | x\}, \quad k = 0, \ldots, C-1\]
For \(k = 0, \ldots, C-1\), set \(s_k^t(x)\):
\[s_k^t(x) = (C-1) \left( \log p_k^t(x) - \frac{1}{C} \sum_{k'=0}^{C-1} \log p_{k'}^t(x) \right)\]
For \(i = 1, \ldots, n\), update \(D_{t+1}(i)\):
\[D_{t+1}(i) = \frac{D_t(i)}{Z_t} \mathrm{exp} \left(- \frac{C-1}{C} z_i^T \log p^t (x_i) \right)\]
where \(Z_t\) is a normalization factor, \(z_i = (z_{i1}, \ldots, z_{iC})\), and \(z_{ik} = \begin{cases} 1, & k = y_i \\ - \frac{1}{C-1}, & k \neq y_i \end{cases}\)
Output the final hypothesis:
\[H(x) = \underset{k} {\mathrm{argmax}} \sum_{t=1}^{T} s_k^t(x)\]
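For one iteration of SAMME.R, the \(s_k^t\) computation and the weight update above might look as follows in NumPy. This is an illustrative sketch, not the library's API; the function name and the `eps` clipping guard are assumptions.

```python
import numpy as np

def samme_r_step(proba, y, D, n_classes, eps=1e-12):
    """One SAMME.R update (a sketch, not the library's API).

    proba: (n, C) class probability estimates p_k^t(x_i) from the weak learner
    y:     (n,)   integer labels in {0, ..., C-1}
    D:     (n,)   current observation weights D_t(i)
    """
    log_p = np.log(np.clip(proba, eps, 1.0))     # guard against log(0); eps is an assumption
    # s_k^t(x) = (C-1) * (log p_k - (1/C) * sum_k' log p_k')
    s = (n_classes - 1) * (log_p - log_p.mean(axis=1, keepdims=True))
    # z_ik = 1 if k == y_i, else -1/(C-1)
    z = np.full_like(log_p, -1.0 / (n_classes - 1))
    z[np.arange(len(y)), y] = 1.0
    # D_{t+1}(i) = D_t(i) * exp(-(C-1)/C * z_i^T log p^t(x_i)) / Z_t
    D_new = D * np.exp(-(n_classes - 1) / n_classes * (z * log_p).sum(axis=1))
    D_new /= D_new.sum()                         # normalization factor Z_t
    return s, D_new
```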
Prediction Stage
Given the AdaBoost classifier and \(r\) feature vectors \(x_1, \ldots, x_r\), the problem is to calculate the final class \(H(x)\):
\[H(x) = \underset{k}{\mathrm{argmax}} \sum_{t=1}^{T} \alpha_t I(h_t(x) = k) \quad \text{for SAMME,}\]
\[H(x) = \underset{k}{\mathrm{argmax}} \sum_{t=1}^{T} s_k^t(x) \quad \text{for SAMME.R,}\]
where \(s_k^t(x)\) is as defined above in Training Stage.
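The SAMME voting rule maps directly onto a few lines of NumPy. This is illustrative only; `samme_predict` and the scikit-learn weak learners are assumptions carried over from the training sketch above.

```python
import numpy as np

def samme_predict(learners, alphas, X, n_classes):
    """H(x) = argmax_k sum_t alpha_t * I(h_t(x) = k) (a sketch, not the library's API)."""
    votes = np.zeros((X.shape[0], n_classes))
    rows = np.arange(X.shape[0])
    for h, alpha in zip(learners, alphas):
        votes[rows, h.predict(X)] += alpha       # each weak learner casts a weighted vote
    return votes.argmax(axis=1)                  # final class H(x)
```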
Batch Processing
The AdaBoost classifier follows the general workflow described in Classification Usage Model.
Training
For a description of the input and output, refer to Classification Usage Model. At the training stage, an AdaBoost classifier has the following parameters:
Parameter | Default Value | Description
---|---|---
`algorithmFPType` | `float` | The floating-point type that the algorithm uses for intermediate computations. Can be `float` or `double`.
`method` | `defaultDense` | Available methods for computation of the AdaBoost algorithm: `defaultDense` (the SAMME method) or `sammeR` (the SAMME.R method).
`weakLearnerTraining` | Pointer to an object of the classification stump training class | Pointer to the training algorithm of the weak learner. By default, a classification stump weak learner is used.
`weakLearnerPrediction` | Pointer to an object of the classification stump prediction class | Pointer to the prediction algorithm of the weak learner. By default, a classification stump weak learner is used.
`accuracyThreshold` | \(0.01\) | AdaBoost training accuracy.
`maxIterations` | \(100\) | The maximal number of iterations for the algorithm.
`learningRate` | \(1.0\) | Multiplier for each classifier to shrink its contribution.
`nClasses` | \(2\) | The number of classes.
`resultsToCompute` | \(0\) | The 64-bit integer flag that specifies which extra characteristics of AdaBoost to compute. The current version of the library provides only one option: `computeWeakLearnersErrors`.
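These parameters map naturally onto other SAMME implementations. As an illustration only (this is scikit-learn, not this library's interface), the analogous knobs are `n_estimators` for maxIterations, `learning_rate` for learningRate, and scikit-learn's default depth-1 decision tree in place of the classification stump:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Toy three-class problem for illustration
X, y = make_classification(n_samples=500, n_classes=3, n_informative=5, random_state=0)

clf = AdaBoostClassifier(
    n_estimators=100,      # analogous to maxIterations
    learning_rate=1.0,     # analogous to learningRate
    algorithm="SAMME",     # multi-class exponential loss, as in [Zhu2005]
)
clf.fit(X, y)
print(clf.score(X, y))     # training accuracy
```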
Output
In addition to classifier output, AdaBoost calculates the result described below. Pass the Result ID as a parameter to the methods that access the result of your algorithm. For more details, see Algorithms.
Result ID | Result
---|---
`weakLearnersErrors` | A numeric table \(1 \times \mathrm{maxIterations}\) containing the weak learners' classification errors computed when the `computeWeakLearnersErrors` option is enabled.

Note

By default, this result is an object of the `HomogenNumericTable` class, but you can define the result as an object of any class derived from `NumericTable`.
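In terms of the training sketch shown earlier, an analogous errors table could be collected by recording each iteration's weighted error. This is a hypothetical illustration; `errors` and `max_iterations` are names invented for the example.

```python
import numpy as np

max_iterations = 100
errors = np.full((1, max_iterations), np.nan)    # shape matches the 1 x maxIterations table
# Inside the boosting loop of the SAMME sketch, after computing err for iteration t:
#     errors[0, t] = err                         # weighted error of the t-th weak learner
```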
Prediction
For a description of the input and output, refer to Classification Usage Model. At the prediction stage, an AdaBoost classifier has the following parameters:
Parameter | Default Value | Description
---|---|---
`algorithmFPType` | `float` | The floating-point type that the algorithm uses for intermediate computations. Can be `float` or `double`.
`method` | `defaultDense` | Performance-oriented computation method, the only method supported by the AdaBoost classifier at the prediction stage.
`weakLearnerPrediction` | Pointer to an object of the classification stump prediction class | Pointer to the prediction algorithm of the weak learner. By default, a classification stump weak learner is used.
`nClasses` | \(2\) | The number of classes.
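Continuing the scikit-learn illustration from the training section (again, not this library's interface), prediction then reduces to:

```python
import numpy as np

rng = np.random.RandomState(1)
X_new = rng.randn(5, X.shape[1])    # X and clf come from the training snippet above
print(clf.predict(X_new))           # final classes H(x)
print(clf.predict_proba(X_new))     # per-class confidence scores
```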