Classification Decision Tree#
A classification decision tree is a kind of decision tree described in Decision Tree.
Details#
Given:
\(n\) feature vectors \(x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})\) of size \(p\)
The vector of class labels \(y = (y_1, \ldots, y_n)\), where \(y_i \in \{0, 1, \ldots, C-1\}\) describes the class to which the feature vector \(x_i\) belongs and \(C\) is the number of classes.
The problem is to build a decision tree classifier.
Split Criteria#
The library provides the decision tree classification algorithm based on the split criteria Gini index [Breiman84] and Information gain [Quinlan86], [Mitchell97]:
Gini index
\[{I}_{Gini}\left(D\right)=1-\sum _{i=0}^{C-1}{p}_{i}^{2}\]where
\(D\) is a set of observations that reach the node
\(p_i\) is the observed fraction of observations with class \(i\) in \(D\)
To find the best test using Gini index, each possible test is examined using
\[\Delta {I}_{Gini}\left(D, \tau\right)={I}_{Gini}\left(D\right)-\sum _{v\in O\left(\tau\right)}\frac{|{D}_{v}|}{|D|}{I}_{Gini}\left({D}_{v}\right)\]where
\(O(\tau)\) is the set of all possible outcomes of test \(\tau\)
\(D_v\) is the subset of \(D\) for which the outcome of \(\tau\) is \(v\), that is, \({D}_{v}=\left\{d\in D \mid \tau\left(d\right)=v\right\}\)
The test to be used in the node is selected as \(\underset{\tau }{\text{argmax}}\,\Delta {I}_{Gini}\left(D,\tau \right)\). For a binary decision tree with ‘true’ and ‘false’ branches, \(\Delta {I}_{Gini}\left(D, \tau\right)={I}_{Gini}\left(D\right)-\frac{|{D}_{true}|}{|D|}{I}_{Gini}\left({D}_{true}\right)-\frac{|{D}_{false}|}{|D|}{I}_{Gini}\left({D}_{false}\right)\)
Information gain
\[InfoGain\left(D, \tau\right)={I}_{Entropy}\left(D\right)-\sum _{v\in O\left(\tau\right)}\frac{|{D}_{v}|}{|D|}{I}_{Entropy}\left({D}_{v}\right)\]where
\(O(\tau)\), \(D\), \(D_v\) are defined above
\({I}_{Entropy}\left(D\right)=-\sum _{i=0}^{C-1}{p}_{i}\mathrm{log}{p}_{i}\), with \(p_i\) defined above in Gini index.
Similarly to Gini index, the test to be used in the node is selected as \(\underset{\tau }{\text{argmax}}\,InfoGain\left(D,\tau \right)\). For a binary decision tree with ‘true’ and ‘false’ branches, \(InfoGain\left(D, \tau\right)={I}_{Entropy}\left(D\right)-\frac{|{D}_{true}|}{|D|}{I}_{Entropy}\left({D}_{true}\right)-\frac{|{D}_{false}|}{|D|}{I}_{Entropy}\left({D}_{false}\right)\)
Training Stage#
The classification decision tree follows the algorithmic framework of decision tree training described in Decision Tree.
Prediction Stage#
The classification decision tree follows the algorithmic framework of decision tree prediction described in Decision Tree.
Given the decision tree and vectors \(x_1, \ldots, x_r\), the problem is to calculate the responses for those vectors.
Batch Processing#
Decision tree classification follows the general workflow described in Classification Usage Model.
Training#
In addition to common input for a classifier, decision trees can accept the following inputs that are used for post-pruning:
| Input ID | Input |
|---|---|
|  | Pointer to the \(m \times p\) numeric table with the pruning data set. This table can be an object of any class derived from NumericTable. |
|  | Pointer to the \(m \times 1\) numeric table with class labels. This table can be an object of any class derived from NumericTable except PackedSymmetricMatrix and PackedTriangularMatrix. |
At the training stage, the decision tree classifier has the following parameters:

| Parameter | Default Value | Description |
|---|---|---|
|  |  | The floating-point type that the algorithm uses for intermediate computations. Can be |
|  |  | The computation method used by the decision tree classification. The only training method supported so far is the default dense method. |
|  | Not applicable | The number of classes. A required parameter. |
|  |  | Split criterion to choose the best test for split nodes. Available split criteria for decision trees: |
|  |  | Method to perform post-pruning. Available options for the pruning parameter: |
|  | \(0\) | Maximum tree depth. Zero value means unlimited depth. Can be any non-negative number. |
|  | \(1\) | Minimum number of observations in the leaf node. Can be any positive number. |
Prediction#
At the prediction stage, the decision tree classifier has the following parameters:

| Parameter | Default Value | Description |
|---|---|---|
|  |  | The floating-point type that the algorithm uses for intermediate computations. Can be |
|  |  | The computation method used by the decision tree classification. The only prediction method supported so far is the default dense method. |
Examples#
Batch Processing: