Quality Metrics for Binary Classification Algorithms
For two classes \(C_1\) and \(C_2\), given a vector \(X = (x_1, \ldots, x_n)\) of class labels computed at the prediction stage of the classification algorithm and a vector \(Y = (y_1, \ldots, y_n)\) of expected class labels, the problem is to evaluate the classifier by computing the confusion matrix and related quality metrics: precision, recall, and so on.
QualityMetricsId for binary classification is confusionMatrix.
Details
Further definitions use the following notations:
| Notation | Term | Description |
|---|---|---|
| \(\text{tp}\) | true positive | the number of correctly recognized observations for class \(C_1\) |
| \(\text{tn}\) | true negative | the number of correctly recognized observations that do not belong to the class \(C_1\) |
| \(\text{fp}\) | false positive | the number of observations that were incorrectly assigned to the class \(C_1\) |
| \(\text{fn}\) | false negative | the number of observations that were not recognized as belonging to the class \(C_1\) |
The library uses the following quality metrics for binary classifiers:
| Quality Metric | Definition |
|---|---|
| Accuracy | \(\frac {\text{tp} + \text{tn}}{\text{tp} + \text{fn} + \text{fp} + \text{tn}}\) |
| Precision | \(\frac {\text{tp}}{\text{tp} + \text{fp}}\) |
| Recall | \(\frac {\text{tp}}{\text{tp} + \text{fn}}\) |
| F-score | \(\frac {(\beta^2 + 1) \text{tp}}{(\beta^2 + 1) \text{tp} + \beta^2 \text{fn} + \text{fp}}\) |
| Specificity | \(\frac {\text{tn}}{\text{fp} + \text{tn}}\) |
| Area under curve (AUC) | \(\frac {1}{2}\left(\frac {\text{tp}}{\text{tp} + \text{fn}} + \frac {\text{tn}}{\text{tn} + \text{fp}}\right)\) |
For more details on these metrics, including the evaluation focus of each, refer to [Sokolova09].
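The definitions above can be sketched in plain Python directly from the four confusion-matrix counts. This is an illustration of the formulas only, not the library's implementation; the helper name `binary_metrics` is ours:

```python
# Compute the six binary quality metrics defined above from the
# confusion-matrix counts tp, tn, fp, fn. Plain-Python illustration.

def binary_metrics(tp, tn, fp, fn, beta=1.0):
    """Return the six quality metrics as a dict."""
    b2 = beta * beta
    return {
        "accuracy":    (tp + tn) / (tp + fn + fp + tn),
        "precision":   tp / (tp + fp),
        "recall":      tp / (tp + fn),
        "fscore":      (b2 + 1) * tp / ((b2 + 1) * tp + b2 * fn + fp),
        "specificity": tn / (fp + tn),
        "auc":         0.5 * (tp / (tp + fn) + tn / (tn + fp)),
    }

print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
```

Note that each metric divides by a sum of counts, so degenerate inputs (for example, no observations predicted as \(C_1\)) would require guarding against division by zero.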
The confusion matrix is defined as follows:
|  | Classified as Class \(C_1\) | Classified as Class \(C_2\) |
|---|---|---|
| Actual Class \(C_1\) | \(\text{tp}\) | \(\text{fn}\) |
| Actual Class \(C_2\) | \(\text{fp}\) | \(\text{tn}\) |
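The confusion matrix above can be built from the predicted and expected label vectors as follows. This is a minimal sketch: the label encoding (0 for class \(C_1\), 1 for class \(C_2\)) and the helper name `confusion_matrix` are illustrative assumptions, not the library's conventions:

```python
# Build the 2x2 confusion matrix [[tp, fn], [fp, tn]] from predicted
# and expected label vectors. Label 0 encodes class C_1 here (an
# assumption for this example).

def confusion_matrix(predicted, expected):
    """Return [[tp, fn], [fp, tn]] with class C_1 encoded as label 0."""
    tp = sum(1 for p, e in zip(predicted, expected) if p == 0 and e == 0)
    fn = sum(1 for p, e in zip(predicted, expected) if p == 1 and e == 0)
    fp = sum(1 for p, e in zip(predicted, expected) if p == 0 and e == 1)
    tn = sum(1 for p, e in zip(predicted, expected) if p == 1 and e == 1)
    return [[tp, fn], [fp, tn]]

X = [0, 0, 1, 1, 0, 1]  # labels computed at the prediction stage
Y = [0, 1, 1, 0, 0, 1]  # expected labels
print(confusion_matrix(X, Y))  # rows: actual class; columns: classified as
```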
Batch Processing

Algorithm Input

The quality metric algorithm for binary classifiers accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.
| Input ID | Input |
|---|---|
| predictedLabels | Pointer to the \(n \times 1\) numeric table that contains labels computed at the prediction stage of the classification algorithm. This input can be an object of any class derived from NumericTable. |
| groundTruthLabels | Pointer to the \(n \times 1\) numeric table that contains expected labels. This input can be an object of any class derived from NumericTable. |
Algorithm Parameters

The quality metric algorithm has the following parameters:
| Parameter | Default Value | Description |
|---|---|---|
| algorithmFPType | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double. |
| method | defaultDense | Performance-oriented computation method, the only method supported by the algorithm. |
| beta | \(1\) | The \(\beta\) parameter of the F-score quality metric provided by the library. |
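The \(\beta\) parameter controls the balance between recall and precision in the F-score: values above \(1\) weight recall more heavily, values below \(1\) favor precision. A small plain-Python sketch of the F-score formula given earlier (the helper name `f_score` is ours, not a library API):

```python
# Effect of the beta parameter on the F-score, using the formula
# F = (beta^2 + 1) * tp / ((beta^2 + 1) * tp + beta^2 * fn + fp).

def f_score(tp, fp, fn, beta=1.0):
    b2 = beta * beta
    return (b2 + 1) * tp / ((b2 + 1) * tp + b2 * fn + fp)

# With many false negatives, raising beta (recall-weighted) lowers
# the score, while lowering beta (precision-weighted) raises it.
tp, fp, fn = 40, 5, 20
for beta in (0.5, 1.0, 2.0):
    print(beta, f_score(tp, fp, fn, beta))
```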
Algorithm Output

The quality metric algorithm calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.
| Result ID | Result |
|---|---|
| confusionMatrix | Pointer to the \(2 \times 2\) numeric table with the confusion matrix. Note: by default, this result is an object of the HomogenNumericTable class. |
| binaryMetrics | Pointer to the \(1 \times 6\) numeric table that contains the quality metrics, which you can access by an appropriate Binary Metrics ID. Note: by default, this result is an object of the HomogenNumericTable class. |
Examples

Batch Processing: