Logistic Loss
LogisticLoss is a common objective function used for binary classification.
Operation | Computational methods
Computing | dense_batch
Mathematical formulation
Computing
The algorithm takes as input a dataset \(X = \{ x_1, \ldots, x_n \}\) of \(n\) feature vectors of dimension \(p\), a vector of correct class labels \(y = \{ y_1, \ldots, y_n \}\) with \(y_i \in \{0, 1\}\), and a coefficient vector \(w = \{ w_0, \ldots, w_p \}\) of size \(p + 1\), where \(w_0\) is the intercept. It then computes the logistic loss, its gradient, or its Hessian using the following formulas.
Value
\(L(X, w, y) = \sum_{i = 1}^{n} \left[ -y_i \log(prob_i) - (1 - y_i) \log(1 - prob_i) \right] + L1 \sum_{j=1}^{p} |w_j| + L2 \sum_{j=1}^{p} w_j^2\), where \(prob_i = \sigma(w_0 + \sum_{j=1}^{p} w_j x_{i, j})\) are the predicted probabilities, \(\sigma(x) = \frac{1}{1 + \exp(-x)}\) is the sigmoid function, and \(L1\), \(L2\) are the L1 and L2 regularization coefficients. Note that the probabilities are clipped to the interval \([\epsilon, 1 - \epsilon]\) to avoid evaluating the logarithm at 0 (\(\epsilon = 10^{-7}\) if the float type is used and \(10^{-15}\) otherwise).
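The value formula can be checked with a small pure-Python sketch. The function name `logloss_value` and the parameter names `l1`, `l2`, and `eps` are hypothetical and not part of the library API. The sketch assumes labels in {0, 1} and penalties that skip the intercept \(w_0\), matching the index range \(1 \leq j \leq p\) used in the gradient and Hessian.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function sigma(z) = 1 / (1 + exp(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logloss_value(X, y, w, l1=0.0, l2=0.0, eps=1e-15):
    """Hypothetical helper: logistic loss for rows X (n x p), labels y in {0, 1},
    and coefficients w = [w_0, ..., w_p], where w_0 is the intercept."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        # Linear predictor w_0 + sum_j w_j * x_ij
        z = w[0] + sum(w_j * x_ij for w_j, x_ij in zip(w[1:], x_i))
        # Clip the probability to [eps, 1 - eps] so both logarithms stay finite
        prob = min(max(sigmoid(z), eps), 1.0 - eps)
        total += -y_i * math.log(prob) - (1 - y_i) * math.log(1.0 - prob)
    # Assumed: regularization covers w_1..w_p only (the intercept is not penalized)
    total += l1 * sum(abs(w_j) for w_j in w[1:])
    total += l2 * sum(w_j * w_j for w_j in w[1:])
    return total
```

With all coefficients equal to zero every probability is 0.5, so `logloss_value([[0.0], [0.0]], [1, 0], [0.0, 0.0])` evaluates to \(2 \log 2\).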
Gradient
\(\overline{grad} = \frac{\partial L}{\partial w}\), where \(\overline{grad}_0 = \sum_{i=1}^{n} (prob_i - y_i)\) and \(\overline{grad}_j = \sum_{i=1}^n X_{i, j} (prob_i - y_i) + L1 \cdot \operatorname{sign}(w_j) + 2 \cdot L2 \cdot w_j\) for \(1 \leq j \leq p\).
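A matching sketch of the gradient, again with a hypothetical function name. It uses \(\operatorname{sign}(w_j)\) for the L1 term, which is the derivative of \(L1 \cdot |w_j|\) for \(w_j \neq 0\); returning 0 at \(w_j = 0\) is an assumption made here (any value in \([-L1, L1]\) is a valid subgradient at that point). The sketch also uses unclipped probabilities; whether the library applies the \([\epsilon, 1 - \epsilon]\) clipping in the gradient is not stated.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function sigma(z) = 1 / (1 + exp(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logloss_gradient(X, y, w, l1=0.0, l2=0.0):
    """Hypothetical helper: gradient of the regularized logistic loss
    with respect to w = [w_0, ..., w_p]."""
    p = len(w) - 1
    grad = [0.0] * (p + 1)
    for x_i, y_i in zip(X, y):
        z = w[0] + sum(w_j * x_ij for w_j, x_ij in zip(w[1:], x_i))
        r = sigmoid(z) - y_i  # residual prob_i - y_i
        grad[0] += r  # intercept component
        for j in range(1, p + 1):
            grad[j] += x_i[j - 1] * r
    for j in range(1, p + 1):
        # sign(w_j) in {-1, 0, 1}; 0 at w_j = 0 is an assumed subgradient choice
        sign = (w[j] > 0) - (w[j] < 0)
        grad[j] += l1 * sign + 2.0 * l2 * w[j]
    return grad
```

For example, with two rows \(x_1 = 1\), \(x_2 = 2\), labels \(1, 0\), and \(w = 0\), both residuals are \(\mp 0.5\), so the intercept component cancels to 0 and \(\overline{grad}_1 = 1 \cdot (-0.5) + 2 \cdot 0.5 = 0.5\).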
Hessian
\(H = (h_{ij}) = \frac{\partial^2 L}{\partial w \partial w}\), where \(h_{0,0} = \sum_{k=1}^n prob_k (1 - prob_k)\), \(h_{i,0} = h_{0,i} = \sum_{k=1}^n X_{k,i} \cdot prob_k (1 - prob_k)\) for \(1 \leq i \leq p\), and \(h_{i,j} = \sum_{k=1}^n X_{k,i} X_{k,j} \cdot prob_k (1 - prob_k) + [i = j] \cdot 2 \cdot L2\) for \(1 \leq i, j \leq p\). Here \([i = j]\) equals 1 if \(i = j\) and 0 otherwise.
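The Hessian can be sketched the same way. Prepending a constant 1 to each feature vector lets \(h_{0,0}\), \(h_{0,i}\), and \(h_{i,j}\) share a single update, and the labels \(y\) are not needed because every entry depends only on \(prob_k (1 - prob_k)\). The name `logloss_hessian` is hypothetical, and the L1 term is omitted because its second derivative is 0 wherever \(w_j \neq 0\).

```python
import math


def sigmoid(z):
    # Numerically stable logistic function sigma(z) = 1 / (1 + exp(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logloss_hessian(X, w, l2=0.0):
    """Hypothetical helper: (p + 1) x (p + 1) Hessian of the regularized
    logistic loss; index 0 corresponds to the intercept w_0."""
    p = len(w) - 1
    H = [[0.0] * (p + 1) for _ in range(p + 1)]
    for x_i in X:
        z = w[0] + sum(w_j * x_ij for w_j, x_ij in zip(w[1:], x_i))
        s = sigmoid(z)
        d = s * (1.0 - s)  # prob_k * (1 - prob_k)
        xa = [1.0] + list(x_i)  # augmented row: 1 for the intercept, then features
        for a in range(p + 1):
            for b in range(p + 1):
                H[a][b] += xa[a] * xa[b] * d
    # L2 contributes 2 * L2 on the diagonal for j = 1..p only
    for j in range(1, p + 1):
        H[j][j] += 2.0 * l2
    return H
```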
Computation method: dense_batch
The method computes the value of the objective function, its gradient, or its Hessian for dense data. It is the default and the only supported method.
Programming Interface
Refer to API Reference: LogisticLoss.
Distributed mode
The algorithm currently does not support distributed execution in SPMD mode.