Logistic Loss

LogisticLoss is a common objective function used for binary classification.

Operation                Computational methods
Computing                dense_batch

Mathematical formulation

Computing

The algorithm takes as input a dataset \(X = \{ x_1, \ldots, x_n \}\) of \(n\) feature vectors of dimension \(p\), a vector of class labels \(y = \{ y_1, \ldots, y_n \}\), and a coefficient vector \(w = \{ w_0, \ldots, w_p \}\) of size \(p + 1\). It then calculates the logistic loss, its gradient, or its Hessian using the following formulas.

Value

\(L(X, w, y) = \sum_{i = 1}^{n} -y_i \log(prob_i) - (1 - y_i) \log(1 - prob_i)\), where \(prob_i = \sigma(w_0 + \sum_{j=1}^{p} w_j x_{i, j})\) are the predicted probabilities and \(\sigma(x) = \frac{1}{1 + \exp(-x)}\) is the sigmoid function. Note that the probabilities are clamped to the interval \([\epsilon, 1 - \epsilon]\) to avoid numerical problems when computing the logarithm (\(\epsilon = 10^{-7}\) if the float type is used and \(10^{-15}\) otherwise).
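A minimal NumPy sketch of this computation, for illustration only (this is not the oneDAL API; the function name and the l1/l2 penalty arguments are assumptions, and the penalty terms are included only so the value stays consistent with the gradient and Hessian formulas below):

    import numpy as np

    def logistic_loss(X, y, w, l1=0.0, l2=0.0, eps=1e-7):
        """Logistic loss value for labels y in {0, 1}.

        X is an (n, p) feature matrix; w is a (p + 1,) coefficient
        vector with the intercept stored in w[0].
        """
        z = w[0] + X @ w[1:]                    # w_0 + sum_j w_j * x_ij
        prob = 1.0 / (1.0 + np.exp(-z))         # sigmoid
        prob = np.clip(prob, eps, 1.0 - eps)    # clamp to [eps, 1 - eps]
        loss = -np.sum(y * np.log(prob) + (1.0 - y) * np.log(1.0 - prob))
        # Penalties match the L1/L2 terms in the gradient and Hessian;
        # the intercept w[0] is not penalized.
        return loss + l1 * np.abs(w[1:]).sum() + l2 * np.sum(w[1:] ** 2)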

Gradient

\(\overline{grad} = \frac{\partial L}{\partial w}\), where \(\overline{grad}_0 = \sum_{i=1}^{n} (prob_i - y_i)\) and \(\overline{grad}_j = \sum_{i=1}^n X_{i, j} (prob_i - y_i) + L1 \cdot \mathrm{sign}(w_j) + 2 \cdot L2 \cdot w_j\) for \(1 \leq j \leq p\), where \(L1\) and \(L2\) are the regularization coefficients and \(\mathrm{sign}(w_j)\) is the subgradient of \(|w_j|\).
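The same sketch extended to the gradient (again illustrative, not the library API; the names mirror the hypothetical helper above):

    import numpy as np

    def logistic_loss_gradient(X, y, w, l1=0.0, l2=0.0, eps=1e-7):
        """Gradient of the logistic loss with respect to w."""
        prob = 1.0 / (1.0 + np.exp(-(w[0] + X @ w[1:])))
        prob = np.clip(prob, eps, 1.0 - eps)
        r = prob - y                            # prob_i - y_i
        grad = np.empty(w.shape[0])
        grad[0] = r.sum()                       # intercept component grad_0
        # L1 contributes l1 * sign(w_j); L2 contributes 2 * l2 * w_j.
        grad[1:] = X.T @ r + l1 * np.sign(w[1:]) + 2.0 * l2 * w[1:]
        return grad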

Hessian

\(H = (h_{ij}) = \frac{\partial^2 L}{\partial w \partial w^T}\), where \(h_{0,0}= \sum_{k=1}^n prob_k (1 - prob_k)\), \(h_{i,0} = h_{0,i} = \sum_{k=1}^n X_{k,i} \cdot prob_k (1 - prob_k)\), and \(h_{i,j} = \sum_{k=1}^n X_{k,i} X_{k,j} \cdot prob_k (1 - prob_k) + [i = j] \cdot 2 \cdot L2\) for \(1 \leq i, j \leq p\), where \([i = j]\) equals \(1\) when \(i = j\) and \(0\) otherwise.
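A corresponding Hessian sketch (illustrative; prepending an all-ones column is one implementation choice, made here so that the intercept row and column \(h_{0,0}\), \(h_{i,0}\), \(h_{0,i}\) fall out of a single matrix product):

    import numpy as np

    def logistic_loss_hessian(X, w, l2=0.0, eps=1e-7):
        """Hessian of the logistic loss with respect to w."""
        n, p = X.shape
        prob = 1.0 / (1.0 + np.exp(-(w[0] + X @ w[1:])))
        prob = np.clip(prob, eps, 1.0 - eps)
        d = prob * (1.0 - prob)                 # prob_k * (1 - prob_k)
        Xa = np.hstack([np.ones((n, 1)), X])    # intercept column first
        H = Xa.T @ (Xa * d[:, None])
        H[1:, 1:] += 2.0 * l2 * np.eye(p)       # [i = j] * 2 * L2 term
        return H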

Computation method: dense_batch

The method computes the value of the objective function, its gradient, or its Hessian for dense data. This is the default and the only supported method.
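As a sanity check on the formulas above, the analytic gradient can be compared against central finite differences of the loss value (this reuses the hypothetical logistic_loss and logistic_loss_gradient sketches defined earlier; l1 is left at zero so the objective stays differentiable):

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 100, 5
    X = rng.standard_normal((n, p))
    y = (rng.random(n) < 0.5).astype(float)
    w = rng.standard_normal(p + 1)

    # Central finite differences of the value should match the gradient.
    h = 1e-6
    num_grad = np.empty(p + 1)
    for j in range(p + 1):
        e = np.zeros(p + 1)
        e[j] = h
        num_grad[j] = (logistic_loss(X, y, w + e)
                       - logistic_loss(X, y, w - e)) / (2.0 * h)

    assert np.allclose(num_grad, logistic_loss_gradient(X, y, w), atol=1e-4)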

Programming Interface

Refer to API Reference: LogisticLoss.

Distributed mode

Currently, the algorithm does not support distributed execution in SPMD mode.