# Logistic Loss

LogisticLoss is a common objective function used for binary classification.

| Operation | Computational methods |
|-----------|-----------------------|
| Computing | dense_batch           |

## Mathematical formulation

### Computing

The algorithm takes as input a dataset $$X = \{ x_1, \ldots, x_n \}$$ of $$n$$ feature vectors of dimension $$p$$, a vector of correct class labels $$y = \{ y_1, \ldots, y_n \}$$, and a coefficient vector $$w = \{ w_0, \ldots, w_p \}$$ of size $$p + 1$$. It then computes the logistic loss value, its gradient, or its Hessian using the following formulas.

#### Value

$$L(X, w, y) = \sum_{i = 1}^{n} \left[ -y_i \log(prob_i) - (1 - y_i) \log(1 - prob_i) \right]$$, where $$prob_i = \sigma(w_0 + \sum_{j=1}^{p} w_j x_{i, j})$$ are the predicted probabilities and $$\sigma(x) = \frac{1}{1 + \exp(-x)}$$ is the sigmoid function. Note that the probabilities are clipped to the interval $$[\epsilon, 1 - \epsilon]$$ to avoid numerical problems when computing the logarithm ($$\epsilon = 10^{-7}$$ if the float type is used and $$\epsilon = 10^{-15}$$ otherwise).
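As an illustration, the value formula can be sketched in pure Python (a minimal example, not the library's implementation; the function and variable names are chosen for this sketch only):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_loss(X, w, y, eps=1e-7):
    # w[0] is the intercept; w[1:] are the feature coefficients.
    loss = 0.0
    for xi, yi in zip(X, y):
        z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
        # Clip the probability to [eps, 1 - eps] before taking logs.
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        loss += -yi * math.log(p) - (1.0 - yi) * math.log(1.0 - p)
    return loss
```

For example, with two samples, zero coefficients, and labels 1 and 0, every predicted probability is 0.5 and the loss equals $$2 \ln 2$$.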

#### Gradient

$$\overline{grad} = \frac{\partial L}{\partial w}$$, where $$\overline{grad}_0 = \sum_{i=1}^{n} (prob_i - y_i)$$ and $$\overline{grad}_j = \sum_{i=1}^n X_{i, j} (prob_i - y_i) + L1 \cdot \mathrm{sign}(w_j) + 2 \cdot L2 \cdot w_j$$ for $$1 \leq j \leq p$$

#### Hessian

$$H = (h_{ij}) = \frac{\partial^2 L}{\partial w \, \partial w^T}$$, where $$h_{0,0} = \sum_{k=1}^n prob_k (1 - prob_k)$$, $$h_{i,0} = h_{0,i} = \sum_{k=1}^n X_{k,i} \cdot prob_k (1 - prob_k)$$, $$h_{i,j} = \sum_{k=1}^n X_{k,i} X_{k,j} \cdot prob_k (1 - prob_k) + [i = j] \cdot 2 \cdot L2$$ for $$1 \leq i, j \leq p$$
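The three Hessian cases collapse into one loop if each sample is augmented with a leading 1 for the intercept; a minimal pure-Python sketch under that assumption (not the library's implementation):

```python
import math

def logistic_hessian(X, w, l2=0.0, eps=1e-7):
    # Note: the Hessian depends on X and w only, not on the labels y.
    n = len(X)
    p = len(w) - 1
    H = [[0.0] * (p + 1) for _ in range(p + 1)]
    for xi in X:
        z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
        prob = min(max(1.0 / (1.0 + math.exp(-z)), eps), 1.0 - eps)
        s = prob * (1.0 - prob)
        row = [1.0] + list(xi)  # index 0 corresponds to the intercept
        for i in range(p + 1):
            for j in range(p + 1):
                H[i][j] += row[i] * row[j] * s
    for i in range(1, p + 1):
        H[i][i] += 2.0 * l2  # L2 term on the diagonal, excluding the intercept
    return H
```

With a single sample $$x = (1)$$ and zero coefficients, $$prob = 0.5$$ and every entry of the 2×2 Hessian equals 0.25.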

### Computation method: dense_batch

The method computes the value of the objective function, its gradient, or its Hessian for dense data. This is the default and the only supported method.

## Programming Interface

Refer to API Reference: LogisticLoss.

## Distributed mode

Currently, the algorithm does not support distributed execution in SPMD mode.