Logistic Loss

Logistic loss is the objective function minimized during logistic regression training when the dependent variable takes only one of two values, \(0\) and \(1\).

Details

Given a set \(X = \{x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np}) \}\) of \(n\) \(p\)-dimensional feature vectors and a vector of class labels \(y = (y_1, \ldots, y_n)\), where \(y_i \in \{0, 1\}\) describes the class to which the feature vector \(x_i\) belongs, the logistic loss objective function \(K(\theta, X, y)\) has the format \(K(\theta, X, y) = F(\theta, X, y) + M(\theta)\), where

  • \(F(\theta, X, y)\) is defined as

    \[F(\theta, X, y) = -\frac{1}{n} \sum_{i=1}^{n} \left(y_i \ln \left( \frac{1}{1 + e^{-(\theta_0 + \sum_{j=1}^{p}\theta_j x_{ij})}} \right) + (1 - y_i) \ln \left( 1 - \frac{1}{1 + e^{-(\theta_0 + \sum_{j=1}^{p}\theta_j x_{ij})}} \right) \right) + \lambda_2 \sum_{j=1}^{p} \theta_j^2\]

    with \(\sigma(z, \theta) = \frac{1}{1 + e^{-f(z, \theta)}}\), \(f(z, \theta) = \theta_0 + \sum_{k=1}^{p} \theta_k z_k\), \(\lambda_1 \geq 0\), \(\lambda_2 \geq 0\)

  • \(M(\theta) = \lambda_1 \sum_{j=1}^{p} |\theta_j|\)
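The decomposition \(K = F + M\) can be illustrated with a minimal pure-Python sketch. It is not the library implementation: the function names `softplus` and `logistic_loss` and the use of plain lists for \(\theta\), \(X\), and \(y\) are assumptions made for illustration. The data term uses the identities \(\ln \sigma = -\mathrm{softplus}(-f)\) and \(\ln(1 - \sigma) = -\mathrm{softplus}(f)\), which avoid taking the logarithm of a value that rounds to zero.

```python
import math


def softplus(t):
    # ln(1 + e^t), written so that exp() never overflows
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def logistic_loss(theta, X, y, lambda1=0.0, lambda2=0.0):
    """Return (F, M, K) for theta = [theta_0, ..., theta_p].

    Hypothetical helper: X is a list of n rows with p values each,
    y is a list of n labels in {0, 1}.
    """
    n = len(X)
    data_term = 0.0
    for x_i, y_i in zip(X, y):
        # f(x_i, theta) = theta_0 + sum_j theta_j * x_ij
        f = theta[0] + sum(t * v for t, v in zip(theta[1:], x_i))
        # -(y ln(sigma) + (1 - y) ln(1 - sigma))
        data_term += y_i * softplus(-f) + (1 - y_i) * softplus(f)
    # Both penalties skip theta_0, as the sums start at j = 1
    F = data_term / n + lambda2 * sum(t * t for t in theta[1:])
    M = lambda1 * sum(abs(t) for t in theta[1:])
    return F, M, F + M
```

With \(\theta = 0\) every prediction is \(0.5\), so the smooth term reduces to \(\ln 2\) regardless of the data, which is a convenient sanity check for any implementation.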

For a given set of the indices \(I = \{i_1, i_2, \ldots, i_m \}\), \(1 \leq i_r \leq n\), \(r \in \{1, \ldots, m \}\):

  • The value of the sum of functions has the format:

    \[F_I(\theta, X, y) = -\frac{1}{m} \sum_{i \in I} \left( y_i \ln \sigma(x_i, \theta) + (1 - y_i) \ln (1 - \sigma(x_i, \theta)) \right) + \lambda_2 \sum_{k=1}^{p} \theta_k^2\]
  • The gradient of the sum of functions has the format:

    \[\nabla F_I(\theta, X, y) = \left\{ \frac{\partial F_I}{\partial \theta_0}, \ldots, \frac{\partial F_I}{\partial \theta_p} \right\},\]

    where

    \[\frac{\partial F_I}{\partial \theta_0} = \frac{1}{m} \sum_{i \in I} (\sigma(x_i, \theta) - y_i), \quad \frac{\partial F_I}{\partial \theta_j} = \frac{1}{m} \sum_{i \in I} (\sigma(x_i, \theta) - y_i) x_{ij} + 2 \lambda_2 \theta_j, \quad j = 1, \ldots, p\]
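The batch value and gradient can be sketched in pure Python as follows. This is an illustrative sketch, not the library code: the names `batch_value` and `batch_gradient` are hypothetical, the batch holds 0-based row indices (the text above uses 1-based indices), and the sketch differentiates \(F_I\) exactly as defined, so the L2 term contributes only to components \(j \geq 1\) because the penalty sum starts at \(k = 1\).

```python
import math


def softplus(t):
    # ln(1 + e^t) without overflow
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def sigmoid(t):
    # Logistic function, evaluated in the numerically stable branch
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def linear(theta, x):
    # f(x, theta) = theta_0 + sum_k theta_k * x_k
    return theta[0] + sum(t * v for t, v in zip(theta[1:], x))


def batch_value(theta, X, y, batch, lambda2=0.0):
    """F_I: mean logistic loss over the rows in `batch` plus the L2 penalty."""
    m = len(batch)
    total = 0.0
    for i in batch:
        f = linear(theta, X[i])
        total += y[i] * softplus(-f) + (1 - y[i]) * softplus(f)
    return total / m + lambda2 * sum(t * t for t in theta[1:])


def batch_gradient(theta, X, y, batch, lambda2=0.0):
    """Gradient of F_I with respect to theta_0, ..., theta_p."""
    m = len(batch)
    p = len(theta) - 1
    grad = [0.0] * (p + 1)
    for i in batch:
        residual = sigmoid(linear(theta, X[i])) - y[i]
        grad[0] += residual / m
        for j in range(p):
            grad[j + 1] += residual * X[i][j] / m
    # Penalty derivative; theta_0 is outside the penalty sum
    for j in range(1, p + 1):
        grad[j] += 2.0 * lambda2 * theta[j]
    return grad
```

A central finite difference of `batch_value` along each coordinate is a simple way to check `batch_gradient` on small inputs.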

The proximal projection for the non-smooth term \(M(\theta)\) has the format:

\(\mathrm{prox}_\gamma^M (\theta_j) = \begin{cases} \theta_j - \lambda_1 \gamma, & \theta_j > \lambda_1 \gamma\\ 0, & |\theta_j| \leq \lambda_1 \gamma\\ \theta_j + \lambda_1 \gamma, & \theta_j < - \lambda_1 \gamma \end{cases}\)
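The three cases above are the soft-thresholding operator with threshold \(\lambda_1 \gamma\). A minimal sketch, with hypothetical function names, might look like this. Leaving \(\theta_0\) unchanged is an assumption that follows from \(M(\theta)\) summing only over \(j = 1, \ldots, p\); it has not been checked against the library.

```python
def soft_threshold(value, threshold):
    # Shrink `value` toward zero by `threshold`; values inside the band become 0
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def prox_l1(theta, lambda1, gamma):
    """Apply the proximal projection to theta_1..theta_p (hypothetical helper).

    theta[0] is returned untouched (assumption: the intercept is not penalized).
    """
    threshold = lambda1 * gamma
    return [theta[0]] + [soft_threshold(t, threshold) for t in theta[1:]]
```

For example, with \(\lambda_1 \gamma = 1\), a coefficient of \(3\) shrinks to \(2\), a coefficient of \(-0.5\) becomes \(0\), and a coefficient of \(-2\) becomes \(-1\).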

The Lipschitz constant of the smooth term is computed as:

\(\mathrm{lipschitzConstant} = \max_{i = 1, \ldots, n} \| x_i \|_2 + \frac{\lambda_2}{n}\)
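The expression above translates directly into a few lines of Python. The sketch below only mirrors the formula as written in this document; the function name is hypothetical and the result has not been compared with the library output.

```python
import math


def lipschitz_constant(X, lambda2):
    """max_i ||x_i||_2 + lambda2 / n, following the formula in the text."""
    n = len(X)
    # Euclidean norm of each feature vector, then the largest one
    largest_norm = max(math.sqrt(sum(v * v for v in row)) for row in X)
    return largest_norm + lambda2 / n
```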

For more details, see [Hastie2009].

Computation

Algorithm Input

The logistic loss algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for Logistic Loss Computation

| Input ID | Input |
|---|---|
| argument | A numeric table of size \((p + 1) \times 1\) with the input argument \(\theta\) of the objective function. Note: The sizes of the argument, gradient, and hessian numeric tables do not depend on interceptFlag. When interceptFlag is set to false, the computation of the \(\theta_0\) value is skipped, but the sizes of the tables should remain the same. |
| data | A numeric table of size \(n \times p\) with the data \(x_{ij}\). Note: This parameter can be an object of any class derived from NumericTable. |
| dependentVariables | A numeric table of size \(n \times 1\) with dependent variables \(y_i\). Note: This parameter can be an object of any class derived from NumericTable, except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable. |
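The size requirements for the three inputs can be expressed as a small consistency check. The sketch below uses plain Python lists as stand-ins for numeric tables and a hypothetical `check_inputs` helper; it does not use or represent the library API.

```python
def check_inputs(argument, data, dependent_variables):
    """Validate the input sizes described above and return (n, p).

    Hypothetical helper: `argument` is a flat list for the (p + 1) x 1 table,
    `data` is a list of n rows, `dependent_variables` is a flat list of n labels.
    """
    n = len(data)
    if n == 0:
        raise ValueError("data must contain at least one row")
    p = len(data[0])
    if any(len(row) != p for row in data):
        raise ValueError("data must be a rectangular n x p table")
    # The argument keeps p + 1 entries even when the intercept is not computed
    if len(argument) != p + 1:
        raise ValueError("argument must have p + 1 entries")
    if len(dependent_variables) != n:
        raise ValueError("dependentVariables must have n entries")
    if any(label not in (0, 1) for label in dependent_variables):
        raise ValueError("labels must be 0 or 1")
    return n, p
```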

Algorithm Parameters

The logistic loss algorithm has the following parameters. Some of them are required only for specific values of the computation method parameter, method:

Algorithm Parameters for Logistic Loss Computation

| Parameter | Default value | Description |
|---|---|---|
| algorithmFPType | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double. |
| method | defaultDense | Performance-oriented computation method. |
| numberOfTerms | Not applicable | The number of terms in the objective function. |
| batchIndices | Not applicable | The numeric table of size \(1 \times m\), where \(m\) is the batch size, with a batch of indices to be used to compute the function results. If no indices are provided, the implementation uses all the terms in the computation. Note: This parameter can be an object of any class derived from NumericTable except PackedTriangularMatrix and PackedSymmetricMatrix. |
| resultsToCompute | gradient | The 64-bit integer flag that specifies which characteristics of the objective function to compute. Provide one of the values listed below to request a single characteristic or use bitwise OR to request a combination of the characteristics. |
| interceptFlag | true | A flag that indicates a need to compute \(\theta_0\). |
| penaltyL1 | \(0\) | L1 regularization coefficient |
| penaltyL2 | \(0\) | L2 regularization coefficient |

Values of resultsToCompute:

| Value | Characteristic |
|---|---|
| value | Value of the objective function |
| nonSmoothTermValue | Value of the non-smooth term of the objective function |
| gradient | Gradient of the smooth term of the objective function |
| hessian | Hessian of the smooth term of the objective function |
| proximalProjection | Projection of the proximal operator for the non-smooth term of the objective function |
| lipschitzConstant | Lipschitz constant of the smooth term of the objective function |
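Combining resultsToCompute values with bitwise OR can be illustrated with a small Python sketch. The bit positions below are hypothetical and chosen only for illustration; the actual numeric values of the library flags are not given in this document, and the class and function names are assumptions.

```python
from enum import IntFlag


class ResultsToCompute(IntFlag):
    # Hypothetical bit values; the real flag values are defined by the library
    VALUE = 1
    NON_SMOOTH_TERM_VALUE = 2
    GRADIENT = 4
    HESSIAN = 8
    PROXIMAL_PROJECTION = 16
    LIPSCHITZ_CONSTANT = 32


def is_requested(flags, characteristic):
    # A characteristic is requested when its bit is set in the combined flag
    return bool(flags & characteristic)


# Request the objective value and the gradient in one call
requested = ResultsToCompute.VALUE | ResultsToCompute.GRADIENT
```

A single integer carries the whole request, so a caller can ask for several characteristics without a separate parameter for each one.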

Algorithm Output

For the output of the logistic loss algorithm, see Output for objective functions.

Examples