Coordinate Descent Algorithm

The Coordinate Descent algorithm follows the algorithmic framework of iterative solver with one exception: the default method (defaultDense) of Coordinate Descent algorithm is a case of the iterative solver method with the batch equal to the number of observations in the training data set.

Details

The set of intrinsic parameters \(S_t\) is empty. Algorithmic-specific transformation \(T\), algorithm-specific vector \(U\), and power \(d\) of Lebesgue space [Adams2003] are defined as follows:

\[T(\theta_{t-1}, F'(\theta_{t-1}), S_{t-1}, M(\theta_{t-1}))\]
  1. Define the index \(j\) to update the component of a coefficient as a remainder in the division of the number of current iteration (\(t\)) by the number of features in the training data set (\(p\)): \(j = \mathrm{mod}(t, p)\)

    Alternatively, if selection parameter was set to random, generate \(j\) randomly.

  2. If stepLengthSequence was not provided by the user, compute the learning rate: \(\eta = (F''(\theta_{t-1}))_{jj}\) (the diagonal element of the Hessian matrix)

  3. Update the \(j\)-th component of vector \(\theta\):

    \[(\theta_t)_j = \mathrm{prox}_{\frac{1}{\eta}}^{M} \left( (\theta_{t-1})_j - \frac{1}{\max(\eta, \mathrm{eps})} (F'(\theta_{t-1}))_j\right)\]

    Note: for example, if a non-smooth term \(M = \lambda \sum_{i=1}^{p} |\theta_t|\), where \(p\) is the number of features in the training data set, the objective function should compute prox operator as follows:

    \[\begin{split}\mathrm{prox}_{\frac{1}{\eta}}^{M} \left( (\theta_{t-1})_j \right) = \begin{cases} (\theta_{t-1})_j - \lambda \frac{1}{\eta}, & (\theta_{t-1})_j > \lambda \frac{1}{\eta}\\ 0, & |(\theta_{t-1})_j| \leq \lambda \frac{1}{\eta}\\ (\theta_{t-1})_j + \lambda \frac{1}{\eta}, & (\theta_{t-1})_j < -\lambda \frac{1}{\eta} \end{cases}\end{split}\]

Convergence check is performed each \(p\) iterations:

  • \(U = \theta_t - \theta_{t - \mathrm{nFeatures}}\), \(d = \infty\)

  • For \(x \in R^p\), the infinity norm (\(d = \infty\)) is defined as follows:

\[|x|_{\infty} = \underset{i \in [0, p]} \max(|x_i|)\]

Computation

Coordinate Descent algorithm is a special case of an iterative solver. For parameters, input, and output of iterative solvers, see Iterative Solver > Computation.

Algorithm Parameters

In addition to the input of a iterative solver, Coordinate Descent algorithm accepts the following parameters:

Algorithm Parameters for Coordinate Descent Computation

Parameter

Default Value

Description

algorithmFPType

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

defaultDense

Performance-oriented method.

engine

SharePtr< engines:: mt19937:: Batch>()

Pointer to the random number generator engine that is used internally during each iteration to choose a random component of the minimum result vector to be updated.

positive

false

A boolean value. When set to true, it forces the coefficients to be positive.

selection

cyclic

Value that specifies the strategy of certain coordinate selection on each iteration. Except for default cyclic value, Coordinate Descent also supports:

  • random – on each iteration the index of coordinate is selected randomly by the engine.

skipTheFirstComponents

false

A boolean value. When set to true, Coordinate Descent algorithm will skip the first component from optimization.

Examples