Coordinate Descent Algorithm¶
The Coordinate Descent algorithm follows the algorithmic framework of iterative solver with one exception:
the default method (defaultDense
) of Coordinate Descent algorithm is a case of the iterative solver method
with the batch equal to the number of observations in the training data set.
Details¶
The set of intrinsic parameters \(S_t\) is empty. Algorithmic-specific transformation \(T\), algorithm-specific vector \(U\), and power \(d\) of Lebesgue space [Adams2003] are defined as follows:
Define the index \(j\) to update the component of a coefficient as a remainder in the division of the number of current iteration (\(t\)) by the number of features in the training data set (\(p\)): \(j = \mathrm{mod}(t, p)\)
Alternatively, if
selection
parameter was set torandom
, generate \(j\) randomly.If
stepLengthSequence
was not provided by the user, compute the learning rate: \(\eta = (F''(\theta_{t-1}))_{jj}\) (the diagonal element of the Hessian matrix)Update the \(j\)-th component of vector \(\theta\):
\[(\theta_t)_j = \mathrm{prox}_{\frac{1}{\eta}}^{M} \left( (\theta_{t-1})_j - \frac{1}{\max(\eta, \mathrm{eps})} (F'(\theta_{t-1}))_j\right)\]Note: for example, if a non-smooth term \(M = \lambda \sum_{i=1}^{p} |\theta_t|\), where \(p\) is the number of features in the training data set, the objective function should compute prox operator as follows:
\[\begin{split}\mathrm{prox}_{\frac{1}{\eta}}^{M} \left( (\theta_{t-1})_j \right) = \begin{cases} (\theta_{t-1})_j - \lambda \frac{1}{\eta}, & (\theta_{t-1})_j > \lambda \frac{1}{\eta}\\ 0, & |(\theta_{t-1})_j| \leq \lambda \frac{1}{\eta}\\ (\theta_{t-1})_j + \lambda \frac{1}{\eta}, & (\theta_{t-1})_j < -\lambda \frac{1}{\eta} \end{cases}\end{split}\]
Convergence check is performed each \(p\) iterations:
\(U = \theta_t - \theta_{t - \mathrm{nFeatures}}\), \(d = \infty\)
For \(x \in R^p\), the infinity norm (\(d = \infty\)) is defined as follows:
Computation¶
Coordinate Descent algorithm is a special case of an iterative solver. For parameters, input, and output of iterative solvers, see Iterative Solver > Computation.
Algorithm Parameters¶
In addition to the input of a iterative solver, Coordinate Descent algorithm accepts the following parameters:
Parameter |
Default Value |
Description |
---|---|---|
|
|
The floating-point type that the algorithm uses for intermediate computations. Can be |
|
|
Performance-oriented method. |
|
SharePtr< engines:: mt19937:: Batch>() |
Pointer to the random number generator engine that is used internally during each iteration to choose a random component of the minimum result vector to be updated. |
|
|
A boolean value. When set to |
|
|
Value that specifies the strategy of certain coordinate selection on each iteration.
Except for default
|
|
|
A boolean value. When set to |