Coordinate Descent Algorithm¶

The Coordinate Descent algorithm follows the algorithmic framework of iterative solver with one exception: the default method (defaultDense) of Coordinate Descent algorithm is a case of the iterative solver method with the batch equal to the number of observations in the training data set.

Details¶

The set of intrinsic parameters \(S_t\) is empty. Algorithmic-specific transformation \(T\), algorithm-specific vector \(U\), and power \(d\) of Lebesgue space [Adams2003] are defined as follows:

\[T(\theta_{t-1}, F'(\theta_{t-1}), S_{t-1}, M(\theta_{t-1}))\]

Define the index \(j\) to update the component of a coefficient as a remainder in the division of the number of current iteration (\(t\)) by the number of features in the training data set (\(p\)): \(j = \mathrm{mod}(t, p)\)

Alternatively, if selection parameter was set to random, generate \(j\) randomly.
If stepLengthSequence was not provided by the user, compute the learning rate: \(\eta = (F''(\theta_{t-1}))_{jj}\) (the diagonal element of the Hessian matrix)
Update the \(j\)-th component of vector \(\theta\):

\[(\theta_t)_j = \mathrm{prox}_{\frac{1}{\eta}}^{M} \left( (\theta_{t-1})_j - \frac{1}{\max(\eta, \mathrm{eps})} (F'(\theta_{t-1}))_j\right)\]

Note: for example, if a non-smooth term \(M = \lambda \sum_{i=1}^{p} |\theta_t|\), where \(p\) is the number of features in the training data set, the objective function should compute prox operator as follows:

\[\begin{split}\mathrm{prox}_{\frac{1}{\eta}}^{M} \left( (\theta_{t-1})_j \right) = \begin{cases} (\theta_{t-1})_j - \lambda \frac{1}{\eta}, & (\theta_{t-1})_j > \lambda \frac{1}{\eta}\\ 0, & |(\theta_{t-1})_j| \leq \lambda \frac{1}{\eta}\\ (\theta_{t-1})_j + \lambda \frac{1}{\eta}, & (\theta_{t-1})_j < -\lambda \frac{1}{\eta} \end{cases}\end{split}\]

Convergence check is performed each \(p\) iterations:

\(U = \theta_t - \theta_{t - \mathrm{nFeatures}}\), \(d = \infty\)
For \(x \in R^p\), the infinity norm (\(d = \infty\)) is defined as follows:

\[|x|_{\infty} = \underset{i \in [0, p]} \max(|x_i|)\]

Computation¶

Coordinate Descent algorithm is a special case of an iterative solver. For parameters, input, and output of iterative solvers, see Iterative Solver > Computation.

Algorithm Parameters¶

In addition to the input of a iterative solver, Coordinate Descent algorithm accepts the following parameters:

Algorithm Parameters for Coordinate Descent Computation¶
Parameter	Default Value	Description
`algorithmFPType`	`float`	The floating-point type that the algorithm uses for intermediate computations. Can be `float` or `double`.
`method`	`defaultDense`	Performance-oriented method.
`engine`	SharePtr< engines:: mt19937:: Batch>()	Pointer to the random number generator engine that is used internally during each iteration to choose a random component of the minimum result vector to be updated.
`positive`	`false`	A boolean value. When set to `true`, it forces the coefficients to be positive.
`selection`	`cyclic`	Value that specifies the strategy of certain coordinate selection on each iteration. Except for default `cyclic` value, Coordinate Descent also supports: `random` – on each iteration the index of coordinate is selected randomly by the engine.
`skipTheFirstComponents`	`false`	A boolean value. When set to `true`, Coordinate Descent algorithm will skip the first component from optimization.

Examples¶

cd_dense_batch.cpp

Adaptive Subgradient Method

Stochastic Average Gradient Accelerated Method

oneDAL documentation

Coordinate Descent Algorithm¶

Details¶

Computation¶

Algorithm Parameters¶

Examples¶