.. ******************************************************************************
.. * Copyright 2020-2021 Intel Corporation
.. *
.. * Licensed under the Apache License, Version 2.0 (the "License");
.. * you may not use this file except in compliance with the License.
.. * You may obtain a copy of the License at
.. *
.. * http://www.apache.org/licenses/LICENSE-2.0
.. *
.. * Unless required by applicable law or agreed to in writing, software
.. * distributed under the License is distributed on an "AS IS" BASIS,
.. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. * See the License for the specific language governing permissions and
.. * limitations under the License.
.. *******************************************************************************/
.. _cda_solver:
Coordinate Descent Algorithm
============================
The Coordinate Descent algorithm follows the :ref:`algorithmic framework of iterative solver ` with one exception:
the default method (``defaultDense``) of Coordinate Descent algorithm is a case of the iterative solver method
with the batch equal to the number of observations in the training data set.
Details
*******
The aet of intrinsic parameters :math:`S_t` is empty.
Algorithmic-specific transformation :math:`T`, algorithm-specific vector :math:`U`,
and power :math:`d` of `Lebesgue space `_ [Adams2003]_ are defined as follows:
.. math::
T(\theta_{t-1}, F'(\theta_{t-1}), S_{t-1}, M(\theta_{t-1}))
#. Define the index :math:`j` to update the component of a coefficient as a remainder in the division of the number of current iteration (:math:`t`)
by the number of features in the training data set (:math:`p`): :math:`j = \mathrm{mod}(t, p)`
Alternatively, if ``selection`` parameter was set to ``random``, generate :math:`j` randomly.
#. If ``stepLengthSequence`` was not provided by the user, compute the learning rate: :math:`\eta = (F''(\theta_{t-1}))_{jj}`
(the diagonal element of the Hessian matrix)
#. Update the :math:`j`-th component of vector :math:`\theta`:
.. math::
(\theta_t)_j = \mathrm{prox}_{\frac{1}{\eta}}^{M}
\left( (\theta_{t-1})_j - \frac{1}{\max(\eta, \mathrm{eps})} (F'(\theta_{t-1}))_j\right)
Note: for example, if a non-smooth term :math:`M = \lambda \sum_{i=1}^{p} |\theta_t|`,
where :math:`p` is the number of features in the training data set, the objective function should compute prox operator as follows:
.. math::
\mathrm{prox}_{\frac{1}{\eta}}^{M} \left( (\theta_{t-1})_j \right) =
\begin{cases}
(\theta_{t-1})_j - \lambda \frac{1}{\eta}, & (\theta_{t-1})_j > \lambda \frac{1}{\eta}\\
0, & |(\theta_{t-1})_j| \leq \lambda \frac{1}{\eta}\\
(\theta_{t-1})_j + \lambda \frac{1}{\eta}, & (\theta_{t-1})_j < -\lambda \frac{1}{\eta}
\end{cases}
Convergence check is performed each :math:`p` iterations:
- :math:`U = \theta_t - \theta_{t - \mathrm{nFeatures}}`, :math:`d = \infty`
- For :math:`x \in R^p`, the infinity norm (:math:`d = \infty`) is defined as follows:
.. math::
|x|_{\infty} = \underset{i \in [0, p]} \max(|x_i|)
Computation
***********
Coordinate Descent algorithm is a special case of an iterative solver.
For parameters, input, and output of iterative solvers, see :ref:`Iterative Solver > Computation `.
Algorithm Parameters
--------------------
In addition to the input of a iterative solver, Coordinate Descent algorithm accepts the following parameters:
.. tabularcolumns:: |\Y{0.15}|\Y{0.15}|\Y{0.7}|
.. list-table:: Algorithm Parameters for Coordinate Descent Computaion
:widths: 10 10 60
:header-rows: 1
:align: left
:class: longtable
* - Parameter
- Default Value
- Description
* - ``algorithmFPType``
- ``float``
- The floating-point type that the algorithm uses for intermediate computations. Can be ``float`` or ``double``.
* - ``method``
- ``defaultDense``
- Performance-oriented method.
* - ``engine``
- `SharePtr< engines:: mt19937:: Batch>()`
- Pointer to the random number generator engine that is used internally during each iteration
to choose a random component of the minimum result vector to be updated.
* - ``positive``
- ``false``
- A boolean value. When set to ``true``, it forces the coefficients to be positive.
* - ``selection``
- ``cyclic``
- Value that specifies the strategy of certain coordinate selection on each iteration.
Except for default ``cyclic`` value, Coordinate Descent also supports:
- ``random`` – on each iteration the index of coordinate is selected randomly by the engine.
* - ``skipTheFirstComponents``
- ``false``
- A boolean value. When set to ``true``, Coordinate Descent algorithm will skip the first component from optimization.
Examples
********
.. tabs::
.. tab:: C++ (CPU)
- :cpp_example:`cd_dense_batch.cpp `
.. tab:: Java*
.. note:: There is no support for Java on GPU.
- :java_example:`CDDenseBatch.java `