.. ******************************************************************************
.. * Copyright 2020-2021 Intel Corporation
.. *
.. * Licensed under the Apache License, Version 2.0 (the "License");
.. * you may not use this file except in compliance with the License.
.. * You may obtain a copy of the License at
.. *
.. *     http://www.apache.org/licenses/LICENSE-2.0
.. *
.. * Unless required by applicable law or agreed to in writing, software
.. * distributed under the License is distributed on an "AS IS" BASIS,
.. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. * See the License for the specific language governing permissions and
.. * limitations under the License.
.. ******************************************************************************

.. _saga_solver:

Stochastic Average Gradient Accelerated Method
==============================================

The Stochastic Average Gradient Accelerated (SAGA) method [Defazio2014]_ follows
the :ref:`algorithmic framework of an iterative solver <iterative_solver>` with one exception:
the default method (``defaultDense``) of the SAGA algorithm is a particular case of the
iterative solver method with the batch size :math:`b = 1`.

Details
*******

The algorithm-specific transformation :math:`T`, the set of intrinsic parameters :math:`S_t`
defined for the learning rate :math:`\eta`, and the algorithm-specific vector :math:`U` and
power :math:`d` of the `Lebesgue space <https://en.wikipedia.org/wiki/Lp_space>`_ are defined as follows:

.. math::
    S_t = \{ G^t \}

.. math::
    G^t = (G_i^t)_{i = 1, \ldots, n}

.. math::
    G^0 \equiv (G_i^0)_{i = 1, \ldots, n} \equiv F_i'(\theta_0)_{i = 1, \ldots, n}

:math:`S_t` is a matrix of the gradients of the smooth terms at the point :math:`\theta_t`, where

- :math:`t` is defined by the number of iterations the solver runs
- :math:`G_i^t` stores the gradient of :math:`f_i(\theta_t)`

:math:`T(\theta_{t-1}, F_j'(\theta_{t-1}), S_{t-1}, M(\theta_{t-1}))`:

#. :math:`W_t = \theta_{t-1} - \eta_j \left[ F_j'(\theta_{t-1}) - G_j^{t-1} + \frac{1}{n} \sum_{i=1}^{n} G_i^{t-1} \right]`
#. :math:`\theta_t = \mathrm{prox}_{\eta}^{M} (W_t)`

Update of the set of intrinsic parameters :math:`S_t`:

.. math::
    G_j^t = F_j'(\theta_{t-1})

.. note::

    The algorithm enables automatic step-length selection if the learning rate :math:`\eta`
    was not provided by the user. The automatic step length is computed as
    :math:`\eta = \frac{1}{L}`, where :math:`L` is the Lipschitz constant returned by the
    objective function. If the objective function returns ``nullptr`` to the numeric table
    with the ``lipschitzConstant`` Result ID, the library uses the default step size :math:`0.01`.

Convergence checks:

- :math:`U = \theta_t - \theta_{t - 1}`, :math:`d = \infty`
- :math:`|x|_{\infty} = \underset{i \in [0, p]}{\max} |x^i|`, :math:`x \in R^p`

Computation
***********

The stochastic average gradient (SAGA) algorithm is a special case of an iterative solver.
For parameters, input, and output of iterative solvers, see :ref:`Iterative Solver > Computation <iterative_solver_computation>`.

Algorithm Input
---------------

In addition to the input of the iterative solver, the SAGA optimization solver has the
following optional input:

.. tabularcolumns::  |\Y{0.15}|\Y{0.15}|\Y{0.7}|

.. list-table:: Algorithm Input for Stochastic Average Gradient Accelerated Method Computation
   :widths: 10 10 60
   :header-rows: 1
   :align: left

   * - OptionalDataID
     - Default Value
     - Description
   * - ``gradientTable``
     - Not applicable
     - A numeric table of size :math:`n \times p` that represents the matrix :math:`G_0`
       containing the gradients of :math:`F_i(\theta)`, :math:`i = 1, \ldots, n`,
       at the initial point :math:`\theta_0 \in R^p`.

       This input is optional: if the user does not provide the table of gradients for
       :math:`F_i(\theta)`, :math:`i = 1, \ldots, n`, the library computes it inside the
       SAGA algorithm.

       .. note::

          This parameter can be an object of any class derived from ``NumericTable``,
          except for ``PackedTriangularMatrix``, ``PackedSymmetricMatrix``, and ``CSRNumericTable``.

Algorithm Parameters
--------------------

In addition to parameters of the iterative solver, the SAGA optimization solver has the
following parameters:

.. tabularcolumns::  |\Y{0.15}|\Y{0.15}|\Y{0.7}|

.. list-table:: Algorithm Parameters for Stochastic Average Gradient Accelerated Method Computation
   :widths: 10 10 60
   :header-rows: 1
   :align: left
   :class: longtable

   * - Parameter
     - Default Value
     - Description
   * - ``algorithmFPType``
     - ``float``
     - The floating-point type that the algorithm uses for intermediate computations.
       Can be ``float`` or ``double``.
   * - ``method``
     - ``defaultDense``
     - Performance-oriented computation method.
   * - ``batchIndices``
     - :math:`1`
     - A numeric table of size :math:`\mathrm{nIterations} \times 1` with 32-bit integer
       indices of terms in the objective function.
       If no indices are provided, the implementation generates a random index on each iteration.

       .. note::

          This parameter can be an object of any class derived from ``NumericTable``,
          except for ``PackedTriangularMatrix``, ``PackedSymmetricMatrix``, and ``CSRNumericTable``.
   * - ``learningRateSequence``
     - Not applicable
     - A numeric table of size :math:`1 \times \mathrm{nIterations}` or :math:`1 \times 1`.
       In the first case, the table contains the learning rate for each iteration;
       in the second case, a constant step length is used for all iterations.
       It is recommended to set a diminishing learning rate sequence.
       If ``learningRateSequence`` is not provided, the learning rate is computed
       automatically via the ``constantOfLipschitz`` Result ID.

       .. note::

          This parameter can be an object of any class derived from ``NumericTable``,
          except for ``PackedTriangularMatrix``, ``PackedSymmetricMatrix``, and ``CSRNumericTable``.
   * - ``engine``
     - ``SharedPtr``
     - Pointer to the random number generator engine that is used internally to generate
       the 32-bit integer index of a term in the objective function.

Algorithm Output
----------------

In addition to the output of the iterative solver, the SAGA optimization solver calculates
the following optional result:

.. tabularcolumns::  |\Y{0.15}|\Y{0.15}|\Y{0.7}|

.. list-table:: Algorithm Output for Stochastic Average Gradient Accelerated Method Computation
   :widths: 10 10 60
   :header-rows: 1
   :align: left

   * - OptionalDataID
     - Default Value
     - Description
   * - ``gradientTable``
     - Not applicable
     - A numeric table of size :math:`n \times p` that represents the matrix :math:`G_t`
       updated after all iterations.

       This parameter can be an object of any class derived from ``NumericTable``,
       except for ``PackedTriangularMatrix``, ``PackedSymmetricMatrix``, and ``CSRNumericTable``.

Examples
********

.. tabs::

  .. tab:: C++ (CPU)

    Batch Processing:

    - :cpp_example:`saga_dense_batch.cpp`
    - :cpp_example:`saga_logistic_loss_dense_batch.cpp`

  .. tab:: Java*

    .. note:: There is no support for Java on GPU.

    Batch Processing:

    - :java_example:`SAGADenseBatch.java`
    - :java_example:`SAGALogisticLossDenseBatch.java`

  .. tab:: Python*

    Batch Processing:

    - :daal4py_example:`saga_batch.py`
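The update scheme described in the Details section (step 1, the proximal step 2, the
gradient-table update, and the automatic step length :math:`\eta = 1/L`) can be sketched
in plain NumPy. This is a minimal illustration, not the oneDAL implementation; the names
``saga``, ``grad_i``, and ``prox`` are hypothetical, and a constant learning rate
:math:`\eta_j = \eta` is assumed:

.. code-block:: python

    import numpy as np

    def saga(grad_i, n, theta0, L, n_iter=1000, eta=None, prox=None, seed=0):
        """Minimal SAGA sketch.

        grad_i(theta, i) -- gradient of the i-th smooth term f_i at theta
        L                -- Lipschitz constant; the automatic step is eta = 1/L
        prox             -- proximal operator of the non-smooth term M
                            (identity when M is absent)
        """
        rng = np.random.default_rng(seed)
        theta = np.asarray(theta0, dtype=float).copy()
        eta = (1.0 / L) if eta is None else eta
        prox = (lambda w: w) if prox is None else prox
        # G[i] stores the last evaluated gradient of f_i (the table G^t above)
        G = np.array([grad_i(theta, i) for i in range(n)])
        G_avg = G.mean(axis=0)                   # running (1/n) * sum_i G_i
        for _ in range(n_iter):
            j = int(rng.integers(n))             # batch size b = 1
            g_new = grad_i(theta, j)
            # Step 1: W_t = theta - eta * [F_j'(theta) - G_j + (1/n) sum_i G_i]
            w = theta - eta * (g_new - G[j] + G_avg)
            # Step 2: theta_t = prox_eta^M(W_t)
            theta = prox(w)
            # Update the stored gradient table and its running average
            G_avg += (g_new - G[j]) / n
            G[j] = g_new
        return theta

For a smooth problem such as least squares, ``prox`` stays the identity; with a non-smooth
term :math:`M` (for example, an :math:`\ell_1` penalty), it would be replaced by the
corresponding proximal operator, such as soft thresholding.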