Elastic Net#

Elastic Net is a method for modeling the relationship between a dependent variable (which may be a vector) and one or more explanatory variables by fitting a regularized least squares model. The Elastic Net regression model has a special penalty, a sum of L1 and L2 regularization terms, that combines the advantages of the Ridge Regression and LASSO algorithms. This penalty is particularly useful in situations with many correlated predictor variables [Friedman2010].


Let \((x_1, \ldots, x_p)\) be a vector of input variables and \(y = (y_1, \ldots, y_k)\) be the response. For each \(j = 1, \ldots, k\), the Elastic Net model has a form similar to the linear and ridge regression models [Hoerl70] with one exception: the coefficients are estimated by minimizing a mean squared error (MSE) objective function that is regularized by \(L_1\) and \(L_2\) penalties.

\[y_j = \beta_{0j} + x_1 \beta_{1j} + \ldots + x_p \beta_{pj}\]

Here \(x_i\), \(i = 1, \ldots, p\), are referred to as independent variables, and \(y_j\), \(j = 1, \ldots, k\), is referred to as a dependent variable, or response.

Training Stage#

Let \((x_{11}, \ldots, x_{1p}, y_{11}, \ldots, y_{1k}) \ldots (x_{n1}, \ldots, x_{np}, y_{n1}, \ldots, y_{nk})\) be a set of training data (for the regression task, \(n \gg p\); for feature selection, \(p\) can be greater than \(n\)). The matrix \(X\) of size \(n \times p\) contains observations \(x_{ij}\), \(i = 1, \ldots, n\), \(j = 1, \ldots, p\), of the independent variables.

For each \(y_j\), \(j = 1, \ldots, k\), the Elastic Net regression estimates \((\beta_{0j}, \beta_{1j}, \ldots, \beta_{pj})\) by minimizing the objective function:

\[F_j(\beta) = \frac{1}{2n} \sum_{i=1}^{n} \left(y_{ij} - \beta_{0j} - \sum_{q=1}^{p}\beta_{qj}x_{iq}\right)^2 + \lambda_{1j} \sum_{q=1}^{p}|\beta_{qj}| + \frac{\lambda_{2j}}{2}\sum_{q=1}^{p}\beta_{qj}^{2}\]

In the equation above, the first term is the mean squared error function, and the second and third are regularization terms that penalize the \(L_1\) and \(L_2\) norms of the vector \(\beta_j\), where \(\lambda_{1j} \geq 0\), \(\lambda_{2j} \geq 0\), \(j = 1, \ldots, k\).

For more details, see [Hastie2009] and [Friedman2010].
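As an illustration of this objective, the sketch below fits the same model with scikit-learn's `ElasticNet`, which minimizes an equivalent objective parameterized by `alpha` and `l1_ratio` instead of \(\lambda_1\) and \(\lambda_2\). The mapping between the two parameterizations, and all data and penalty values here, are illustrative assumptions, not part of the library this page documents.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Hypothetical penalty weights: lam1 multiplies the L1 term, lam2 the L2 term.
lam1, lam2 = 0.1, 0.05

# scikit-learn's objective uses alpha * l1_ratio for the L1 weight and
# alpha * (1 - l1_ratio) for the L2 weight, so:
alpha = lam1 + lam2
l1_ratio = lam1 / alpha

# Synthetic data: 100 observations, 5 features, a sparse true coefficient vector.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + 0.1 * rng.standard_normal(100)

model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio).fit(X, y)
print(model.intercept_, model.coef_)
```

Because of the L1 term, some estimated coefficients are driven exactly to zero, which is what makes Elastic Net useful for feature selection.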

By default, the Coordinate Descent iterative solver is used to minimize the objective function. The SAGA solver is also applicable for minimization.
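A minimal sketch of the coordinate descent idea (without intercept, feature standardization, or convergence checks, all of which a production solver would include): each pass cyclically updates one coefficient at a time using a closed-form soft-thresholding step derived from the objective \(F_j\) above.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Soft-thresholding operator, the proximal map of the L1 penalty."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def elastic_net_cd(X, y, lam1, lam2, n_iter=100):
    """Toy coordinate descent for the Elastic Net objective (no intercept).

    For coefficient q, setting the subgradient of the objective to zero gives
    the closed-form update
        beta_q = S(X_q^T r / n, lam1) / (||X_q||^2 / n + lam2),
    where r is the partial residual excluding feature q and S is the
    soft-thresholding operator.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for q in range(p):
            # Partial residual: remove feature q's current contribution.
            r = y - X @ beta + X[:, q] * beta[q]
            rho = X[:, q] @ r / n
            z = (X[:, q] ** 2).sum() / n
            beta[q] = soft_threshold(rho, lam1) / (z + lam2)
    return beta
```

The L2 weight `lam2` only inflates the denominator (shrinking coefficients toward zero), while the L1 weight `lam1` can zero a coefficient out entirely, which is why the combined penalty yields both shrinkage and sparsity.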

Prediction Stage#

Prediction based on Elastic Net regression is done for an input vector \((x_1, \ldots, x_p)\) using the equation \(y_j = \beta_{0j} + x_1 \beta_{1j} + \ldots + x_p \beta_{pj}\) for each \(j = 1, \ldots, k\).