Elastic Net

Elastic Net is a method for modeling the relationship between a dependent variable (which may be a vector) and one or more explanatory variables by fitting a regularized least squares model. The Elastic Net regression model uses a penalty that is the sum of L1 and L2 regularization terms, which combines the advantages of the Ridge Regression and LASSO algorithms. This penalty is particularly useful when there are many correlated predictor variables [Friedman2010].


Let \((x_1, \ldots, x_p)\) be a vector of input variables and \(y = (y_1, \ldots, y_k)\) be the response. For each \(j = 1, \ldots, k\), the Elastic Net model has a form similar to the linear and ridge regression models [Hoerl70], with one exception: the coefficients are estimated by minimizing a mean squared error (MSE) objective function that is regularized by \(L_1\) and \(L_2\) penalties.

\[y_j = \beta_{0j} + x_1 \beta_{1j} + \ldots + x_p \beta_{pj}\]

Here \(x_i\), \(i = 1, \ldots, p\), are referred to as independent variables, and \(y_j\), \(j = 1, \ldots, k\), are referred to as dependent variables or responses.

Training Stage

Let \((x_{11}, \ldots, x_{1p}, y_{11}, \ldots, y_{1k}), \ldots, (x_{n1}, \ldots, x_{np}, y_{n1}, \ldots, y_{nk})\) be a set of training data (for a regression task, \(n \gg p\); for feature selection, \(p\) may be greater than \(n\)). The matrix \(X\) of size \(n \times p\) contains the observations \(x_{ij}\), \(i = 1, \ldots, n\), \(j = 1, \ldots, p\), of the independent variables.

For each \(y_j\), \(j = 1, \ldots, k\), the Elastic Net regression estimates \((\beta_{0j}, \beta_{1j}, \ldots, \beta_{pj})\) by minimizing the objective function:

\[F_j(\beta) = \frac{1}{2n} \sum_{i=1}^{n}\left(y_{ij} - \beta_{0j} - \sum_{q=1}^{p}\beta_{qj}x_{iq}\right)^2 + \lambda_{1j} \sum_{q=1}^{p}|\beta_{qj}| + \lambda_{2j} \frac{1}{2}\sum_{q=1}^{p}\beta_{qj}^{2}\]

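In the equation above, the first term is the mean squared error function, and the second and third terms are regularization terms that penalize the \(L_1\) norm and the squared \(L_2\) norm of the coefficient vector \((\beta_{1j}, \ldots, \beta_{pj})\), respectively. The intercept \(\beta_{0j}\) is not penalized. The regularization parameters satisfy \(\lambda_{1j} \geq 0\), \(\lambda_{2j} \geq 0\), \(j = 1, \ldots, k\).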

For more details, see [Hastie2009] and [Friedman2010].
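
As an illustration, the objective function \(F_j\) for a single response can be evaluated with a few lines of NumPy. This is a minimal sketch, not the library's implementation; the function name `elastic_net_objective` and its parameter names are hypothetical and chosen only to mirror the symbols in the formula.

```python
import numpy as np


def elastic_net_objective(X, y, beta0, beta, lam1, lam2):
    """Evaluate F_j for one response column (hypothetical helper).

    X     : (n, p) matrix of observations
    y     : (n,) vector of responses for a single j
    beta0 : scalar intercept (not penalized)
    beta  : (p,) coefficient vector
    lam1  : L1 penalty weight, lam1 >= 0
    lam2  : L2 penalty weight, lam2 >= 0
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta, dtype=float)
    n = X.shape[0]

    residual = y - beta0 - X @ beta
    mse_term = residual @ residual / (2.0 * n)   # 1/(2n) * sum of squared residuals
    l1_term = lam1 * np.sum(np.abs(beta))        # lambda_1 * sum |beta_q|
    l2_term = 0.5 * lam2 * np.sum(beta ** 2)     # lambda_2 * 1/2 * sum beta_q^2
    return mse_term + l1_term + l2_term
```

Setting `lam1 = 0` reduces the expression to a ridge-type objective, and setting `lam2 = 0` reduces it to a LASSO-type objective.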

By default, the Coordinate Descent iterative solver is used to minimize the objective function. The SAGA solver can also be used for the minimization.
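
To make the idea of coordinate descent concrete, the sketch below minimizes \(F_j\) for a single response by cycling through the coefficients and updating each one in closed form with a soft-thresholding step. It is a textbook-style illustration under the objective defined above, not the library's solver; the names `soft_threshold` and `fit_elastic_net_cd`, and the stopping parameters `n_iter` and `tol`, are assumptions made for this example.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Soft-thresholding operator: sign(v) * max(|v| - t, 0)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def fit_elastic_net_cd(X, y, lam1, lam2, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for one response (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    beta0 = 0.0
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * x_q^T x_q for each column
    residual = y - beta0 - X @ beta        # kept up to date after every update

    for _ in range(n_iter):
        # The intercept is unpenalized, so its update is the mean residual.
        shift = residual.mean()
        beta0 += shift
        residual -= shift
        max_change = abs(shift)

        for q in range(p):
            old = beta[q]
            # Correlation of column q with the residual that excludes beta_q.
            rho = X[:, q] @ residual / n + col_sq[q] * old
            denom = col_sq[q] + lam2
            new = soft_threshold(rho, lam1) / denom if denom > 0 else 0.0
            if new != old:
                residual -= X[:, q] * (new - old)
                beta[q] = new
                max_change = max(max_change, abs(new - old))

        if max_change < tol:               # stop when a full sweep changes nothing
            break

    return beta0, beta
```

With `lam1 = lam2 = 0` the procedure converges toward the ordinary least squares solution when one exists uniquely, while a sufficiently large `lam1` drives all coefficients to exactly zero, which is the mechanism behind feature selection.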

Prediction Stage

Prediction based on Elastic Net regression is done for an input vector \((x_1, \ldots, x_p)\) using the equation \(y_j = \beta_{0j} + x_1 \beta_{1j} + \ldots + x_p \beta_{pj}\) for each \(j = 1, \ldots, k\).
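
The prediction equation can be applied to many input vectors and all \(k\) responses at once with a single matrix product. The sketch below assumes the trained intercepts are stored in a vector of length \(k\) and the coefficients in a \(p \times k\) matrix; this layout and the function name `predict_elastic_net` are hypothetical and used only for illustration.

```python
import numpy as np


def predict_elastic_net(X, intercepts, coefficients):
    """Compute y_j = beta_0j + sum_q x_q * beta_qj for every row of X.

    X            : (m, p) matrix of input vectors
    intercepts   : (k,) vector holding beta_0j
    coefficients : (p, k) matrix whose column j holds beta_1j, ..., beta_pj
    Returns an (m, k) matrix of predicted responses.
    """
    X = np.asarray(X, dtype=float)
    intercepts = np.asarray(intercepts, dtype=float)
    coefficients = np.asarray(coefficients, dtype=float)
    return X @ coefficients + intercepts
```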