# Elastic Net

Elastic Net is a method for modeling the relationship between a dependent variable (which may be a vector) and one or more explanatory variables by fitting a regularized least squares model. The Elastic Net regression model uses a penalty that is the sum of L1 and L2 regularization terms, which combines the advantages of the Ridge Regression and LASSO algorithms. This penalty is particularly useful when there are many correlated predictor variables [Friedman2010].

## Details

Let $$(x_1, \ldots, x_p)$$ be a vector of input variables and $$y = (y_1, \ldots, y_k)$$ be the response. For each $$j = 1, \ldots, k$$, the Elastic Net model has a form similar to the linear and ridge regression models [Hoerl70], with one exception: the coefficients are estimated by minimizing a mean squared error (MSE) objective function that is regularized by $$L_1$$ and $$L_2$$ penalties.

$y_j = \beta_{0j} + x_1 \beta_{1j} + \ldots + x_p \beta_{pj}$

Here $$x_i$$, $$i = 1, \ldots, p$$, are referred to as independent variables, and $$y_j$$, $$j = 1, \ldots, k$$, are referred to as dependent variables or responses.

### Training Stage

Let $$(x_{11}, \ldots, x_{1p}, y_{11}, \ldots, y_{1k}), \ldots, (x_{n1}, \ldots, x_{np}, y_{n1}, \ldots, y_{nk})$$ be a set of training data (for a regression task, $$n \gg p$$; for feature selection, $$p$$ can be greater than $$n$$). The matrix $$X$$ of size $$n \times p$$ contains the observations $$x_{ij}$$, $$i = 1, \ldots, n$$, $$j = 1, \ldots, p$$, of the independent variables.

For each $$y_j$$, $$j = 1, \ldots, k$$, the Elastic Net regression estimates $$(\beta_{0j}, \beta_{1j}, \ldots, \beta_{pj})$$ by minimizing the objective function:

$F_j(\beta) = \frac{1}{2n} \sum_{i=1}^{n}\left(y_{ij} - \beta_{0j} - \sum_{q=1}^{p}\beta_{qj}x_{iq}\right)^2 + \lambda_{1j} \sum_{q=1}^{p}|\beta_{qj}| + \lambda_{2j} \frac{1}{2}\sum_{q=1}^{p}\beta_{qj}^{2}$

In the equation above, the first term is the mean squared error (scaled by $$\frac{1}{2}$$), and the second and third terms are regularization terms that penalize the $$L_1$$ norm and the squared $$L_2$$ norm of the coefficient vector $$(\beta_{1j}, \ldots, \beta_{pj})$$, where $$\lambda_{1j} \geq 0$$, $$\lambda_{2j} \geq 0$$, $$j = 1, \ldots, k$$. The intercept $$\beta_{0j}$$ is not penalized.
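As an illustration, the objective $$F_j$$ can be evaluated directly for a single response column. The following is a minimal sketch in plain Python, not the library's implementation; the function name `elastic_net_objective` and its argument layout are assumptions made for this example.

```python
def elastic_net_objective(X, y, beta0, beta, lambda1, lambda2):
    """Evaluate F_j for one response column (hypothetical helper).

    X: list of n rows, each a list of p feature values.
    y: list of n values of the response y_j.
    beta0: intercept beta_{0j}; beta: list of p coefficients beta_{qj}.
    lambda1, lambda2: non-negative L1 and L2 penalty weights.
    """
    n = len(X)
    # Data-fit term: (1 / 2n) * sum of squared residuals.
    squared_error = 0.0
    for row, target in zip(X, y):
        prediction = beta0 + sum(b * x for b, x in zip(beta, row))
        squared_error += (target - prediction) ** 2
    fit_term = squared_error / (2.0 * n)
    # Penalties apply to the coefficients only, not to the intercept.
    l1_term = lambda1 * sum(abs(b) for b in beta)
    l2_term = 0.5 * lambda2 * sum(b * b for b in beta)
    return fit_term + l1_term + l2_term


# Example: a perfect fit leaves only the two penalty terms.
value = elastic_net_objective(
    X=[[1.0, 0.0], [0.0, 1.0]], y=[1.0, 2.0],
    beta0=0.0, beta=[1.0, 2.0], lambda1=0.1, lambda2=0.2,
)
```

Setting `lambda2 = 0` reduces this objective to the LASSO objective, and setting `lambda1 = 0` reduces it to a ridge-style objective.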

For more details, see [Hastie2009] and [Friedman2010].

By default, the Coordinate Descent iterative solver is used to minimize the objective function. The SAGA solver can also be used for the minimization.
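To show how coordinate descent applies to this objective, the sketch below minimizes $$F_j$$ for one response column by updating one coefficient at a time with a soft-thresholding step. It is a textbook-style illustration under simplifying assumptions (a fixed number of sweeps, no convergence check, no feature standardization) and is not meant to describe the library's solver; the names `soft_threshold` and `fit_elastic_net_cd` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net_cd(X, y, lambda1, lambda2, n_iter=100):
    """Cyclic coordinate descent for one response column (illustrative sketch).

    X: list of n rows with p features each; y: list of n response values.
    Returns (beta0, beta) after n_iter full sweeps over the coordinates.
    """
    n, p = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # The intercept is unpenalized: set it to the mean residual.
        beta0 = sum(
            y[i] - sum(beta[q] * X[i][q] for q in range(p)) for i in range(n)
        ) / n
        for q in range(p):
            rho = 0.0  # (1/n) * correlation of feature q with the partial residual
            z = 0.0    # (1/n) * sum of squares of feature q
            for i in range(n):
                partial = y[i] - beta0 - sum(
                    beta[m] * X[i][m] for m in range(p) if m != q
                )
                rho += X[i][q] * partial
                z += X[i][q] ** 2
            rho /= n
            z /= n
            denom = z + lambda2
            # Exact minimizer of F_j along coordinate q.
            beta[q] = soft_threshold(rho, lambda1) / denom if denom > 0 else 0.0
    return beta0, beta


# Example: with both penalties at zero the fit approaches ordinary least squares.
b0, b = fit_elastic_net_cd([[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0], 0.0, 0.0)
```

The soft-thresholding step is what lets the $$L_1$$ penalty set coefficients exactly to zero, while the $$\lambda_2$$ term in the denominator shrinks the remaining coefficients.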

### Prediction Stage

Prediction based on Elastic Net regression is done for an input vector $$(x_1, \ldots, x_p)$$ using the equation $$y_j = \beta_{0j} + x_1 \beta_{1j} + \ldots + x_p \beta_{pj}$$ for each $$j = 1, \ldots, k$$.
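The prediction equation can be sketched in a few lines of plain Python. The function name `predict` and the storage layout (a list of $$k$$ intercepts and a $$p \times k$$ nested list of coefficients) are assumptions made for this example, not the library's data format.

```python
def predict(x, intercepts, coefficients):
    """Compute (y_1, ..., y_k) for one input vector (hypothetical helper).

    x: list of p feature values.
    intercepts: list of k values beta_{0j}.
    coefficients: p x k nested list; coefficients[q][j] holds the weight
        of feature q for response j.
    """
    p, k = len(x), len(intercepts)
    return [
        intercepts[j] + sum(x[q] * coefficients[q][j] for q in range(p))
        for j in range(k)
    ]


# Example with p = 2 features and k = 2 responses.
y_hat = predict(
    x=[1.0, 2.0],
    intercepts=[0.5, -1.0],
    coefficients=[[1.0, 0.0], [2.0, 3.0]],
)
```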