Logistic Regression

This chapter describes the Logistic Regression algorithm implemented in oneDAL.

The Logistic Regression algorithm solves the classification problem: it predicts class labels and the probabilities that objects belong to each class.

Mathematical Formulation

Training

Given \(n\) feature vectors \(X=\{x_1=(x_{11},\ldots,x_{1p}),\ldots, x_n=(x_{n1},\ldots,x_{np})\}\) of size \(p\) and \(n\) responses \(Y=\{y_1,\ldots,y_n\} \in \{0,1\}\), the problem is to fit the model weights \(w=\{w_0, \ldots, w_p\}\) that minimize the Logistic Loss

\(L(X, w, y) = \sum_{i = 1}^{n} -y_i \log(prob_i) - (1 - y_i) \log(1 - prob_i),\)

where

* \(prob_i = \sigma(w_0 + \sum_{j=1}^{p} w_j x_{i, j})\) is the predicted probability that object \(i\) belongs to class \(1\),
* \(\sigma(x) = \frac{1}{1 + \exp(-x)}\) is the sigmoid function.

Note

The probabilities are constrained to the interval \([\epsilon, 1 - \epsilon]\) to prevent issues when computing the logarithm, where \(\epsilon=10^{-7}\) for single-precision (float) computations and \(10^{-15}\) otherwise.
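The loss and the clamping described above can be sketched in plain C++ as follows. This is an illustrative implementation of the formula, not the oneDAL code; the function names and the fixed \(\epsilon\) are assumptions for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sigmoid function: sigma(x) = 1 / (1 + exp(-x)).
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Logistic Loss L(X, w, y) with probabilities clamped to [eps, 1 - eps]
// so that log() stays finite. w[0] is the intercept, w[1..p] the weights.
double logistic_loss(const std::vector<std::vector<double>>& X,
                     const std::vector<double>& w,
                     const std::vector<int>& y,
                     double eps = 1e-15) {
    double loss = 0.0;
    for (std::size_t i = 0; i < X.size(); ++i) {
        double z = w[0];
        for (std::size_t j = 0; j < X[i].size(); ++j)
            z += w[j + 1] * X[i][j];
        double prob = sigmoid(z);
        prob = std::fmin(std::fmax(prob, eps), 1.0 - eps); // clamp to [eps, 1 - eps]
        loss += -y[i] * std::log(prob) - (1 - y[i]) * std::log(1.0 - prob);
    }
    return loss;
}
```

With zero weights every predicted probability is \(0.5\), so each observation contributes \(\log 2\) to the loss, which is a quick sanity check for the formula.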

Training Method: dense_batch

Since the Logistic Loss is a convex function, any iterative solver designed for convex problems can be used to minimize it. During training, the data is split into batches, and the gradients computed on each batch are summed up.

Refer to Mathematical formulation: Newton-CG.
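The per-batch gradient accumulation described above can be sketched as follows. This is a plain C++ illustration of the math (the gradient of the Logistic Loss with respect to \(w_j\) is \(\sum_i (prob_i - y_i)\, x_{ij}\), with \(x_{i0}=1\) for the intercept); the function name and batching scheme are assumptions for the example, not the oneDAL implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Gradient of the Logistic Loss, computed batch by batch and summed.
// Because the loss is a sum over observations, the sum of per-batch
// gradients equals the full-data gradient for any batch size.
std::vector<double> batched_gradient(const std::vector<std::vector<double>>& X,
                                     const std::vector<int>& y,
                                     const std::vector<double>& w,
                                     std::size_t batch_size) {
    std::vector<double> grad(w.size(), 0.0);
    for (std::size_t start = 0; start < X.size(); start += batch_size) {
        std::size_t end = std::min(start + batch_size, X.size());
        for (std::size_t i = start; i < end; ++i) {
            double z = w[0];
            for (std::size_t j = 0; j < X[i].size(); ++j)
                z += w[j + 1] * X[i][j];
            double r = 1.0 / (1.0 + std::exp(-z)) - y[i]; // prob_i - y_i
            grad[0] += r;                                  // intercept term
            for (std::size_t j = 0; j < X[i].size(); ++j)
                grad[j + 1] += r * X[i][j];
        }
    }
    return grad;
}
```

A solver such as Newton-CG consumes this accumulated gradient at each iteration; splitting into batches changes the memory traffic, not the result.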

Training Method: sparse

This method trains a Logistic Regression model on sparse data: provide the matrix of feature vectors as a sparse table. For more information about sparse tables, see Compressed Sparse Rows (CSR) Table.

Inference

Given \(r\) feature vectors \(X=\{x_1=(x_{11},\ldots,x_{1p}),\ldots, x_r=(x_{r1},\ldots,x_{rp})\}\) of size \(p\), the problem is to calculate the probabilities that the objects associated with these feature vectors belong to each class, and to determine the most probable class label for each object.

The probabilities are calculated using the formula \(prob_i = \sigma(w_0 + \sum_{j=1}^{p} w_j x_{i, j})\), where \(\sigma(x) = \frac{1}{1 + \exp(-x)}\) is the sigmoid function. If the probability is greater than \(0.5\), the class label is set to \(1\); otherwise, it is set to \(0\).
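The inference formula above amounts to the following, sketched in plain C++ (illustrative names, not the oneDAL API):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Predicted probability that the object belongs to class 1:
// prob = sigma(w0 + sum_j w_j * x_j), with w[0] the intercept.
double predict_probability(const std::vector<double>& x,
                           const std::vector<double>& w) {
    double z = w[0];
    for (std::size_t j = 0; j < x.size(); ++j)
        z += w[j + 1] * x[j];
    return 1.0 / (1.0 + std::exp(-z));
}

// Class label: 1 if the probability exceeds 0.5, 0 otherwise.
int predict_label(const std::vector<double>& x,
                  const std::vector<double>& w) {
    return predict_probability(x, w) > 0.5 ? 1 : 0;
}
```

Note that thresholding at \(0.5\) is equivalent to checking the sign of the linear term \(w_0 + \sum_j w_j x_j\), since \(\sigma(0) = 0.5\).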

Programming Interface

Refer to API Reference: Logistic Regression.

Examples: Logistic Regression

oneAPI DPC++

Batch Processing: