Support Vector Machine Classifier and Regression (SVM)

Support Vector Machine (SVM) classification and regression are among the most popular machine learning algorithms. SVM belongs to the family of generalized linear classifiers.


Mathematical formulation


Given \(n\) feature vectors \(X=\{x_1=(x_{11},\ldots,x_{1p}),\ldots, x_n=(x_{n1},\ldots,x_{np})\}\) of size \(p\), their non-negative observation weights \(W=\{w_1,\ldots,w_n\}\), and \(n\) responses \(Y=\{y_1,\ldots,y_n\}\),

  • for classification, \(y_i \in \{0, \ldots, C-1\}\), where \(C\) is the number of classes

  • for regression, \(y_i \in \mathbb{R}\)

the problem is to build a Support Vector Machine (SVM) classification or regression model.
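For illustration only, such classification and regression models can be built with scikit-learn's SMO-based estimators (an assumption for the example; this library exposes its own interfaces, and the data below is synthetic):

```python
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))             # n = 40 feature vectors of size p = 3
y_cls = (X[:, 0] > 0).astype(int)        # responses in {0, ..., C-1} with C = 2
y_reg = X @ np.array([1.0, -2.0, 0.5])   # real-valued responses

clf = SVC(kernel="rbf", C=1.0).fit(X, y_cls)               # classification model
reg = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y_reg)  # regression model
```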

The SVM model is trained using the Sequential Minimal Optimization (SMO) method [Boser92], which reduces training to the solution of a quadratic optimization problem. For classification, the problem is

\[\underset{\alpha }{\mathrm{min}}\frac{1}{2}{\alpha }^{T}Q\alpha -{e}^{T}\alpha\]

with \(0 \leq \alpha_i \leq C\), \(i = 1, \ldots, n\), \(y^T \alpha = 0\), where \(e\) is the vector of ones, \(C\) is the upper bound of the coordinates of the vector \(\alpha\), \(Q\) is a symmetric matrix of size \(n \times n\) with \(Q_{ij} = y_i y_j K(x_i, x_j)\), and \(K(x,y)\) is a kernel function.
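As a concrete illustration, the data of this quadratic problem can be assembled directly (a NumPy sketch; scikit-learn's `rbf_kernel` stands in for an arbitrary kernel \(K\), and labels are mapped to \(\pm 1\) as in the dual formulation):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
y = np.where(X[:, 0] > 0, 1.0, -1.0)   # labels as +/-1 for the dual problem

K = rbf_kernel(X)            # K_ij = K(x_i, x_j)
Q = np.outer(y, y) * K       # Q_ij = y_i y_j K(x_i, x_j), symmetric n x n
e = np.ones(len(y))          # vector of ones in the linear term
```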

For regression, the problem is

\[\underset{\alpha }{\mathrm{min}}\frac{1}{2}{\alpha }^{T}Q\alpha +{s}^{T}\alpha\]

with \(0 \leq \alpha_i \leq C\), \(i = 1, \ldots, 2n\), \(z^T \alpha = 0\), where \(C\) is the upper bound of the coordinates of the vector \(\alpha\), \(Q\) is a symmetric matrix of size \(2n \times 2n\) with \(Q_{ij} = z_i z_j K(x_i, x_j)\), and \(K(x,y)\) is a kernel function. The vectors \(s\) and \(z\) for the regression problem are defined by the following rule:

\[\begin{split}\begin{cases} z_i = +1, s_i = \epsilon - y_i, & i \leq n \\ z_i = -1, s_i = \epsilon + y_i, & n < i \leq 2n \end{cases}\end{split}\]

where \(\epsilon\) is the error tolerance parameter.
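As a check of this rule, the vectors \(z\) and \(s\) can be assembled directly (a NumPy sketch; the helper name `regression_duals` is ours, not part of any library):

```python
import numpy as np

def regression_duals(y, eps):
    """Build the 2n-vectors z and s of the SVM regression dual per the rule above."""
    n = len(y)
    z = np.concatenate([np.ones(n), -np.ones(n)])  # z_i = +1 for i <= n, -1 otherwise
    s = np.concatenate([eps - y, eps + y])         # s_i = eps - y_i, then eps + y_i
    return z, s

z, s = regression_duals(np.array([1.0, -2.0]), eps=0.1)
# z = [ 1.,  1., -1., -1.],  s = [-0.9,  2.1,  1.1, -1.9]
```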

The working subset of \(\alpha\) updated on each iteration of the algorithm is selected based on the Working Set Selection 3 (WSS 3) scheme [Fan05]. The scheme can be optimized using one or both of the following techniques:

  • Cache: the implementation can allocate a predefined amount of memory to store intermediate results of the kernel computation.

  • Shrinking: the implementation can try to reduce the amount of kernel-related computation (see [Joachims99]).
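Both optimizations have counterparts in other SMO-based implementations. For instance, scikit-learn's libsvm wrapper (used here purely as an illustration, not the interface of this library) exposes a kernel cache size and a shrinking switch; with either setting the solver converges to essentially the same decision function:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# cache_size: MB reserved for intermediate kernel values;
# shrinking: temporarily drop variables that appear bound, so fewer
# kernel rows need recomputation.
clf = SVC(kernel="rbf", cache_size=100, shrinking=True).fit(X, y)
clf_no_shrink = SVC(kernel="rbf", cache_size=100, shrinking=False).fit(X, y)
```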

The solution of the problem defines the separating hyperplane and corresponding decision function \(D(x)= \sum_{k} {y_k \alpha_k K(x_k, x)} + b\), where only those \(x_k\) that correspond to non-zero \(\alpha_k\) appear in the sum, and \(b\) is a bias. Each non-zero \(\alpha_k\) is called a dual coefficient and the corresponding \(x_k\) is called a support vector.
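For example, the same decision function can be reconstructed from the stored dual coefficients and support vectors of a trained scikit-learn `SVC` (an illustration of the formula above, not this library's API; `dual_coef_` holds the products \(y_k \alpha_k\) for the support vectors only):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

clf = SVC(kernel="rbf", gamma=0.5).fit(X, y)

# D(x) = sum_k y_k alpha_k K(x_k, x) + b over the support vectors only
K = rbf_kernel(clf.support_vectors_, X, gamma=0.5)
D_manual = clf.dual_coef_ @ K + clf.intercept_
```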

Training method: smo

In the smo training method, all vectors from the training dataset are used at each iteration.

Training method: thunder

In the thunder training method, the algorithm iteratively solves the convex optimization problem with linear constraints by selecting a fixed set of active constraints (the working set) and applying the Sequential Minimal Optimization (SMO) solver to the selected subproblem. A description of this method is given in [Wen2018].

Inference methods: smo and thunder

The smo and thunder inference methods perform prediction in the same way:

Given the SVM classification or regression model and \(r\) feature vectors \(x_1, \ldots, x_r\), the problem is to calculate the signed value of the decision function \(D(x_i)\), \(i=1, \ldots, r\). For classification, the sign of the value defines the class of the feature vector, and the absolute value of the function is a multiple of the distance between the feature vector and the separating hyperplane; for regression, the value of the decision function is the predicted response.
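With a linear kernel this relationship is easy to verify: the sign of \(D(x_i)\) reproduces the predicted class, and \(|D(x_i)| / \|w\|\) is the geometric distance to the separating hyperplane (a scikit-learn sketch on synthetic data, not this library's interface):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 2))
y = (X @ np.array([2.0, -1.0]) > 0).astype(int)

clf = SVC(kernel="linear").fit(X, y)
D = clf.decision_function(X)          # signed values D(x_i)

pred_by_sign = (D > 0).astype(int)    # the sign defines the class
w_norm = np.linalg.norm(clf.coef_)    # ||w|| of the separating hyperplane
distances = np.abs(D) / w_norm        # geometric distance to the hyperplane
```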


Batch Processing: