Implicit Alternating Least Squares¶
The library provides the Implicit Alternating Least Squares (implicit ALS) algorithm [Fleischer2008], based on collaborative filtering.
Details¶
Given the input dataset \(R=\left\{{r}_{ui}\right\}\) of size \(m \times n\), where m is the number of users and n is the number of items, the problem is to train the Alternating Least Squares (ALS) model represented as two matrices: \(X\) of size \(m \times f\), and \(Y\) of size \(f \times n\), where \(f\) is the number of factors. The matrices \(X\) and \(Y\) are the factors of low-rank factorization of matrix \(R\):
Initialization Stage¶
Initialization of the matrix Y can be done using the following method: for each \(i = 1, \ldots, n\) \({y}_{1i}=\frac{1}{m}\sum _{u=1}^{m}{r}_{ui}\) and \(y_{ki}\) are independent random numbers uniformly distributed on the interval \((0,1)\), \(k = 2, \ldots, f\).
Training Stage¶
The ALS model is trained using the implicit ALS algorithm [Hu2008] by minimizing the following cost function:
where:
\({p}_{ui}\) indicates the preference of user u of item i:
\[\begin{split}p_{ui} = \begin{cases} 1, & {r}_{ui}> \epsilon \\ 0, & {r}_{ui}\le \epsilon \end{cases}\end{split}\]\(\epsilon\) is the threshold used to define the preference values. \(\epsilon = 0\) is the only threshold valu supported so far.
\({c}_{ui}=1+\alpha r_{ui}\), \(c_{ui}\) measures the confidence in observing \(p_{ui}\)
\(\alpha\) is the rate of confidence
\(r_{ui}\) is the element of the matrix \(R\)
\(\lambda\) is the parameter of the regularization
\({n}_{{x}_{u}}\), \({m}_{{y}_{i}}\) denote the number of ratings of user \(u\) and item \(i\) respectively
Prediction Stage¶
Prediction of Ratings
Given the trained ALS model and the matrix \(D\) that describes for which pairs of factors \(X\) and \(Y\) the rating should be computed, the system calculates the matrix of recommended ratings Res: \({res}_{ui}=\sum _{j=1}^{f}{x}_{uj}{y}_{ji}\), if \({d}_{ui}\ne 0\), \(u=1,\ldots,m\); \(i=1,\ldots n\).
Initialization¶
For initialization, the following computation modes are available:
Computation¶
The following computation modes are available:
Distributed processing for training and prediction of ratings
Examples¶
Batch Processing:
Distributed Processing:
Performance Considerations¶
To get the best overall performance of the implicit ALS recommender:
If input data is homogeneous, provide the input data and store results in homogeneous numeric tables of the same type as specified in the algorithmFPType class template parameter.
If input data is sparse, use CSR numeric tables.
Product and Performance Information |
---|
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex. Notice revision #20201201 |