Distributed Processing
The distributed processing mode assumes that the data set \(R\) is split into nblocks blocks across computation nodes.
Parameters
In the distributed processing mode, initialization of item factors for the implicit ALS algorithm has the following parameters:
| Parameter | Default Value | Description |
|---|---|---|
| | | The floating-point type that the algorithm uses for intermediate computations. Can be |
| | | Performance-oriented computation method for CSR numeric tables, the only method supported by the algorithm. |
| | \(10\) | The total number of factors. |
| | \(0\) | The total number of users \(m\). |
| | Not applicable | A numeric table of size either \(1 \times 1\) that provides the number of input data parts or \((\mathrm{nblocks} + 1) \times 1\), where |
| | SharedPtr<engines::mt19937::Batch>() | Pointer to the random number generator engine that is used internally at the initialization step. |
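The \((\mathrm{nblocks} + 1) \times 1\) form of the partition table can be pictured as a list of block boundaries. The sketch below is plain Python rather than library code. It assumes that the entries are cumulative row offsets, so that part \(i\) covers rows `offsets[i]` up to but not including `offsets[i + 1]`. The helper name `make_partition` and the even-split policy are hypothetical and are not taken from the library.

```python
def make_partition(total, nblocks):
    """Return nblocks + 1 cumulative offsets that split `total` rows
    into nblocks contiguous parts of nearly equal size.

    Assumption (not from the library): offsets[i] is the first row of
    part i and offsets[nblocks] equals `total`.
    """
    base, extra = divmod(total, nblocks)
    offsets = [0]
    for i in range(nblocks):
        # The first `extra` parts receive one additional row.
        offsets.append(offsets[-1] + base + (1 if i < extra else 0))
    return offsets


partition = make_partition(10, 3)
print(partition)  # [0, 4, 7, 10]
```

A table in this form has one more entry than there are blocks, which is why its size is \((\mathrm{nblocks} + 1) \times 1\) rather than \(\mathrm{nblocks} \times 1\).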
To initialize the implicit ALS algorithm in the distributed processing mode, use the two-step process illustrated by the following diagram for \(\mathrm{nblocks} = 3\):
Step 1 - on Local Nodes
Input
In the distributed processing mode, initialization of item factors for the implicit ALS algorithm accepts the input described below.
Pass the Input ID as a parameter to the methods that provide input for your algorithm.
For more details, see Algorithms.
| Input ID | Input |
|---|---|
| | An \(n_i \times m\) numeric table with the part of the input data set. Each node holds \(n_i\) rows of the full transposed input data set \(R^T\). The input should be an object of |
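To make the shape of this input concrete, the following sketch transposes a small ratings matrix and cuts \(R^T\) into row blocks, one per node. It is an illustration only: dense Python lists stand in for the sparse tables the algorithm expects, and the helper names `transpose` and `split_rows` as well as the offsets `[0, 2, 3]` are hypothetical.

```python
def transpose(matrix):
    """Return the transpose of a dense matrix stored as a list of rows."""
    return [list(col) for col in zip(*matrix)]


def split_rows(matrix, offsets):
    """Cut `matrix` into contiguous row blocks delimited by `offsets`."""
    return [matrix[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]


# R has m = 4 users (rows) and n = 3 items (columns).
R = [
    [5, 0, 0],
    [0, 3, 0],
    [1, 0, 2],
    [0, 0, 4],
]

R_T = transpose(R)             # n x m = 3 x 4
item_offsets = [0, 2, 3]       # hypothetical split of the 3 item rows over 2 nodes
blocks = split_rows(R_T, item_offsets)

for i, block in enumerate(blocks):
    # Each block has n_i rows and m columns.
    print(i, len(block), "x", len(block[0]))
```

Every block keeps all \(m\) columns, so only the number of rows \(n_i\) differs from node to node.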
Output
In the distributed processing mode, initialization of item factors for the implicit ALS algorithm calculates the results described below.
Pass the Partial Result ID as a parameter to the methods that access the results of your algorithm.
Partial results that correspond to the outputOfInitForComputeStep3 and offsets Partial Result IDs should be transferred to Step 3 of the distributed ALS training algorithm.
Output of Initialization for Computing Step 3 (outputOfInitForComputeStep3) is a key-value data collection that maps components of the partial model on the \(i\)-th node to all local nodes.
Keys in this data collection are indices of the nodes, and the value that corresponds to each key \(i\) is a numeric table that contains indices of the factors of the items to be transferred to the \(i\)-th node on Step 3 of the distributed ALS training algorithm.
User Offsets (offsets) is a key-value data collection, where the keys are indices of the nodes and the value that corresponds to the key \(i\) is a numeric table of size \(1 \times 1\) that contains the value of the starting offset of the user factors stored on the \(i\)-th node.
For more details, see Algorithms.
| Partial Result ID | Result |
|---|---|
| | The model with initialized item factors. The result can only be an object of the |
| | A key-value data collection that maps components of the partial model to the local nodes. |
| | A key-value data collection of size |
| | A key-value data collection of size |
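The two key-value collections described above can be sketched with ordinary dictionaries keyed by node index. The code below is not the library's implementation. It assumes that an item's factors must be sent to a node exactly when some user stored on that node has a nonzero rating for the item, a selection rule that the text above does not state. The function name `init_step1_outputs` is hypothetical, and a plain integer stands in for each \(1 \times 1\) numeric table.

```python
def init_step1_outputs(local_block, user_offsets):
    """Build dictionary stand-ins for the two key-value collections.

    local_block  -- n_i x m rows of R^T (items x users) held by one node
    user_offsets -- nblocks + 1 cumulative user offsets (assumed layout)

    Returns (transfer, offsets), both keyed by node index.
    """
    nblocks = len(user_offsets) - 1
    transfer = {}
    offsets = {}
    for node in range(nblocks):
        start, end = user_offsets[node], user_offsets[node + 1]
        # Assumed rule: keep the local item rows that have at least one
        # nonzero rating from the users stored on `node`.
        transfer[node] = [
            item for item, row in enumerate(local_block) if any(row[start:end])
        ]
        # Starting offset of the user factors stored on `node`.
        offsets[node] = start
    return transfer, offsets


local_block = [
    [5, 0, 1, 0],   # item 0, rated by users 0 and 2
    [0, 3, 0, 0],   # item 1, rated by user 1
]
transfer, offsets = init_step1_outputs(local_block, [0, 2, 4])
print(transfer)  # {0: [0, 1], 1: [0]}
print(offsets)   # {0: 0, 1: 2}
```

In this toy case node 1 holds users 2 and 3, who rated only item 0, so only the factors of item 0 would be sent to it.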
Step 2 - on Local Nodes
Input
This step uses the results of the previous step.
| Input ID | Input |
|---|---|
| | A key-value data collection of size nblocks that contains the parts of the input data set: the \(i\)-th element of this collection is a numeric table of size \(m_i \times n_i\). Each numeric table in the collection should be an object of the CSRNumericTable class. |
Output
In this step, implicit ALS initialization calculates the partial results described below.
Pass the Partial Result ID as a parameter to the methods that access the results of your algorithm.
Partial results that correspond to the outputOfInitForComputeStep3 and offsets Partial Result IDs should be transferred to Step 3 of the distributed ALS training algorithm.
Output of Initialization for Computing Step 3 (outputOfInitForComputeStep3) is a key-value data collection that maps components of the partial model on the \(i\)-th node to all local nodes.
Keys in this data collection are indices of the nodes, and the value that corresponds to each key \(i\) is a numeric table that contains indices of the user factors to be transferred to the \(i\)-th node on Step 3 of the distributed ALS training algorithm.
Item Offsets (offsets) is a key-value data collection, where the keys are indices of the nodes and the value that corresponds to the key \(i\) is a numeric table of size \(1 \times 1\) that contains the value of the starting offset of the item factors stored on the \(i\)-th node.
For more details, see Algorithms.
| Partial Result ID | Result |
|---|---|
| | An \(m_j \times n\) numeric table with the mining data. The \(j\)-th node gets \(m_j\) rows of the full input data set \(R\). |
| | A key-value data collection that maps components of the partial model to the local nodes. |
| | A key-value data collection of size |
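One way to picture how a node ends up with an \(m_j \times n\) table is as a column-wise concatenation of the pieces it receives. The sketch below rests on an assumption that goes beyond the text: each of the nblocks pieces arriving at node \(j\) covers the same \(m_j\) user rows and a different range of \(n_i\) item columns, ordered by source node. The function name `assemble_block` is hypothetical, and dense lists again stand in for sparse tables.

```python
def assemble_block(pieces):
    """Concatenate pieces column-wise into one m_j x n block.

    pieces -- list of matrices, one per source node, each with the same
              number of rows m_j (assumed layout, not from the library)
    """
    m_j = len(pieces[0])
    if any(len(piece) != m_j for piece in pieces):
        raise ValueError("all pieces must have the same number of rows")
    block = []
    for r in range(m_j):
        row = []
        for piece in pieces:
            # Append the columns contributed by this source node.
            row.extend(piece[r])
        block.append(row)
    return block


pieces = [
    [[5, 0], [0, 3]],   # from node 0: 2 users x 2 items
    [[0], [0]],         # from node 1: 2 users x 1 item
]
block = assemble_block(pieces)
print(block)  # [[5, 0, 0], [0, 3, 0]]
```

The column counts of the pieces add up to \(n\), so the assembled block spans every item for the users it contains.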