Partitioner Summary#
The parallel loop templates parallel_for
and parallel_reduce
take an optional partitioner argument, which specifies a strategy for
executing the loop. The following table summarizes partitioners and
their effect when used in conjunction with blocked_range
.
Partitioner |
Description |
When Used with |
---|---|---|
|
Chunksize bounded by grain size. |
|
|
Automatic chunk size. |
|
|
Automatic chunk size, cache affinity and uniform distribution of iterations. |
|
|
Deterministic chunk size, cache affinity and uniform distribution of iterations without load balancing. |
|
An auto_partitioner
is used when no partitioner is specified. In
general, the auto_partitioner
or affinity_partitioner
should be
used, because these tailor the number of chunks based on available
execution resources. affinity_partitioner
and static_partitioner
may take advantage of Range
ability to split in a given ratio (see
“Advanced Topic: Other Kinds of Iteration Spaces”) for distributing
iterations in nearly equal chunks between computing resources.
simple_partitioner
can be useful in the following situations:
The subrange size for
operator()
must not exceed a limit. That might be advantageous, for example, if youroperator()
needs a temporary array proportional to the size of the range. With a limited subrange size, you can use an automatic variable for the array instead of having to use dynamic memory allocation.A large subrange might use cache inefficiently. For example, suppose the processing of a subrange involves repeated sweeps over the same memory locations. Keeping the subrange below a limit might enable the repeatedly referenced memory locations to fit in cache.
You want to tune to a specific machine.