Execution Model

The execution model is based on the SYCL* execution model. It specifies how code, termed kernels, executes on the host and the devices. The model is described in two parts: the application execution model and the kernel execution model.

Application Execution Model

The application execution model coordinates execution and data management between the host and devices via command groups. A command group is a grouping of commands, such as a kernel invocation and the accessors it uses, that is submitted to a queue for execution. Accessors, which are formally part of the memory model, also communicate ordering requirements between command groups. A program that uses the execution model declares and instantiates one or more sycl::queue objects. A queue can execute with an in-order or out-of-order policy, which the program controls.
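The way accessors imply ordering can be sketched in plain C++. The block below is a simplified host-side model, not SYCL code: the types Access and CommandGroup and the function dependencies are hypothetical names introduced only for illustration. It assumes that two command groups must be ordered when they touch the same buffer and at least one of them writes to it.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class Mode { read, write };

// Hypothetical model of one accessor request: which buffer, and how.
struct Access {
    std::string buffer;
    Mode mode;
};

// Hypothetical model of a command group: a name plus its accessors.
struct CommandGroup {
    std::string name;
    std::vector<Access> accesses;
};

// True when both accesses touch the same buffer and at least one writes.
bool conflicts(const Access& a, const Access& b) {
    return a.buffer == b.buffer &&
           (a.mode == Mode::write || b.mode == Mode::write);
}

// Returns (earlier, later) index pairs in submission order:
// the later command group must wait for the earlier one.
std::vector<std::pair<std::size_t, std::size_t>>
dependencies(const std::vector<CommandGroup>& submitted) {
    std::vector<std::pair<std::size_t, std::size_t>> deps;
    for (std::size_t later = 0; later < submitted.size(); ++later) {
        for (std::size_t earlier = 0; earlier < later; ++earlier) {
            bool found = false;
            for (const Access& a : submitted[earlier].accesses) {
                for (const Access& b : submitted[later].accesses) {
                    found = found || conflicts(a, b);
                }
            }
            if (found) deps.emplace_back(earlier, later);
        }
    }
    return deps;
}
```

Consider three command groups submitted in order: the first writes buffer A, the second writes buffer B, and the third reads A and B and writes C. Under this model, dependencies returns the pairs (0, 2) and (1, 2). The first two command groups share no buffer, so an out-of-order queue is free to run them in either order or concurrently, whereas an in-order queue runs all three in submission order.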

Kernel Execution Model

The kernel execution model specifies how computation is carried out on the accelerator. Computations ranging from small one-dimensional data to large multidimensional data sets are distributed across a hierarchy of ND-range, work-group, sub-group, and work-item, all of which are specified when the work is submitted to the command queue. Note that the kernel code itself represents the work executed by a single work-item. The code outside the kernel controls how much parallelism is executed: the amount and distribution of the work are determined by the sizes of the ND-range and the work-group.

The following figure depicts the relationship between an ND-range, work-group, sub-group, and work-item. The total amount of work is specified by the ND-range size, and the grouping of the work is specified by the work-group size. The example shows an ND-range size of X * Y * Z, a work-group size of X’ * Y’ * Z’, and a sub-group size of X’. There are therefore X * Y * Z work-items, (X * Y * Z) / (X’ * Y’ * Z’) work-groups, and (X * Y * Z) / X’ sub-groups.
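The counts above can be checked with a few lines of plain C++. This is a minimal sketch, not SYCL code: HierarchyCounts and count_hierarchy are hypothetical names, and the sketch assumes, as the formulas do, that every size divides evenly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical result type holding the three counts from the text.
struct HierarchyCounts {
    std::size_t work_items;
    std::size_t work_groups;
    std::size_t sub_groups;
};

// global = {X, Y, Z}, local = {X', Y', Z'}, sub_group_size = X' in the text.
// Assumes each local size divides the matching global size and that the
// sub-group size divides the work-group size.
HierarchyCounts count_hierarchy(const std::array<std::size_t, 3>& global,
                                const std::array<std::size_t, 3>& local,
                                std::size_t sub_group_size) {
    std::size_t work_items = 1;
    std::size_t items_per_group = 1;
    for (std::size_t d = 0; d < 3; ++d) {
        assert(local[d] > 0 && global[d] % local[d] == 0);
        work_items *= global[d];
        items_per_group *= local[d];
    }
    assert(sub_group_size > 0 && items_per_group % sub_group_size == 0);
    return {work_items,
            work_items / items_per_group,
            work_items / sub_group_size};
}
```

For example, with an ND-range of 8 * 4 * 2, a work-group size of 4 * 2 * 1, and a sub-group size of 4, count_hierarchy({8, 4, 2}, {4, 2, 1}, 4) yields 64 work-items, 8 work-groups, and 16 sub-groups.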


When kernels are executed, the location of a particular work-item in the larger ND-range, work-group, or sub-group is important. For example, if a work-item is assigned to compute on specific pieces of data, it needs a way to identify which pieces are its own. Unique identification of the work-item is provided by member functions of the nd_item class, such as get_global_id, get_group, and get_local_id.
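The relationship between these IDs in one dimension can be sketched in plain C++. The block below is a host-side model and not part of the SYCL API: ItemIds and locate are hypothetical names. It assumes the work-group size divides the global range evenly and that no global offset is used, in which case the global ID equals the group ID times the work-group size plus the local ID.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the per-dimension IDs an nd_item reports.
struct ItemIds {
    std::size_t global_id;  // position in the whole ND-range
    std::size_t group_id;   // which work-group the item belongs to
    std::size_t local_id;   // position inside that work-group
};

// Derives the group and local IDs of a work-item from its global ID.
ItemIds locate(std::size_t global_id,
               std::size_t global_range,
               std::size_t local_range) {
    assert(local_range > 0 && global_range % local_range == 0);
    assert(global_id < global_range);
    return {global_id, global_id / local_range, global_id % local_range};
}
```

With a global range of 6 and a work-group size of 2, locate(5, 6, 2) places the work-item in work-group 2 at local position 1, and 2 * 2 + 1 recovers the global ID 5.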

The following code sample launches a kernel over a two-dimensional ND-range. Each work-item reads its global ID and writes the sum of its two coordinates into a buffer, which the host then prints.

 1#include <CL/sycl.hpp>
 2#include <iostream>
 3#include <iomanip>
 5const int N = 6;
 6const int M = 2;
 8using namespace sycl;
10int main() {
11  queue q;
12  buffer<int,2> buf(range<2>(N,N));
14  q.submit([&](handler &h){
15      auto bufacc = buf.get_access<access::mode::read_write>(h);
16      h.parallel_for(nd_range<2>(range<2>(N,N), range<2>(M,M)),
17		     [=](nd_item<2> item){
18		       int i = item.get_global_id(0);
19		       int j = item.get_global_id(1);
20		       bufacc[i][j] = i + j;
21		     });
22    });
24  auto bufacc1 = buf.get_access<access::mode::read>();
25  for(int i = 0; i < N; i++){
26    for(int j = 0; j < N; j++)
27      std::cout << std::setw(10) << bufacc1[i][j] << "  ";
28    std::cout<<"\n";
29  }
30  return 0;
31}

The program produces the following output:

         0           1           2           3           4           5  
         1           2           3           4           5           6  
         2           3           4           5           6           7  
         3           4           5           6           7           8  
         4           5           6           7           8           9  
         5           6           7           8           9          10  

ND-Range Parallelism Example

The following points explain how the previous code sample uses the ND-range.

  • Line 16 is the ND-range declaration. nd_range<2> specifies a two-dimensional index space.

  • The first argument, range<2>(N,N), defines an N by N global index space. With N = 6, the kernel runs 36 work-items.

  • The second argument, range<2>(M,M), defines an M by M local work-group shape. With M = 2, the 36 work-items form 9 work-groups of 4 work-items each.

  • Lines 18 and 19 extract the coordinates of the work-item in the global index space.
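To visualize how the N by N index space of the sample is tiled by M by M work-groups, the following plain C++ sketch computes which work-group owns each work-item. It is a host-side model, not SYCL code: group_map is a hypothetical name, and the sketch assumes work-groups are numbered in row-major order (group index in dimension 0 times the number of groups in dimension 1, plus the group index in dimension 1).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For an n x n global range tiled by m x m work-groups, returns the
// row-major linear work-group number that owns each work-item (i, j).
std::vector<std::vector<std::size_t>> group_map(std::size_t n, std::size_t m) {
    assert(m > 0 && n % m == 0);
    const std::size_t groups_per_dim = n / m;
    std::vector<std::vector<std::size_t>> owner(n, std::vector<std::size_t>(n));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            owner[i][j] = (i / m) * groups_per_dim + (j / m);
        }
    }
    return owner;
}
```

For the sample's values, group_map(6, 2) assigns work-item (0, 0) to work-group 0, work-item (2, 3) to work-group 4, and work-item (5, 5) to work-group 8, the last of the 9 work-groups.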

The sub-group is an extension to the SYCL execution model and sits hierarchically between the work-group and the work-item. The sub-group was created to align with typical hardware resources that contain a vector unit, which executes several similar operations in parallel and in lock step.
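The position of the sub-group between the work-group and the work-item can be sketched in plain C++. This is an illustrative model only: split_into_sub_groups is a hypothetical name, and the sketch assumes that consecutive work-items of a work-group are packed into the same sub-group. How work-items are actually assigned to sub-groups, and which sub-group sizes are available, depends on the implementation and the device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits the work-items 0 .. work_group_size-1 of one work-group into
// consecutive chunks of sub_group_size. ASSUMPTION: contiguous packing;
// a real implementation chooses the mapping itself.
std::vector<std::vector<std::size_t>>
split_into_sub_groups(std::size_t work_group_size, std::size_t sub_group_size) {
    assert(sub_group_size > 0);
    std::vector<std::vector<std::size_t>> sub_groups;
    for (std::size_t id = 0; id < work_group_size; ++id) {
        if (id % sub_group_size == 0) {
            sub_groups.emplace_back();  // start a new sub-group
        }
        sub_groups.back().push_back(id);
    }
    return sub_groups;
}
```

Under this assumption, a work-group of 8 work-items with a sub-group size of 4 splits into two sub-groups holding work-items 0 to 3 and 4 to 7. Each sub-group corresponds to the set of work-items that a vector unit of that width could process in lock step.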