Writing Intel® SHMEM Programs

Intel® SHMEM programs must include the following header files:

#include <sycl/sycl.hpp>  // SYCL 2020 header; <CL/sycl.hpp> is the older, deprecated spelling
#include <ishmem.h>

Here is how to initialize the ishmem library with an OpenSHMEM runtime:

ishmem_init();

Now we can query for the PE identifier and total number of PEs:

int my_pe = ishmem_my_pe();
int npes = ishmem_n_pes();

std::cout << "Hello from PE " << my_pe << std::endl;

To perform ishmem operations, we must first allocate some symmetric objects:

constexpr int array_size = 10;  // illustrative size (an assumption); any positive value works

int *src = (int *) ishmem_malloc(array_size * sizeof(int));
int *dst = (int *) ishmem_calloc(array_size, sizeof(int));

Now let’s initialize these source and destination buffers from within a parallel SYCL kernel:

sycl::queue q;  // assumed: a queue on the default device, used for all kernels below

auto e_init = q.submit([&](sycl::handler &h) {
    h.parallel_for(sycl::nd_range<1>{array_size, array_size}, [=](sycl::nd_item<1> idx) {
        int i = idx.get_global_id()[0];
        src[i] = (my_pe << 16) + i;
        dst[i] = (my_pe << 16) + 0xface;
    });
});
e_init.wait_and_throw();
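
The initialization pattern above packs the PE identifier into the upper bits of each value and the element index into the lower 16 bits. The following host-only sketch illustrates that encoding. The helper names `encode_value`, `decode_pe`, and `decode_index` are hypothetical and are not part of the ishmem API:

```cpp
#include <cassert>

// Hypothetical helpers (not part of ishmem) that mirror the pattern
// `(my_pe << 16) + i` used to fill the src buffer.

// Pack a PE identifier and an element index into one int.
inline int encode_value(int pe, int index) {
    assert(pe >= 0 && pe < (1 << 15));        // keeps the result within a 32-bit int
    assert(index >= 0 && index < (1 << 16));  // index must fit in the lower 16 bits
    return (pe << 16) + index;
}

// Recover the PE identifier from an encoded value.
inline int decode_pe(int value) {
    return value >> 16;
}

// Recover the element index (or the 0xface marker) from an encoded value.
inline int decode_index(int value) {
    return value & 0xffff;
}
```

Under this encoding, the `0xface` marker written into `dst` decodes to index 64206, so it can only be mistaken for transferred data if `array_size` exceeds that value.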

Now we must perform a barrier operation to ensure that all the source data is initialized before doing any communication:

ishmem_barrier_all();

Let’s perform a simple ring-style communication pattern; that is, have each PE send its source data to the subsequent PE (the PE with the largest identifier value will send to PE 0):

/* Perform put operation */
auto e1 = q.submit([&](sycl::handler &h) {
    h.single_task([=]() {
        int my_dev_pe = ishmem_my_pe();
        int my_dev_npes = ishmem_n_pes();

        ishmem_int_put(dst, src, array_size, (my_dev_pe + 1) % my_dev_npes);
    });
});
e1.wait_and_throw();
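
To see where the data ends up, the ring pattern can be simulated on the host without ishmem. This is only a sketch: `ring_next` and `simulate_ring_put` are hypothetical names, and plain `std::vector` copies stand in for the remote put:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: the PE that `pe` sends to in the ring.
inline int ring_next(int pe, int npes) {
    assert(npes > 0 && pe >= 0 && pe < npes);
    return (pe + 1) % npes;
}

// Host-only simulation of the ring put. Each simulated PE fills src and dst
// with the same patterns as the initialization kernel, then copies its src
// into the dst buffer of the next PE. Returns the dst buffer of every PE.
inline std::vector<std::vector<int>> simulate_ring_put(int npes, int array_size) {
    assert(npes > 0 && array_size >= 0);
    std::vector<std::vector<int>> src(npes, std::vector<int>(array_size));
    std::vector<std::vector<int>> dst(npes, std::vector<int>(array_size));
    for (int pe = 0; pe < npes; ++pe) {
        for (int i = 0; i < array_size; ++i) {
            src[pe][i] = (pe << 16) + i;
            dst[pe][i] = (pe << 16) + 0xface;
        }
    }
    for (int pe = 0; pe < npes; ++pe) {
        dst[ring_next(pe, npes)] = src[pe];  // stands in for the put to the next PE
    }
    return dst;
}
```

With four simulated PEs, `simulate_ring_put(4, 8)[0]` holds the values written by PE 3, and the buffer of PE 2 holds the values written by PE 1. In other words, each PE receives the data of its predecessor in the ring.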

Before verifying the results, we need to perform another barrier operation to ensure that all communication is complete:

ishmem_barrier_all();

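Next, we allocate an error counter in host memory that the device can access, and check the received data from within a kernel:

int *errors = sycl::malloc_host<int>(1, q);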
*errors = 0;

/* Verify data */
auto e_verify = q.submit([&](sycl::handler &h) {
    h.single_task([=]() {
        for (int i = 0; i < array_size; ++i) {
            // dst was written by the previous PE in the ring, so expect that PE's pattern
            if (dst[i] != (((my_pe + npes - 1) % npes) << 16) + i) {
                *errors = *errors + 1;
            }
        }
    });
});
e_verify.wait_and_throw();

if (*errors > 0) {
    std::cerr << "[ERROR] Validation check(s) failed: " << *errors << std::endl;
}
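
The same check can be expressed as a small host-side function, which may be convenient for unit-testing the expected pattern without a device. This is a sketch under the assumption that every PE puts to `(pe + 1) % npes`, so the data found on a PE comes from its predecessor. `expected_sender` and `count_mismatches` are hypothetical names, not ishmem functions:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: the PE whose data should be found in dst on `my_pe`
// when every PE puts to (pe + 1) % npes.
inline int expected_sender(int my_pe, int npes) {
    assert(npes > 0 && my_pe >= 0 && my_pe < npes);
    return (my_pe + npes - 1) % npes;
}

// Hypothetical host-side check: count the elements of `dst` that differ
// from the pattern `(sender_pe << 16) + i` written by `sender_pe`.
inline int count_mismatches(const std::vector<int> &dst, int sender_pe) {
    assert(sender_pe >= 0);
    int errors = 0;
    for (int i = 0; i < static_cast<int>(dst.size()); ++i) {
        if (dst[i] != (sender_pe << 16) + i) {
            ++errors;
        }
    }
    return errors;
}
```

Elements that still hold the `0xface` marker, for example because a put never arrived, are counted as mismatches.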

Finally, we can free all allocated memory and finalize the library. For symmetric ishmem objects, we must call ishmem_free:

ishmem_free(src);
ishmem_free(dst);
sycl::free(errors, q);

ishmem_finalize();
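
In larger programs, it can help to tie each allocation to its matching free function so that a buffer is released exactly once. The sketch below shows one possible approach using `std::unique_ptr` with a function-pointer deleter. The names `free_fn`, `owned_buffer`, and `make_owned` are hypothetical, and `std::malloc`/`std::free` stand in for the symmetric allocator so that the example runs without ishmem:

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>

// Any release function that takes a void*, such as std::free in this sketch.
using free_fn = void (*)(void *);

// Hypothetical alias: a buffer of T that is released through a free_fn.
template <typename T>
using owned_buffer = std::unique_ptr<T[], free_fn>;

// Hypothetical helper: take ownership of `raw` and release it with `release`
// when the returned object goes out of scope.
template <typename T>
owned_buffer<T> make_owned(void *raw, free_fn release) {
    assert(release != nullptr);
    return owned_buffer<T>(static_cast<T *>(raw), release);
}
```

A call such as `make_owned<int>(std::malloc(4 * sizeof(int)), std::free)` frees the buffer automatically at the end of the enclosing scope. If the same idea were applied to symmetric objects, the owning scope would have to end before `ishmem_finalize()` is called.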

For an overview of more APIs and how they are used in applications, the Intel® SHMEM examples provide an excellent resource.