Memory Model#

_images/memory_model-1PPD.png

An Intel® SHMEM program consists of data objects that are private to each PE and data objects that are remotely accessible by all PEs. Private data objects are stored in the local host memory of a given PE or in the device memory of the device local to the PE, and private objects cannot be accessed by other PEs via ishmem routines. Private data objects on the host follow the memory model of C/C++ and private data objects on the device follow the SYCL device memory model (see SYCL device memory model).

Remotely accessible objects, however, can be accessed by remote PEs using ishmem routines within SYCL device kernels. Remotely accessible data objects are called Symmetric Data Objects. Each symmetric data object has a corresponding object with the same name, type, and size on all PEs where that object is accessible via the Intel® SHMEM API.

In Intel® SHMEM, global and static C++ variables are remotely accessible only by using ishmem routines that are callable from the host (this kind of data object must not be defined in a dynamic shared object). On the other hand, C++ data allocated by the memory management routines (Section Memory Management) are remotely accessible by using ishmem APIs that are callable from either the host or the device.

The ishmem dynamic memory allocation routines (e.g., ishmem_malloc) allow collective allocation of Symmetric Data Objects on a special memory region called the Symmetric Heap. The Symmetric Heap is created during the execution of a program at a memory location determined by the implementation. The Symmetric Heap may reside in different memory regions on different PEs. The memory model diagram shows an example of the Intel® SHMEM memory layout, illustrating the location of remotely accessible symmetric objects and private data objects.

Pointers to Symmetric Objects#

Symmetric data objects are referenced in ishmem routines through the local pointer to the desired remotely accessible object. The address contained in this pointer is referred to as a symmetric address. Every symmetric address is also a local address that is valid for direct memory access; however, not all local addresses are symmetric. Manipulation of symmetric addresses passed to ishmem routines - including pointer arithmetic, array indexing, and access of structure or union members - are permitted as long as the resulting local pointer remains within the same symmetric allocation or object. Symmetric addresses are only valid at the PE where they were generated; using a symmetric address generated by a different PE for direct memory access or as an argument to a ishmem routine results in undefined behavior.

Symmetric addresses provided to typed ishmem interfaces must be naturally aligned based on their type and any requirements of the underlying architecture.

The DPC++/SYCL programming model supports Unified Shared Memory between host and GPU. In that model, memory can be host, device, or shared. In Intel® SHMEM, the symmetric heap is in device memory and is not accessible from the host by loads and stores, but SYCL routines such as sycl::queue::memcpy() may be used. SYCL host Memory may be allocated by sycl::malloc_host() and is accesible using loads and stores in both host and device kernel code. SYCL kernels generally do not have access to host global and static objects.

Intel® SHMEM augments the DPC++/SYCL memory model to some degree, in that host-initiated operations can reference objects in the symmetric heap on the same or other PEs, even though device memory is not accessible by loads and stores. Host-initiated operations can also refer to global and static objects on the same or other PEs. Device-initiated operations may only reference objects in the symmetric heap.

The ishmem_ptr routine allows the programmer to obtain a local pointer to a remotely accessible object at a specified PE if load-store access is possible. For Intel® \(\text{X}^e\) Link fabric connected GPUs, ishmem_ptr may return a valid address for direct access to symmetric heap objects that are accessible via the \(\text{X}^e\) Link fabric. ishmem_ptr called on the host will generally not return a valid address for a symmetric heap object even on the same PE. A pointer returned by ishmem_optr is valid for direct memory access; however, providing this address as an argument of an ishmem routine that requires a symmetric address results in undefined behavior.

Atomicity Guarantees#

Intel® SHMEM contains a number of routines that perform atomic operations on symmetric data objects, which are defined in the Atomic Memory Operations section. The atomic routines guarantee that concurrent accesses by any of these routines to the same location, using the same datatype (specified in the Standard AMO Types and Extended AMO Types tables) will be exclusive. Exclusivity is also guaranteed when the target PE performs a wait or test operation on the same location and with the same datatype as one or more atomic operations.

Intel® SHMEM atomic operations do not guarantee exclusivity in the following scenarios, all of which result in undefined behavior.

  1. When concurrent accesses to the same location are performed using ishmem atomic routines using different datatypes.

  2. When atomic and non-atomic ishmem routines are used to access the same location concurrently.

  3. When ishmem atomic routines and non-ishmem operations (e.g., load, store, or other atomic operations) are used to access the same location concurrently.