Collective Operations

Collective Operations#

Collective routines are defined as coordinated communication or synchronization operations performed by a group of PEs.

Intel® SHMEM provides two types of collective routines:

Collective routines that operate on teams using a team handle parameter to determine which PEs will participate in the routine, and use resources encapsulated by the team object to perform operations. See Section Team Management Routines for details on team management.
Collective routines that do not accept a team handle argument nor active set parameters, which implicitly operate on the world team, ISHMEM_TEAM_WORLD.

Concurrent accesses to symmetric memory by an Intel® SHMEM collective routine and any other means of access—where at least one PE or a thread within a PE updates the symmetric memory—results in undefined behavior. Since PEs can enter and exit collectives at different times, accessing such memory remotely may require additional synchronization.

Important

All collective operations must complete before another SYCL kernel calls collective operations.

Important

A collective call must be either all host-initiated or device-initiated. For example, a program that initiates a collective operation from the host on some PEs but from the device on other PEs has undefined behavior. Furthermore, each PE initiating a collective must use the same variant of the collective API. That is, mixed use of the on_queue, workgroup, and base variant collectives is undefined behavior.

ISHMEM_BARRIER_ALL#

Registers the arrival of a PE at a barrier and blocks the PE until all other PEs arrive at the barrier and all local updates and remote memory updates are completed.

void ishmem_barrier_all(void)#

Parameters:: None.
Returns:: None.

Callable from the host and device.

Description: The ishmem_barrier_all routine is a mechanism for synchronizing all PEs in the world team at once. This routine blocks the calling PE until all PEs in the world team have called ishmem_barrier_all. In a multithreaded Intel® SHMEM program, only the calling thread is blocked, however, it may not be called concurrently by multiple threads in the same PE.

Prior to synchronizing with other PEs, ishmem_barrier_all ensures completion of all previously issued memory stores, and of all local and remote memory updates issued via ishmem AMO and RMA routine calls such as ishmem_int_add, ishmem_put_nbi, and ishmem_get_nbi.

A host-initiated ishmem_barrier_all will only guarantee completion of device-initiated operations for which the corresponding SYCL kernel has completed execution.

ISHMEMX_BARRIER_ALL_ON_QUEUE#

Registers the arrival of a PE at a barrier and blocks the PE until all other PEs arrive at the barrier and all local updates and remote memory updates are completed.

sycl::event ishmemx_barrier_all_on_queue(const sycl::queue &q, const std::vector<sycl::event> &deps)#

Parameters:

q – The SYCL queue on which to execute the operation. q must be mapped to the GPU tile assigned to the calling PE.
deps – An optional vector of SYCL events that the operation depends on.

Returns:

The SYCL event created upon submitting the operation to the SYCL runtime.

Callable from the host.

Description: The ishmemx_barrier_all_on_queue routine is a mechanism for synchronizing all PEs.

To ensure the barrier has completed, refer to the on_queue API Completion Semantics section.

ISHMEMX_BARRIER_ALL_WORK_GROUP#

Registers the arrival of a PE at a barrier and blocks the PE until all other PEs arrive at the barrier and all local updates and remote memory updates are completed.

template<typename Group> void ishmemx_barrier_all_work_group(const Group &group)#

Parameters:: group – The SYCL group or sub_group on which to collectively perform the barrier operation.
Returns:: None.

Callable from the device.

Description: The ishmemx_barrier_all_work_group routine is a mechanism for synchronizing all PEs. Unlike ishmem_barrier_all, ishmemx_barrier_all_work_group allows for the device threads within group to cooperate towards the barrier operation. This may be more performant; for example, when ishmem_barrier_all requires all device threads in the kernel to invoke RMA operations. This routine blocks the calling PE until all PEs in the world team have called ishmemx_barrier_all_work_group. All threads in group must call the routine with identical arguments.

ISHMEM_SYNC_ALL#

Registers the arrival of a PE at a synchronization point and suspends execution until all other PEs arrive at the synchronization point.

void ishmem_sync_all(void)#

Parameters:: None.
Returns:: None.

Callable from the host and the device.

Description: This routine blocks the calling PE until all PEs in the world team have called ishmem_sync_all. In a multithreaded Intel® SHMEM program, only the calling thread is blocked.

In contrast with the ishmem_barrier_all routines, ishmem_sync_all only ensures completion and visibility of previously issued memory stores and does not ensure completion of remote memory updates issued via ishmem routines.

ISHMEMX_SYNC_ALL_ON_QUEUE#

Registers the arrival of a PE at a synchronization point and suspends execution until all other PEs arrive at the synchronization point.

sycl::event ishmemx_sync_all_on_queue(sycl::queue &q, const std::vector<sycl::event> &deps)#

Parameters:

q – The SYCL queue on which to execute the operation. q must be mapped to the GPU tile assigned to the calling PE.
deps – An optional vector of SYCL events that the operation depends on.

Returns:

The SYCL event created upon submitting the operation to the SYCL runtime.

Callable from the host.

Description: This routine blocks the calling PE until all PEs in the world team have called ishmemx_sync_all_on_queue. In a multithreaded Intel® SHMEM program, only the calling thread is blocked.

In contrast with the ishmem_barrier_all routines, ishmemx_sync_all_on_queue only ensures completion and visibility of previously issued memory stores and does not ensure completion of remote memory updates issued via ishmem routines.

To ensure the sync has completed, refer to the on_queue API Completion Semantics section.

ISHMEMX_SYNC_ALL_WORK_GROUP#

Registers the arrival of a PE at a synchronization point and suspends execution until all other PEs arrive at the synchronization point.

template<typename Group> void ishmemx_sync_all_work_group(const Group &group)#

Parameters:: group – The SYCL group or sub_group on which to collectively perform the barrier operation.
Returns:: None.

Callable from the device.

Description: This routine blocks the calling PE until all PEs in the world team have called ishmemx_sync_all_work_group. In a multithreaded Intel® SHMEM program, only the calling thread is blocked.

In contrast with the ishmem_sync_all routine, ishmemx_sync_all_work_group allows for the device threads within group to cooperate towards the sync operation. This may be more performant; for example, when ishmem_sync_all requires all device threads in the kernel to invoke RMA operations. ishmemx_sync_all_work_group only ensures completion and visibility of previously issued memory stores and does not ensure completion of remote memory updates issued via ishmem routines. All threads in group must call the routine with identical arguments.

ISHMEM_TEAM_SYNC#

Registers the arrival of a PE at a synchronization point and suspends execution until all other PEs in a given ishmem team arrive at this synchronization point.

int ishmem_team_sync(ishmem_team_t team)#

Parameters:: team – The team over which to perform the operation.
Returns:: Zero on successful local completion. Nonzero otherwise.

Callable from the host and the device.

Description: ishmem_team_sync is a collective synchronization routine over an existing ishmem team. The routine registers the arrival of a PE at a synchronization point in the program. This is a fast mechanism for synchronizing all PEs that participate in this collective call. The routine blocks the calling PE until all PEs in the specified team have called ishmem_team_sync. In a multithreaded Intel® SHMEM program, only the calling thread is blocked.

All PEs in the provided team must participate in the sync operation. If team compares equal to ISHMEM_TEAM_INVALID or is otherwise invalid, the behavior is undefined. If a PE not in the team calls ishmem_team_sync, the behavior is undefined.

In contrast with the ishmem_barrier_all routine, ishmem_team_sync only ensures completion and visibility of previously issued memory stores and does not ensure completion of remote memory updates issued via ishmem routines.

ISHMEMX_TEAM_SYNC_ON_QUEUE#

Registers the arrival of a PE at a synchronization point and suspends execution until all other PEs in a given ishmem team arrive at this synchronization point.

sycl::event ishmemx_team_sync_on_queue(ishmem_team_t team, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

Parameters:

team – The team over which to perform the operation.
ret – A pointer whose contents will be set to zero on successful local completion; otherwise, nonzero. ret must be accessible from both the host and the device.
q – The SYCL queue on which to execute the operation. q must be mapped to the GPU tile assigned to the calling PE.
deps – An optional vector of SYCL events that the operation depends on.

Returns:

The SYCL event created upon submitting the operation to the SYCL runtime.

Callable from the host.

Description: ishmemx_team_sync_on_queue is a collective synchronization routine over an existing ishmem team. The routine registers the arrival of a PE at a synchronization point in the program. This is a fast mechanism for synchronizing all PEs that participate in this collective call. The routine blocks the calling PE until all PEs in the specified team have called ishmemx_team_sync_on_queue. In a multithreaded Intel® SHMEM program, only the calling thread is blocked.

In contrast with the ishmem_barrier_all routine, ishmemx_team_sync_on_queue only ensures completion and visibility of previously issued memory stores and does not ensure completion of remote memory updates issued via ishmem routines.

To ensure the contents of ret are valid, refer to the on_queue API Completion Semantics section.

ISHMEMX_TEAM_SYNC_WORK_GROUP#

Registers the arrival of a PE at a synchronization point and suspends execution until all other PEs arrive at the synchronization point.

template<typename Group> void ishmemx_team_sync_work_group(ishmem_team_t team, const Group &group)#

Parameters:

team – The team over which to perform the operation.
group – The SYCL group or sub_group on which to collectively perform the barrier operation.

Callable from the device.

Description: This routine blocks the calling PE until all PEs in team have called ishmemx_team_sync_work_group. In a multithreaded Intel® SHMEM program, only the calling thread is blocked.

In contrast with the ishmem_team_sync routine, ishmemx_team_sync_work_group allows for the device threads within group to cooperate towards the sync operation. This may be more performant; for example, when ishmem_team_sync requires all device threads in the kernel to invoke RMA operations. All PEs in the provided team must participate in the sync operation. If team compares equal to ISHMEM_TEAM_INVALID or is otherwise invalid, the behavior is undefined. ishmemx_team_sync_work_group only ensures completion and visibility of previously issued memory stores and does not ensure completion of remote memory updates issued via ishmem routines. All threads in group must call the routine with identical arguments.

ISHMEM_ALLTOALL#

Exchanges a fixed amount of contiguous data blocks between all pairs of PEs participating in the collective routine.

template<typename TYPE> int ishmem_alltoall(TYPE *dest, const TYPE *source, size_t nelems)#

template<typename TYPE> int ishmem_alltoall(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems)#

int ishmem_TYPENAME_alltoall(TYPE *dest, const TYPE *source, size_t nelems)#

int ishmem_TYPENAME_alltoall(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems)#

int ishmem_alltoallmem(void *dest, const void *source, size_t nelems)#

int ishmem_alltoallmem(ishmem_team_t team, void *dest, const void *source, size_t nelems)#

Parameters:

dest – Symmetric address of a data object large enough to receive the combined total of nelems elements from each PE. The type of dest should match the TYPE and TYPENAME according to the table of Standard RMA types.
source – Symmetric address of a data object that contains nelems elements of data for each PE, ordered according to destination PE. The type of source should match the TYPE and TYPENAME according to the table of Standard RMA types.
nelems – The number of elements to exchange for each PE. For ishmem_alltoallmem, elements are bytes.
team – A valid ishmem team handle to a team.

Returns:

Zero on successful local completion; otherwise, nonzero.

Callable from the host and device.

Description: The ishmem_alltoall routines are collective routines. Each PE participating in the operation exchanges nelems data elements with all other PEs participating in the operation. The size of a data element is 8 bits for ishmem_alltoallmem.

The data being sent and received are stored in a contiguous symmetric data object. The total size of each PE’s source object and dest object is nelems times the size of an element times N, where N equals the number of PEs participating in the operation. The source object contains N blocks of data (where the size of each block is defined by nelems) and each block of data is sent to a different PE.

The same dest and source arrays, and same value for nelems must be passed by all PEs that participate in the collective.

Given a PE i that is the i^th PE participating in the operation and a PE j that is the j^th PE participating in the operation, PE i sends the j^th block of its source object to the i^th block of the dest object of PE j.

If no team argument is passed to ishmem_alltoall or ishmem_alltoallmem, all PEs in the world team must participate in the collective. Collective routines that accept a team argument operate over all PEs in the provided team. All PEs in the provided team must participate in the collective. If team compares equal to ISHMEM_TEAM_INVALID or is otherwise invalid, the behavior is undefined.

Before any PE calls a ishmem_alltoall routine, the following conditions must be ensured:

The dest data object on all PEs in the team is ready to accept the ishmem_alltoall data.
The source data object on all PEs in the team is ready to send.

Otherwise, the behavior is undefined.

Upon return from a ishmem_alltoall routine, the following is true for the local PE:

Its dest symmetric data object is completely updated.
The data has been copied out of the source data object.

ISHMEMX_ALLTOALL_ON_QUEUE#

Exchanges a fixed amount of contiguous data blocks between all pairs of PEs participating in the collective routine.

In the functions below, TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Standard RMA Types.

template<typename TYPE> sycl::event ishmemx_alltoall_on_queue(TYPE *dest, const TYPE *source, size_t nelems, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

template<typename TYPE> sycl::event ishmemx_alltoall_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_alltoall_on_queue(TYPE *dest, const TYPE *source, size_t nelems, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_alltoall_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_alltoallmem_on_queue(void *dest, const void *source, size_t nelems, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_alltoallmem_on_queue(ishmem_team_t team, void *dest, const void *source, size_t nelems, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

Parameters:

dest – Symmetric address of a data object large enough to receive the combined total of nelems elements from each PE. The type of dest should match the TYPE and TYPENAME according to the table of Standard RMA types.
source – Symmetric address of a data object that contains nelems elements of data for each PE, ordered according to destination PE. The type of source should match the TYPE and TYPENAME according to the table of Standard RMA types.
nelems – The number of elements to exchange for each PE. For ishmemx_alltoallmem_on_queue, elements are bytes.
ret – A pointer whose contents will be set to zero on successful local completion; otherwise, nonzero. ret must be accessible from both the host and the device.
q – The SYCL queue on which to execute the operation. q must be mapped to the GPU tile assigned to the calling PE.
deps – An optional vector of SYCL events that the operation depends on.
team – A valid ishmem team handle to a team.

Returns:

The SYCL event created upon submitting the operation to the SYCL runtime.

Callable from the host.

Description: The ishmemx_alltoall_on_queue routines have similar semantics and requirements as the ishmem_alltoall routines. If no team argument is passed to ishmemx_alltoall_on_queue or ishmemx_alltoallmem_on_queue, all PEs in the world team must participate in the collective. Collective routines that accept a team argument operate over all PEs in the provided team. All PEs in the provided team must participate in the collective. If team compares equal to ISHMEM_TEAM_INVALID or is otherwise invalid, the behavior is undefined.

To ensure the contents of dest and ret are valid, refer to the on_queue API Completion Semantics section.

ISHMEMX_ALLTOALL_WORK_GROUP#

Exchanges a fixed amount of contiguous data blocks between all pairs of PEs participating in the collective routine.

In the functions below, TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Standard RMA Types.

template<typename TYPE, typename Group> int ishmemx_alltoall_work_group(TYPE *dest, const TYPE *source, size_t nelems, const Group &group)#

template<typename TYPE, typename Group> int ishmemx_alltoall_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_alltoall_work_group(TYPE *dest, const TYPE *source, size_t nelems, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_alltoall_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems, const Group &group)#

template<typename Group> int ishmemx_alltoallmem_work_group(void *dest, const void *source, size_t nelems, const Group &group)#

template<typename Group> int ishmemx_alltoallmem_work_group(ishmem_team_t team, void *dest, const void *source, size_t nelems, const Group &group)#

Parameters:

dest – Symmetric address of a data object large enough to receive the combined total of nelems elements from each PE. The type of dest should match the TYPE and TYPENAME according to the table of Standard RMA types.
source – Symmetric address of a data object that contains nelems elements of data for each PE, ordered according to destination PE. The type of source should match the TYPE and TYPENAME according to the table of Standard RMA types.
nelems – The number of elements to exchange for each PE. For ishmem_alltoallmem, elements are bytes.
group – The SYCL group or sub_group on which to collectively perform the barrier operation.
team – A valid ishmem team handle to a team.

Returns:

Zero on successful local completion; otherwise, nonzero.

Callable from the device.

Description: The ishmemx_alltoall_work_group routines have similar semantics and requirements as the ishmem_alltoall routines. In contrast with the ishmem_alltoall routines, ishmemx_alltoall_work_group allows for the device threads within group to cooperate towards the all-to-all operation. This may be more performant; for example, when ishmem_alltoall requires all device threads in the kernel to invoke RMA operations. This routine blocks the calling PE until all PEs in the team have called ishmemx_alltoall_work_group. If no team argument is passed to ishmemx_alltoall_work_group or ishmemx_alltoallmem_work_group, all PEs in the world team must participate in the collective. Collective routines that accept a team argument operate over all PEs in the provided team. All PEs in the provided team must participate in the collective. If team compares equal to ISHMEM_TEAM_INVALID or is otherwise invalid, the behavior is undefined. ishmemx_alltoall_work_group only ensures completion and visibility of previously issued memory stores and does not ensure completion of remote memory updates issued via ishmem routines. All threads in group must call the routine with identical arguments.

ISHMEM_BROADCAST#

Broadcasts a block of data from one PE to one or more destination PEs.

Below, TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Standard RMA Types.

template<typename TYPE> int ishmem_broadcast(TYPE *dest, const TYPE *source, size_t nelems, int PE_root)#

template<typename TYPE> int ishmem_broadcast(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems, int PE_root)#

int ishmem_TYPENAME_broadcast(TYPE *dest, const TYPE *source, size_t nelems, int PE_root)#

int ishmem_TYPENAME_broadcast(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems, int PE_root)#

int ishmem_broadcastmem(void *dest, const void *source, size_t nelems, int PE_root)#

int ishmem_broadcastmem(ishmem_team_t team, void *dest, const void *source, size_t nelems, int PE_root)#

Parameters:

dest – Symmetric address of the destination data object. The type of dest should match the TYPE and TYPENAME according to the table of Standard RMA types.
source – Symmetric address of the source data object. The type of source should match the TYPE and TYPENAME according to the table of Standard RMA types.
nelems – The number of elements in the source and dest arrays. For ishmem_broadcastmem, elements are bytes.
PE_root – The PE from which the data is copied.
team – A valid ishmem team handle to a team.

Returns:

Zero on successful local completion; otherwise, nonzero.

Callable from the host and device.

Description: The broadcast routines are collective routines across all PEs in a valid ishmem team. They copy the source data object on the PE specified by PE_root to the dest data object on the PEs participating in the collective operation. The same dest and source data objects and the same value of PE_root must be passed by all PEs participating in the collective operation.

For broadcasts:

The dest object is updated on all PEs in the ishmem team.
All PEs in the team must participate in the operation.
If team compares equal to ISHMEM_TEAM_INVALID or is otherwise invalid, the behavior is undefined.
PE numbering is relative to the team. The specified root PE must be a valid PE number for the team, between \(0\) and \(N-1\), where \(N\) is the size of the team.
The values of argument PE_root must be the same value on all PEs in the team.

Before any PE calls a broadcast routine, the following conditions must be ensured:

The dest array on all PEs in the team is ready to accept the broadcast data.

Otherwise, the behavior is undefined.

Upon return from a broadcast routine, the following are true for the local PE:

The dest data object is updated on all PEs in the team.
The source data object may be safely reused.

ISHMEMX_BROADCAST_ON_QUEUE#

Broadcasts a block of data from one PE to one or more destination PEs.

Below, TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Standard RMA Types.

template<typename TYPE> sycl::event ishmemx_broadcast_on_queue(TYPE *dest, const TYPE *source, size_t nelems, int PE_root, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

template<typename TYPE> sycl::event ishmemx_broadcast_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems, int PE_root, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_broadcast_on_queue(TYPE *dest, const TYPE *source, size_t nelems, int PE_root, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_broadcast_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems, int PE_root, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_broadcastmem_on_queue(void *dest, const void *source, size_t nelems, int PE_root, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_broadcastmem_on_queue(ishmem_team_t team, void *dest, const void *source, size_t nelems, int PE_root, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

Parameters:

dest – Symmetric address of the destination data object. The type of dest should match the TYPE and TYPENAME according to the table of Standard RMA types.
source – Symmetric address of the source data object. The type of source should match the TYPE and TYPENAME according to the table of Standard RMA types.
nelems – The number of elements in the source and dest arrays. For ishmemx_broadcastmem_on_queue, elements are bytes.
PE_root – The PE from which the data is copied.
ret – A pointer whose contents will be set to zero on successful local completion; otherwise, nonzero. ret must be accessible from both the host and the device.
q – The SYCL queue on which to execute the operation. q must be mapped to the GPU tile assigned to the calling PE.
deps – An optional vector of SYCL events that the operation depends on.
team – A valid ishmem team handle to a team.

Returns:

The SYCL event created upon submitting the operation to the SYCL runtime.

Callable from the host.

Description: The ishmemx_broadcast_on_queue routines have similar semantics and requirements as the ishmem_broadcast routines. If no team argument is passed to ishmemx_broadcast_on_queue or ishmemx_broadcastmem_on_queue, all PEs in the world team must participate in the broadcast operation. Broadcast routines that accept a team argument operate over all PEs in the provided team. All PEs in the provided team must participate in the broadcast. If team compares equal to ISHMEM_TEAM_INVALID or is otherwise invalid, the behavior is undefined.

To ensure the contents of dest and ret are valid, refer to the on_queue API Completion Semantics section.

ISHMEMX_BROADCAST_WORK_GROUP#

Broadcasts a block of data from one PE to one or more destination PEs.

Below, TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Standard RMA Types.

template<typename TYPE, typename Group> int ishmemx_broadcast_work_group(TYPE *dest, const TYPE *source, size_t nelems, int PE_root, const Group &group)#

template<typename TYPE, typename Group> int ishmemx_broadcast_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems, int PE_root, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_broadcast_work_group(TYPE *dest, const TYPE *source, size_t nelems, int PE_root, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_broadcast_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems, int PE_root, const Group &group)#

template<typename Group> int ishmemx_broadcastmem_work_group(void *dest, const void *source, size_t nelems, int PE_root, const Group &group)#

template<typename Group> int ishmemx_broadcastmem_work_group(ishmem_team_t team, void *dest, const void *source, size_t nelems, int PE_root, const Group &group)#

Parameters:

dest – Symmetric address of the destination data object. The type of dest should match the TYPE and TYPENAME according to the table of Standard RMA types.
source – Symmetric address of the source data object. The type of source should match the TYPE and TYPENAME according to the table of Standard RMA types.
nelems – The number of elements in the source and dest arrays. For ishmemx_broadcastmem_work_group, elements are bytes.
PE_root – The PE from which the data is copied.
group – The SYCL group or sub_group on which to collectively perform the barrier operation.
team – A valid ishmem team handle to a team.

Returns:

Zero on successful local completion; otherwise, nonzero.

Callable from the device.

Description: The ishmemx_broadcast_work_group and ishmemx_broadcastmem_work_group routines have similar semantics and requirements as the ishmem_broadcast routines. In contrast with the ishmem_broadcast routines, ishmemx_broadcast_work_group and ishmemx_broadcastmem_work_group allow for the device threads within group to cooperate towards the broadcast operation. This routine blocks the calling PE until all PEs in the team have called ishmemx_broadcast_work_group. If no team argument is passed to ishmemx_broadcast_work_group or ishmemx_broadcastmem_work_group, all PEs in the world team must participate in the broadcast operation. Broadcast routines that accept a team argument operate over all PEs in the provided team. All PEs in the provided team must participate in the broadcast. If team compares equal to ISHMEM_TEAM_INVALID or is otherwise invalid, the behavior is undefined. ishmemx_broadcast_work_group only ensures completion and visibility of previously issued memory stores and does not ensure completion of remote memory updates issued via ishmem routines. All threads in group must call the routine with identical arguments.

ISHMEM_COLLECT, ISHMEM_FCOLLECT#

Concatenates blocks of data from multiple PEs to an array in every PE participating in the collective routine.

In the functions below, TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Standard RMA Types.

template<typename TYPE> int ishmem_collect(TYPE *dest, const TYPE *source, size_t nelems)#

template<typename TYPE> int ishmem_collect(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems)#

template<typename TYPE> int ishmem_fcollect(TYPE *dest, const TYPE *source, size_t nelems)#

template<typename TYPE> int ishmem_fcollect(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems)#

int ishmem_TYPENAME_collect(TYPE *dest, const TYPE *source, size_t nelems)#

int ishmem_TYPENAME_collect(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems)#

int ishmem_TYPENAME_fcollect(TYPE *dest, const TYPE *source, size_t nelems)#

int ishmem_TYPENAME_fcollect(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems)#

int ishmem_collectmem(void *dest, const void *source, size_t nelems)#

int ishmem_collectmem(ishmem_team_t team, void *dest, const void *source, size_t nelems)#

int ishmem_fcollectmem(void *dest, const void *source, size_t nelems)#

int ishmem_fcollectmem(ishmem_team_t team, void *dest, const void *source, size_t nelems)#

Parameters:

dest – Symmetric address of an array large enough to accept the concatenation of the source arrays on all participating PEs. The type of dest should match the TYPE and TYPENAME according to the table of Standard RMA types.
source – Symmetric address of the source data object. The type of source should match the TYPE and TYPENAME according to the table of Standard RMA types.
nelems – The number of elements in source array. For ishmem_collectmem and ishmem_fcollectmem, elements are bytes.
team – A valid ishmem team handle to a team.

Returns:

Zero on successful local completion. Nonzero otherwise.

Callable from the host and device.

Description: The ishmem_collect and ishmem_fcollect routines perform a collective operation to concatenate nelems data items from the source array into the dest array, over all PEs in a valid ishmem team in processor number order.

The collected result is written to the dest array for all PEs in the team. The same dest and source arrays must be passed by all PEs that participate in the operation.

The ishmem_fcollect routines require that nelems be the same value in all participating PEs, while the ishmem_collect routines allow nelems to vary from PE to PE.

If no team argument is passed to either ishmem_collect or ishmem_fcollect, then all PEs in the world team must participate in the collective. Collect and fcollect routines that accept a team argument operate over all PEs in the provided team. All PEs in the provided team must participate in the collective. If team compares equal to ISHMEM_TEAM_INVALID or is otherwise invalid, the behavior is undefined.

Upon return from a collective routine, the following are true for the local PE:

The dest array is updated and the source array may be safely reused.

ISHMEMX_COLLECT_ON_QUEUE, ISHMEMX_FCOLLECT_ON_QUEUE#

Concatenates blocks of data from multiple PEs to an array in every PE participating in the collective routine.

In the functions below, TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Standard RMA Types.

template<typename TYPE> sycl::event ishmemx_collect_on_queue(TYPE *dest, const TYPE *source, size_t nelems, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

template<typename TYPE> sycl::event ishmemx_collect_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

template<typename TYPE> sycl::event ishmemx_fcollect_on_queue(TYPE *dest, const TYPE *source, size_t nelems, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

template<typename TYPE> sycl::event ishmemx_fcollect_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_collect_on_queue(TYPE *dest, const TYPE *source, size_t nelems, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_collect_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_fcollect_on_queue(TYPE *dest, const TYPE *source, size_t nelems, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_fcollect_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_collectmem_on_queue(void *dest, const void *source, size_t nelems, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_collectmem_on_queue(ishmem_team_t team, void *dest, const void *source, size_t nelems, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_fcollectmem_on_queue(void *dest, const void *source, size_t nelems, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_fcollectmem_on_queue(ishmem_team_t team, void *dest, const void *source, size_t nelems, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

Parameters:

dest – Symmetric address of an array large enough to accept the concatenation of the source arrays on all participating PEs. The type of dest should match the TYPE and TYPENAME according to the table of Standard RMA types.
source – Symmetric address of the source data object. The type of source should match the TYPE and TYPENAME according to the table of Standard RMA types.
nelems – The number of elements in source array. For ishmemx_collectmem_on_queue and ishmemx_fcollectmem_on_queue, elements are bytes.
ret – A pointer whose contents will be set to zero on successful local completion; otherwise, nonzero. ret must be accessible from both the host and the device.
q – The SYCL queue on which to execute the operation. q must be mapped to the GPU tile assigned to the calling PE.
deps – An optional vector of SYCL events that the operation depends on.
team – A valid ishmem team handle to a team.

Returns:

The SYCL event created upon submitting the operation to the SYCL runtime.

Callable from the host.

Description: The ishmemx_collect_on_queue and ishmemx_fcollect_on_queue routines have similar semantics and requirements as the ishmem_collect and ishmem_fcollect routines, respectively. If no team argument is passed to ishmemx_collect_on_queue, ishmemx_fcollect_on_queue, or ishmemx_fcollectmem_on_queue, or ishmemx_fcollectmem_on_queue, then all PEs in the world team must participate in the collective. Collect routines that accept a team argument operate over all PEs in the provided team. All PEs in the provided team must participate in the collective. If team compares equal to ISHMEM_TEAM_INVALID or is otherwise invalid, the behavior is undefined.

To ensure the contents of dest and ret are valid, refer to the on_queue API Completion Semantics section.

ISHMEMX_COLLECT_WORK_GROUP, ISHMEMX_FCOLLECT_WORK_GROUP#

Concatenates blocks of data from multiple PEs to an array in every PE participating in the collective routine.

In the functions below, TYPE is one of the standard RMA types and has a corresponding TYPENAME specified by Table Standard RMA Types.

template<typename TYPE, typename Group> int ishmemx_collect_work_group(TYPE *dest, const TYPE *source, size_t nelems, const Group &group)#

template<typename TYPE, typename Group> int ishmemx_collect_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems, const Group &group)#

template<typename TYPE, typename Group> int ishmemx_fcollect_work_group(TYPE *dest, const TYPE *source, size_t nelems, const Group &group)#

template<typename TYPE, typename Group> int ishmemx_fcollect_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_collect_work_group(TYPE *dest, const TYPE *source, size_t nelems, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_collect_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_fcollect_work_group(TYPE *dest, const TYPE *source, size_t nelems, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_fcollect_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nelems, const Group &group)#

template<typename Group> int ishmemx_collectmem_work_group(void *dest, const void *source, size_t nelems, const Group &group)#

template<typename Group> int ishmemx_collectmem_work_group(ishmem_team_t team, void *dest, const void *source, size_t nelems, const Group &group)#

template<typename Group> int ishmemx_fcollectmem_work_group(void *dest, const void *source, size_t nelems, const Group &group)#

template<typename Group> int ishmemx_fcollectmem_work_group(ishmem_team_t team, void *dest, const void *source, size_t nelems, const Group &group)#

Parameters:

dest – Symmetric address of an array large enough to accept the concatenation of the source arrays on all participating PEs. The type of dest should match the TYPE and TYPENAME according to the table of Standard RMA types.
source – Symmetric address of the source data object. The type of source should match the TYPE and TYPENAME according to the table of Standard RMA types.
nelems – The number of elements in source array. For ishmemx_collectmem_work_group and ishmemx_fcollectmem_work_group, elements are bytes.
group – The SYCL group or sub_group on which to collectively perform the barrier operation.
team – A valid ishmem team handle to a team.

Returns:

Zero on successful local completion. Nonzero otherwise.

Callable from the device.

Description: The ishmemx_collect_work_group and ishmemx_fcollect_work_group routines have similar semantics and requirements as the ishmem_collect and ishmem_fcollect routines, respectively. In contrast with the ishmem_collect and ishmem_fcollect routines, ishmemx_collect_work_group and ishmemx_fcollect_work_group allow for the device threads within group to cooperate towards the operation. This may be more performant; for example, when ishmem_collect requires all device threads in the kernel to invoke RMA operations. The ishmemx_collect_work_group and ishmemx_fcollect_work_group routines block the calling PE until all PEs in the team have called ishmemx_collect_work_group or ishmemx_fcollect_work_group, respectively. If no team argument is passed to ishmemx_collect_work_group, ishmemx_fcollect_work_group, or ishmemx_fcollectmem_work_group, or ishmemx_fcollectmem_work_group, then all PEs in the world team must participate in the collective. Collect routines that accept a team argument operate over all PEs in the provided team. All PEs in the provided team must participate in the collective. If team compares equal to ISHMEM_TEAM_INVALID or is otherwise invalid, the behavior is undefined. ishmemx_collect_work_group and ishmemx_fcollect_work_group only ensure completion and visibility of previously issued memory stores and does not ensure completion of remote memory updates issued via ishmem routines. All threads in group must call the routine with identical arguments.

ISHMEM_REDUCE#

Reduction Types, Names, and Supporting Operations:

TYPE	TYPENAME	Operations Supporting TYPE
char	char	MAX, MIN, SUM, PROD
signed char	schar	MAX, MIN, SUM, PROD
short	short	MAX, MIN, SUM, PROD
int	int	MAX, MIN, SUM, PROD
long	long	MAX, MIN, SUM, PROD
long long	longlong	MAX, MIN, SUM, PROD
ptrdiff_t	ptrdiff	MAX, MIN, SUM, PROD
unsigned char	uchar	AND, OR, XOR, MAX, MIN, SUM, PROD
unsigned short	ushort	AND, OR, XOR, MAX, MIN, SUM, PROD
unsigned int	uint	AND, OR, XOR, MAX, MIN, SUM, PROD
unsigned long	ulong	AND, OR, XOR, MAX, MIN, SUM, PROD
unsigned long long	ulonglong	AND, OR, XOR, MAX, MIN, SUM, PROD
int8_t	int8	AND, OR, XOR, MAX, MIN, SUM, PROD
int16_t	int16	AND, OR, XOR, MAX, MIN, SUM, PROD
int32_t	int32	AND, OR, XOR, MAX, MIN, SUM, PROD
int64_t	int64	AND, OR, XOR, MAX, MIN, SUM, PROD
uint8_t	uint8	AND, OR, XOR, MAX, MIN, SUM, PROD
uint16_t	uint16	AND, OR, XOR, MAX, MIN, SUM, PROD
uint32_t	uint32	AND, OR, XOR, MAX, MIN, SUM, PROD
uint64_t	uint64	AND, OR, XOR, MAX, MIN, SUM, PROD
size_t	size	AND, OR, XOR, MAX, MIN, SUM, PROD
float	float	MAX, MIN, SUM, PROD
double	double	MAX, MIN, SUM, PROD

The following functions perform reduction operations across all PEs in a given ishmem team.

In the functions below, TYPE is one of the reduction types and has a corresponding TYPENAME specified by Table Reduction Types, Names, and Supporting Operations.

template<typename TYPE> int ishmem_and_reduce(TYPE *dest, const TYPE *source, size_t nreduce)#

template<typename TYPE> int ishmem_and_reduce(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce)#

template<typename TYPE> int ishmem_or_reduce(TYPE *dest, const TYPE *source, size_t nreduce)#

template<typename TYPE> int ishmem_or_reduce(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce)#

template<typename TYPE> int ishmem_xor_reduce(TYPE *dest, const TYPE *source, size_t nreduce)#

template<typename TYPE> int ishmem_xor_reduce(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce)#

template<typename TYPE> int ishmem_max_reduce(TYPE *dest, const TYPE *source, size_t nreduce)#

template<typename TYPE> int ishmem_max_reduce(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce)#

template<typename TYPE> int ishmem_min_reduce(TYPE *dest, const TYPE *source, size_t nreduce)#

template<typename TYPE> int ishmem_min_reduce(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce)#

template<typename TYPE> int ishmem_sum_reduce(TYPE *dest, const TYPE *source, size_t nreduce)#

template<typename TYPE> int ishmem_sum_reduce(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce)#

template<typename TYPE> int ishmem_prod_reduce(TYPE *dest, const TYPE *source, size_t nreduce)#

template<typename TYPE> int ishmem_prod_reduce(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce)#

int ishmem_TYPENAME_and_reduce(TYPE *dest, const TYPE *source, size_t nreduce)#

int ishmem_TYPENAME_and_reduce(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce)#

int ishmem_TYPENAME_or_reduce(TYPE *dest, const TYPE *source, size_t nreduce)#

int ishmem_TYPENAME_or_reduce(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce)#

int ishmem_TYPENAME_xor_reduce(TYPE *dest, const TYPE *source, size_t nreduce)#

int ishmem_TYPENAME_xor_reduce(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce)#

int ishmem_TYPENAME_max_reduce(TYPE *dest, const TYPE *source, size_t nreduce)#

int ishmem_TYPENAME_max_reduce(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce)#

int ishmem_TYPENAME_min_reduce(TYPE *dest, const TYPE *source, size_t nreduce)#

int ishmem_TYPENAME_min_reduce(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce)#

int ishmem_TYPENAME_sum_reduce(TYPE *dest, const TYPE *source, size_t nreduce)#

int ishmem_TYPENAME_sum_reduce(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce)#

int ishmem_TYPENAME_prod_reduce(TYPE *dest, const TYPE *source, size_t nreduce)#

int ishmem_TYPENAME_prod_reduce(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce)#

Parameters:

dest – Symmetric address of an array, of length nreduce elements, to receive the result of the reduction routines. The type of dest should match the TYPE and TYPENAME according to the table of Reduction Types.
source – Symmetric address of an array, of length nreduce elements, that contains one element for each separate reduction routine. The type of source should match the TYPE and TYPENAME according to the table of Reduction Types.
nreduce – The number of elements in the dest and source arrays. nreduce must be of type size_t and have the same value across all PEs.
team – A valid ishmem team handle to a team.

Returns:

Zero on successful local completion. Nonzero otherwise.

Callable from the host and device.

Description: ishmem reduction routines are collective routines over all PEs in an existing ishmem team that compute one or more reductions across symmetric arrays. A reduction performs an associative binary routine across a set of values.

The nreduce argument determines the number of separate reductions to perform. The source array on all PEs participating in the reduction provides one element for each reduction. The results of the reductions are placed in the dest array on all PEs participating in the reduction.

The source and dest arguments must either be the same symmetric address, or two different symmetric addresses corresponding to buffers that do not overlap in memory. That is, they must be completely overlapping or completely disjoint.

If no team argument is passed to a reduction routine, all PEs in the world team must participate in the reduction. Reduction routines that accept a team argument operate over all PEs in the provided team. All PEs in the provided team must participate in the collective. If team compares equal to ISHMEM_TEAM_INVALID or is otherwise invalid, the behavior is undefined.

Before any PE calls a reduction routine, the following conditions must be ensured:

The dest array on all PEs participating in the reduction is ready to accept the results of the reduction.

Otherwise, the behavior is undefined.

Upon return from a reduction routine, the following are true for the local PE:

The dest array is updated and the source array may be safely reused.

ISHMEMX_REDUCE_ON_QUEUE#

The following functions perform reduction operations across all PEs in a given ishmem team.

In the functions below, TYPE is one of the reduction types and has a corresponding TYPENAME specified by Table Reduction Types, Names, and Supporting Operations.

template<typename TYPE> sycl::event ishmemx_and_reduce_on_queue(TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

template<typename TYPE> sycl::event ishmemx_and_reduce_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

template<typename TYPE> sycl::event ishmemx_or_reduce_on_queue(TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

template<typename TYPE> sycl::event ishmemx_or_reduce_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

template<typename TYPE> sycl::event ishmemx_xor_reduce_on_queue(TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

template<typename TYPE> sycl::event ishmemx_xor_reduce_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

template<typename TYPE> sycl::event ishmemx_max_reduce_on_queue(TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

template<typename TYPE> sycl::event ishmemx_max_reduce_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

template<typename TYPE> sycl::event ishmemx_min_reduce_on_queue(TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

template<typename TYPE> sycl::event ishmemx_min_reduce_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

template<typename TYPE> sycl::event ishmemx_sum_reduce_on_queue(TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

template<typename TYPE> sycl::event ishmemx_sum_reduce_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

template<typename TYPE> sycl::event ishmemx_prod_reduce_on_queue(TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

template<typename TYPE> sycl::event ishmemx_prod_reduce_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_and_reduce_on_queue(TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_and_reduce_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_or_reduce_on_queue(TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_or_reduce_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_xor_reduce_on_queue(TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_xor_reduce_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_max_reduce_on_queue(TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_max_reduce_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_min_reduce_on_queue(TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_min_reduce_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_sum_reduce_on_queue(TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_sum_reduce_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_prod_reduce_on_queue(TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

sycl::event ishmemx_TYPENAME_prod_reduce_on_queue(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, int *ret, sycl::queue &q, const std::vector<sycl::event> &deps)#

Parameters:

dest – Symmetric address of an array, of length nreduce elements, to receive the result of the reduction routines. The type of dest should match the TYPE and TYPENAME according to the table of Reduction Types.
source – Symmetric address of an array, of length nreduce elements, that contains one element for each separate reduction routine. The type of source should match the TYPE and TYPENAME according to the table of Reduction Types.
nreduce – The number of elements in the dest and source arrays. nreduce must be of type size_t and have the same value across all PEs.
ret – A pointer whose contents will be set to zero on successful local completion; otherwise, nonzero. ret must be accessible from both the host and the device.
q – The SYCL queue on which to execute the operation. q must be mapped to the GPU tile assigned to the calling PE.
deps – An optional vector of SYCL events that the operation depends on.
team – A valid ishmem team handle to a team.

Returns:

The SYCL event created upon submitting the operation to the SYCL runtime.

Callable from the host.

Description: The ishmemx_reduce_on_queue routines have similar semantics and requirements as the ishmem_reduce routines. If no team argument is passed to a reduction routine, all PEs in the world team must participate in the collective. Reduction routines that accept a team argument operate over all PEs in the provided team. All PEs in the provided team must participate in the reduction. If team compares equal to ISHMEM_TEAM_INVALID or is otherwise invalid, the behavior is undefined.

To ensure the contents of dest and ret are valid, refer to the on_queue API Completion Semantics section.

ISHMEMX_REDUCE_WORK_GROUP#

The following functions perform reduction operations across all PEs in a given ishmem team.

In the functions below, TYPE is one of the reduction types and has a corresponding TYPENAME specified by Table Reduction Types, Names, and Supporting Operations.

template<typename TYPE, typename Group> int ishmemx_and_reduce_work_group(TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename TYPE, typename Group> int ishmemx_and_reduce_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename TYPE, typename Group> int ishmemx_or_reduce_work_group(TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename TYPE, typename Group> int ishmemx_or_reduce_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename TYPE, typename Group> int ishmemx_xor_reduce_work_group(TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename TYPE, typename Group> int ishmemx_xor_reduce_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename TYPE, typename Group> int ishmemx_max_reduce_work_group(TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename TYPE, typename Group> int ishmemx_max_reduce_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename TYPE, typename Group> int ishmemx_min_reduce_work_group(TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename TYPE, typename Group> int ishmemx_min_reduce_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename TYPE, typename Group> int ishmemx_sum_reduce_work_group(TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename TYPE, typename Group> int ishmemx_sum_reduce_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename TYPE, typename Group> int ishmemx_prod_reduce_work_group(TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename TYPE, typename Group> int ishmemx_prod_reduce_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_and_reduce_work_group(TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_and_reduce_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_or_reduce_work_group(TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_or_reduce_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_xor_reduce_work_group(TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_xor_reduce_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_max_reduce_work_group(TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_max_reduce_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_min_reduce_work_group(TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_min_reduce_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_sum_reduce_work_group(TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_sum_reduce_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_prod_reduce_work_group(TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

template<typename Group> int ishmemx_TYPENAME_prod_reduce_work_group(ishmem_team_t team, TYPE *dest, const TYPE *source, size_t nreduce, const Group &group)#

Parameters:

dest – Symmetric address of an array, of length nreduce elements, to receive the result of the reduction routines. The type of dest should match the TYPE and TYPENAME according to the table of Reduction Types.
source – Symmetric address of an array, of length nreduce elements, that contains one element for each separate reduction routine. The type of source should match the TYPE and TYPENAME according to the table of Reduction Types.
nreduce – The number of elements in the dest and source arrays. nreduce must be of type size_t and have the same value across all PEs.
group – The SYCL group or sub_group on which to collectively perform the barrier operation.
team – A valid ishmem team handle to a team.

Returns:

Zero on successful local completion. Nonzero otherwise.

Callable from the device.

Description: The ishmemx_reduce_work_group routines have similar semantics and requirements as the ishmem_reduce routines. In contrast with the ishmem_reduce routines, ishmemx_reduce_work_group allows for the device threads within group to cooperate towards the reduction operation. This may be more performant; for example, when ishmem_reduce requires all device threads in the kernel to invoke RMA operations. This routine blocks the calling PE until all PEs in the team have called ishmemx_reduce_work_group. If no team argument is passed to a reduction routine, all PEs in the world team must participate in the collective. Reduction routines that accept a team argument operate over all PEs in the provided team. All PEs in the provided team must participate in the reduction. If team compares equal to ISHMEM_TEAM_INVALID or is otherwise invalid, the behavior is undefined. ishmemx_reduce_work_group only ensures completion and visibility of previously issued memory stores and does not ensure completion of remote memory updates issued via ishmem routines. All threads in group must call the routine with identical arguments.

Important

For the reduction operations sum and prod, the order of reduction may not be the same across all participating PEs, so the results for floating point datatypes may differ slightly. This is because floating addition and multiplication are not associative operations.

Collective Operations

Contents

Collective Operations#

ISHMEM_BARRIER_ALL#

ISHMEMX_BARRIER_ALL_ON_QUEUE#

ISHMEMX_BARRIER_ALL_WORK_GROUP#

ISHMEM_SYNC_ALL#

ISHMEMX_SYNC_ALL_ON_QUEUE#

ISHMEMX_SYNC_ALL_WORK_GROUP#

ISHMEM_TEAM_SYNC#

ISHMEMX_TEAM_SYNC_ON_QUEUE#

ISHMEMX_TEAM_SYNC_WORK_GROUP#

ISHMEM_ALLTOALL#

ISHMEMX_ALLTOALL_ON_QUEUE#

ISHMEMX_ALLTOALL_WORK_GROUP#

ISHMEM_BROADCAST#

ISHMEMX_BROADCAST_ON_QUEUE#

ISHMEMX_BROADCAST_WORK_GROUP#

ISHMEM_COLLECT, ISHMEM_FCOLLECT#

ISHMEMX_COLLECT_ON_QUEUE, ISHMEMX_FCOLLECT_ON_QUEUE#

ISHMEMX_COLLECT_WORK_GROUP, ISHMEMX_FCOLLECT_WORK_GROUP#

ISHMEM_REDUCE#

ISHMEMX_REDUCE_ON_QUEUE#

ISHMEMX_REDUCE_WORK_GROUP#