Command-Buffer

Motivation#

A command-buffer represents a series of commands for execution on a command queue. Many adapters support this kind of construct, either natively or through extensions, but these constructs are not exposed for direct use. Typically their use is abstracted through the existing Core APIs: for example, when calling urEnqueueKernelLaunch the adapter may both append the kernel command to a command-buffer-like construct and submit that command-buffer to a queue for execution. These structures allow commands to be batched to improve host launch latency, but without direct control it falls to the adapter implementation to batch commands automatically.

This experimental feature exposes command-buffers in the Unified Runtime API directly, allowing applications explicit control over the enqueue and execution of commands to batch commands as required for optimal performance.

Querying Command-Buffer Support#

Support for command-buffers can be queried for a given device/adapter by using the device info query with UR_DEVICE_INFO_COMMAND_BUFFER_SUPPORT_EXP. Adapters supporting this experimental feature will report true.

ur_bool_t CmdBufferSupport = false;
urDeviceGetInfo(hDevice, UR_DEVICE_INFO_COMMAND_BUFFER_SUPPORT_EXP,
                sizeof(CmdBufferSupport), &CmdBufferSupport, nullptr);

Command-Buffer Creation#

Command-buffers are tied to a specific ur_context_handle_t and ur_device_handle_t. urCommandBufferCreateExp takes a descriptor that provides additional properties for how the command-buffer should be constructed. The members defined in ur_exp_command_buffer_desc_t are:

isUpdatable, which should be set to true to support updating command-buffer commands.
isInOrder, which should be set to true to enforce commands appended to a command-buffer to be executed in an in-order fashion.
enableProfiling, which should be set to true to enable profiling of the command-buffer.

Command-buffers are reference counted and can be retained and released by calling urCommandBufferRetainExp and urCommandBufferReleaseExp respectively.

Appending Commands#

Commands can be appended to a command-buffer by calling any of the command-buffer append functions. Typically these closely mimic the existing enqueue functions in the Core API in terms of their command-specific parameters. However, they differ in that they take a command-buffer handle instead of a queue handle. Dependencies are also expressed differently: dependencies internal to the command-buffer are expressed with sync-points, while event handles are used to express synchronization external to the command-buffer.

The entry-points for appending commands also return an optional handle to the command being appended. This handle can be used to update the command configuration between command-buffer executions, see the section on updating command-buffer commands.

Currently only the following commands are supported:

It is planned to eventually support any command type from the Core API which can actually be appended to the equivalent adapter native constructs.

Sync-Points#

A sync-point is a value that represents a command inside a command-buffer and is returned from command-buffer append function calls. Sync-points can optionally be passed to later append calls to define execution dependencies on other commands within the same command-buffer. Both the sync-point wait-list and the returned sync-point parameters of append functions are ignored if the command-buffer was created with the in-order property.

Sync-points are unique and valid for use only within the command-buffer they were obtained from.

// Append a memcpy with no sync-point dependencies
ur_exp_command_buffer_sync_point_t syncPoint;

urCommandBufferAppendUSMMemcpyExp(hCommandBuffer, pDst, pSrc, size, 0,
                                  nullptr, 0, nullptr, &syncPoint, nullptr,
                                  nullptr);

// Append a kernel launch with syncPoint as a dependency, ignore returned
// sync-point
urCommandBufferAppendKernelLaunchExp(hCommandBuffer, hKernel, workDim,
                                     pGlobalWorkOffset, pGlobalWorkSize,
                                     pLocalWorkSize, 0, nullptr, 1,
                                     &syncPoint, 0, nullptr,
                                     nullptr, nullptr, nullptr);
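
To make the dependency semantics of sync-points more concrete, the following is a minimal, self-contained sketch of a toy command-buffer. It is not the Unified Runtime API: ToyCommandBuffer, ToySyncPoint, append and waves are hypothetical names invented for this illustration, and modelling a sync-point as an index is an assumption. The sketch shows how wait-lists form a dependency graph, and how an in-order buffer ignores them and falls back to append order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: none of these names belong to the Unified Runtime API.
// A sync-point is modelled as the index of a previously appended command.
using ToySyncPoint = std::size_t;

struct ToyCommand {
  std::string name;
  std::vector<ToySyncPoint> waitList; // commands that must complete first
};

struct ToyCommandBuffer {
  bool inOrder = false; // stands in for the in-order creation property
  std::vector<ToyCommand> commands;

  // Append a command and return the sync-point that identifies it.
  ToySyncPoint append(const std::string &name,
                      const std::vector<ToySyncPoint> &waitList = {}) {
    for (ToySyncPoint sp : waitList) {
      // A sync-point is only meaningful inside the buffer that produced it.
      assert(sp < commands.size());
    }
    // An in-order buffer ignores the wait-list: append order is the order.
    commands.push_back(ToyCommand{
        name, inOrder ? std::vector<ToySyncPoint>{} : waitList});
    return commands.size() - 1;
  }

  // Earliest "wave" in which each command could start. Commands that share
  // a wave have no dependency path between them, so their relative order is
  // unconstrained.
  std::vector<std::size_t> waves() const {
    std::vector<std::size_t> wave(commands.size(), 0);
    for (std::size_t i = 0; i < commands.size(); ++i) {
      if (inOrder) {
        wave[i] = i; // every command follows the one appended before it
        continue;
      }
      for (ToySyncPoint sp : commands[i].waitList) {
        wave[i] = std::max(wave[i], wave[sp] + 1);
      }
    }
    return wave;
  }
};
```

In this model, two copies appended with empty wait-lists share a wave, while a kernel that lists both of their sync-points lands in the following wave. Setting inOrder before appending places every command in its own wave regardless of the wait-lists passed.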

Command Synchronization With Events#

When appending commands to a command-buffer, an optional phEventWaitList input parameter is available for passing a list of ur_event_handle_t objects the command should wait on, as well as an optional phEvent output parameter to get a ur_event_handle_t object that will be signaled on completion of the command execution. It is the user's responsibility to release the returned phEvent with urEventRelease.

The wait event parameter allows commands in a command-buffer to depend on the completion of UR commands which are external to the command-buffer, while the output signal event parameter allows individual commands in a command-buffer to trigger external queue commands. Using returned signal events as wait events inside the same command-buffer is also valid usage.

Important

Support for using phEventWaitList & phEvent parameters requires a device to support UR_DEVICE_INFO_COMMAND_BUFFER_EVENT_SUPPORT_EXP.

Signal Event Valid Usage#

A returned signal event represents only the status of the command in the current execution of a given command-buffer on a device. Command signal events are not unique per execution of a command-buffer. If a command-buffer is enqueued multiple times before using one of these events (for example as a dependency to an eager queue operation), it is undefined which specific execution of the command-buffer the event will represent. If a dependency on a specific command-buffer execution is required, the user must enforce this ordering by ensuring there is only a single execution of the command-buffer in flight when using these command signal events.

When a user calls urEnqueueCommandBufferExp all the signal events returned from the individual commands in the command-buffer are synchronously reset to a non-complete state prior to the asynchronous commands beginning.

Inter-Graph Synchronization#

It is possible for commands in different command-buffer objects to synchronize using the event mechanism. This is only guaranteed to behave correctly in the one-directional synchronization case, where the signal events of one command-buffer’s commands are used as wait events of another command-buffer’s commands. Such a relationship defines a permanent dependency between the command-buffers which does not need to be updated using command event update to preserve synchronization on future enqueues of the command-buffer.

Bi-directional sync between individual commands in two separate command-buffers is, however, not guaranteed to behave correctly. This is because the completion state of the command events is only reset when a command-buffer is enqueued. It is therefore possible for the first command-buffer enqueued to execute its wait node, which needs to have its event reset by the enqueue of the second command-buffer, before the code path returns to user code for the user to enqueue the second command-buffer. This results in the first command-buffer’s wait node completing too early for the intended overall execution ordering.
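
The hazard can be sketched with a small host-side model. This is an illustration under simplifying assumptions, not Unified Runtime code: ToyEvent, ToyCommandBuffer, enqueue, runSignalCommand and secondRoundReleasesEarly are hypothetical names, each toy command-buffer owns exactly one signal event and waits on at most one external event, and device execution is replaced by explicit method calls.

```cpp
#include <cassert>

// Toy model only: none of these names belong to the Unified Runtime API.
struct ToyEvent {
  bool complete = false;
};

struct ToyCommandBuffer {
  ToyEvent signalEvent;                // signaled by a command in this buffer
  const ToyEvent *waitEvent = nullptr; // signaled by a command in another buffer

  // Model of an enqueue: this buffer's own signal event is reset first, then
  // the function reports whether the wait node would be released immediately.
  bool enqueue() {
    signalEvent.complete = false;
    return waitEvent != nullptr && waitEvent->complete;
  }

  // Model of the signaling command finishing on the device.
  void runSignalCommand() { signalEvent.complete = true; }
};

// Replays two rounds of a bi-directional relationship and returns true if the
// second enqueue of A releases its wait node from a stale event.
bool secondRoundReleasesEarly() {
  ToyCommandBuffer a, b;
  a.waitEvent = &b.signalEvent;
  b.waitEvent = &a.signalEvent;

  // Round one: A cannot be released yet because B has never signaled.
  bool releasedAtFirstEnqueue = a.enqueue();
  assert(!releasedAtFirstEnqueue);
  (void)releasedAtFirstEnqueue;
  a.runSignalCommand();
  b.enqueue();
  b.runSignalCommand();

  // Round two: only A has been re-enqueued, so B's event was never reset and
  // still reports the completion from round one.
  return a.enqueue();
}
```

In this model, secondRoundReleasesEarly() returns true, which corresponds to the wait node completing too early. With a one-directional relationship, where the producer is always enqueued before the consumer, the producer's enqueue resets its signal event before the consumer checks it, so the consumer does not observe a stale completion.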

Native Commands#

The command-buffer interface enables user interop with native backend APIs. Through urCommandBufferAppendNativeCommandExp the user can immediately invoke some native API calls that add commands to the command-buffer in a way that the UR is aware of. In doing so, the UR adapter can respect the dependencies of the native commands with the other UR command-buffer commands.

In order for UR to guarantee correct synchronization of commands enqueued within the native API through the function passed to urCommandBufferAppendNativeCommandExp, the ur_exp_command_buffer_handle_t arguments must only use the native command-buffer accessed through urCommandBufferGetNativeHandleExp. Use of a native command-buffer that is not a native command-buffer returned by urCommandBufferGetNativeHandleExp results in undefined behavior.

The ur_exp_command_buffer_handle_t hChildCommandBuffer parameter to urCommandBufferAppendNativeCommandExp is used by the CUDA & HIP adapters to implement this feature, but is ignored by Level-Zero and OpenCL. This represents a child graph node that will be added to the parent graph, with the child graph node expressing the sync-point dependencies and returned sync point. This child graph object will be packed into the void* pData argument that will be given to the user in the pfnNativeCommand callback for adding the native nodes to the command-buffer.

Level-Zero & OpenCL backends use barrier nodes to enforce the dependencies on the user added nodes, rather than using an append child graph API. As a result the native command-buffer object for hCommandBuffer should be packed into void* pData, as the adapters will ignore the hChildCommandBuffer parameter.

Enqueueing Command-Buffers#

Command-buffers are submitted for execution on a ur_queue_handle_t with an optional list of dependent events. An event can be returned which tracks the execution of the command-buffer, and will be complete when all appended commands have finished executing.

ur_event_handle_t executionEvent;
urEnqueueCommandBufferExp(hQueue, hCommandBuffer, 0, nullptr,
                          &executionEvent);

A command-buffer can be submitted for execution while a previous submission of the same command-buffer is still awaiting completion. That is, the user is not required to do a blocking wait on the completion of the first command-buffer submission before making a second submission of the command-buffer.

Each submission of a command-buffer is ordered behind previous submissions of the same command-buffer, while also respecting the other synchronization dependencies set by the user, such as events, barriers, or in-order queue dependencies.

// Submission of hCommandBuffer to hQueueB has an implicit dependency on
// the prior submission to hQueueA.
urEnqueueCommandBufferExp(hQueueA, hCommandBuffer, 0, nullptr, nullptr);
urEnqueueCommandBufferExp(hQueueB, hCommandBuffer, 0, nullptr, nullptr);

Updating Command-Buffer Commands#

An adapter implementing the command-buffer experimental feature can optionally support updating the configuration of kernel commands recorded to a command-buffer. The attributes of kernel commands that can be updated are device specific and can be queried using the UR_DEVICE_INFO_COMMAND_BUFFER_UPDATE_CAPABILITIES_EXP query.

All update entry-points are synchronous and may block if the command-buffer is executing when the entry-point is called.

Kernel Argument Update#

Kernel commands can have the ND-Range & parameter arguments of the command updated when a device supports the relevant bits in UR_DEVICE_INFO_COMMAND_BUFFER_UPDATE_CAPABILITIES_EXP.
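
Since update support is reported as a set of capability bits, an application would normally test the bits it needs before attempting an update. The helper below is a hedged sketch of that check. The TOY_CAP_* names and their bit positions are placeholders invented for this example; the real flag names and values must be taken from the Unified Runtime headers.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder capability bits for illustration only. They are NOT the real
// Unified Runtime flag names or values.
enum ToyUpdateCapability : std::uint32_t {
  TOY_CAP_KERNEL_ARGUMENTS = 1u << 0,
  TOY_CAP_GLOBAL_WORK_SIZE = 1u << 1,
  TOY_CAP_KERNEL_HANDLE = 1u << 2,
  TOY_CAP_EVENTS = 1u << 3,
};

// True only if every bit in `required` is also set in `reported`.
constexpr bool supportsAll(std::uint32_t reported, std::uint32_t required) {
  return (reported & required) == required;
}

// Compile-time sanity checks on the helper itself.
static_assert(supportsAll(TOY_CAP_EVENTS, TOY_CAP_EVENTS),
              "a reported bit satisfies a request for that bit");
static_assert(!supportsAll(TOY_CAP_EVENTS, TOY_CAP_KERNEL_HANDLE),
              "an unreported bit must not be treated as supported");
```

Comparing against the full required mask, rather than testing for any non-zero overlap, avoids treating partial support as sufficient when an update needs several capabilities at once, for example new kernel arguments together with a new kernel handle.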

Updating kernel commands is done by passing the new kernel configuration to urCommandBufferUpdateKernelLaunchExp along with the command handle of the kernel command to update. Configurations that can be changed are the kernel handle, the parameters to the kernel and the execution ND-Range.

Kernel handles that might be used to update the kernel of a command need to be registered when the command is created. This can be done using the phKernelAlternatives parameter of urCommandBufferAppendKernelLaunchExp. The command can then be updated to use the new kernel handle by passing it to urCommandBufferUpdateKernelLaunchExp.

Important

When updating the kernel handle of a command all required arguments to the new kernel must be provided in the update descriptor. Failure to do so will result in undefined behavior.

// Create a command-buffer with update enabled.
ur_exp_command_buffer_desc_t desc {
  UR_STRUCTURE_TYPE_EXP_COMMAND_BUFFER_DESC,
  nullptr,
  true // isUpdatable
};
ur_exp_command_buffer_handle_t hCommandBuffer;
urCommandBufferCreateExp(hContext, hDevice, &desc, &hCommandBuffer);

// Append a kernel command which has two buffer parameters, an input
// and an output. Register hNewKernel as an alternative kernel handle
// which can later be used to change the kernel handle associated
// with this command.
ur_exp_command_buffer_command_handle_t hCommand;
urCommandBufferAppendKernelLaunchExp(hCommandBuffer, hKernel, workDim,
                                     pGlobalWorkOffset, pGlobalWorkSize,
                                     pLocalWorkSize, 1, &hNewKernel,
                                     0, nullptr, 0, nullptr, nullptr,
                                     nullptr, &hCommand);

// Close the command-buffer before updating
urCommandBufferFinalizeExp(hCommandBuffer);

// Define kernel argument at index 0 to be a new input buffer object
ur_exp_command_buffer_update_memobj_arg_desc_t newInputArg {
    UR_STRUCTURE_TYPE_EXP_COMMAND_BUFFER_UPDATE_MEMOBJ_ARG_DESC, // stype
    nullptr, // pNext
    0, // argIndex
    nullptr, // pProperties
    newInputBuffer, // hNewMemObjArg
};

// Define kernel argument at index 1 to be a new output buffer object
ur_exp_command_buffer_update_memobj_arg_desc_t newOutputArg {
    UR_STRUCTURE_TYPE_EXP_COMMAND_BUFFER_UPDATE_MEMOBJ_ARG_DESC, // stype
    nullptr, // pNext
    1, // argIndex
    nullptr, // pProperties
    newOutputBuffer, // hNewMemObjArg
};

// Define the new configuration of the kernel command
ur_exp_command_buffer_update_memobj_arg_desc_t updatedArgs[2] = {newInputArg, newOutputArg};
ur_exp_command_buffer_update_kernel_launch_desc_t update {
    UR_STRUCTURE_TYPE_EXP_COMMAND_BUFFER_UPDATE_KERNEL_LAUNCH_DESC, // stype
    nullptr, // pNext
    hCommand, // hCommand
    hNewKernel,  // hNewKernel
    2, // numNewMemobjArgs
    0, // numNewPointerArgs
    0, // numNewValueArgs
    0, // numNewExecInfos
    0, // newWorkDim
    updatedArgs, // pNewMemObjArgList
    nullptr, // pNewPointerArgList
    nullptr, // pNewValueArgList
    nullptr, // pNewExecInfoList
    nullptr, // pNewGlobalWorkOffset
    nullptr, // pNewGlobalWorkSize
    nullptr, // pNewLocalWorkSize
};

// Perform the update
urCommandBufferUpdateKernelLaunchExp(hCommandBuffer, 1, &update);

Command Event Update#

Once a command-buffer has been finalized, the wait-list parameter of a command can be updated with urCommandBufferUpdateWaitEventsExp. The number of wait events for a command must stay consistent; therefore, the number of events passed to urCommandBufferUpdateWaitEventsExp must be the same as when the command was created.

The urCommandBufferUpdateSignalEventExp entry-point can be used to update the signal event of a command. This returns a new event that will be signaled on the next execution of the command in the command-buffer. This event may be backed by the same native event object as the original signal event, provided that the backend provides a way to reset or reuse events between command-buffer executions.

As ur_event_handle_t objects for queue submissions can only be signaled once, and not reset, this update mechanism allows command synchronization to be refreshed between command-buffer executions with regular command-queue events that haven’t yet been signaled.

It is the user's responsibility to release the returned phEvent with urEventRelease. To update a command signal event with urCommandBufferUpdateSignalEventExp, a non-null phEvent parameter must also have been passed when the command was created.
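
The two preconditions described in this section (an unchanged wait event count, and a signal event requested at creation) can be summarised with a small validation sketch. This is a toy model, not the Unified Runtime implementation: ToyCommand, ToyEventId, updateWaitEvents and updateSignalEvent are hypothetical names, and returning false on a violated precondition is an assumption made for illustration, since the behavior of a real adapter in that case is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: none of these names belong to the Unified Runtime API.
using ToyEventId = int;

struct ToyCommand {
  std::vector<ToyEventId> waitEvents; // length is fixed at creation time
  bool hasSignalEvent = false;        // true if a signal event was requested
  ToyEventId signalEvent = -1;
};

// Replace the wait events; rejected if the count differs from creation.
bool updateWaitEvents(ToyCommand &cmd,
                      const std::vector<ToyEventId> &newEvents) {
  if (newEvents.size() != cmd.waitEvents.size()) {
    return false;
  }
  cmd.waitEvents = newEvents;
  return true;
}

// Replace the signal event; rejected if none was requested at creation.
bool updateSignalEvent(ToyCommand &cmd, ToyEventId freshEvent) {
  if (!cmd.hasSignalEvent) {
    return false;
  }
  cmd.signalEvent = freshEvent;
  return true;
}
```

In this model a command created with two wait events accepts a replacement list of two events and rejects a list of one, leaving the original list untouched. A command created without a signal event rejects any signal event update.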

Important

Support for updating phEventWaitList & phEvent parameters requires a device to support the EVENTS bit in UR_DEVICE_INFO_COMMAND_BUFFER_UPDATE_CAPABILITIES_EXP.

// Create a command-buffer with update enabled.
ur_exp_command_buffer_desc_t desc {
  UR_STRUCTURE_TYPE_EXP_COMMAND_BUFFER_DESC,
  nullptr,
  true // isUpdatable
};
ur_exp_command_buffer_handle_t hCommandBuffer;
urCommandBufferCreateExp(hContext, hDevice, &desc, &hCommandBuffer);

// Append a kernel command with 2 events to wait on, and returning an
// event that will be signaled.
ur_event_handle_t hSignalEvent;
ur_event_handle_t hWaitEvents[2] = {...};
ur_exp_command_buffer_command_handle_t hCommand;
urCommandBufferAppendKernelLaunchExp(hCommandBuffer, hKernel, workDim,
                                     pGlobalWorkOffset, pGlobalWorkSize,
                                     pLocalWorkSize, 0, nullptr, 0, nullptr,
                                     2, hWaitEvents, nullptr, &hSignalEvent,
                                     &hCommand);

// Close the command-buffer before updating
urCommandBufferFinalizeExp(hCommandBuffer);

// Enqueue command-buffer
urEnqueueCommandBufferExp(hQueue, hCommandBuffer, 0, nullptr, nullptr);

// Wait for command-buffer to finish
urQueueFinish(hQueue);

// Update signal event
ur_event_handle_t hNewSignalEvent;
urCommandBufferUpdateSignalEventExp(hCommand, &hNewSignalEvent);

// Update wait events to new events. The count must match the two events
// passed when the command was appended.
ur_event_handle_t hNewWaitEvents[2] = ...;
urCommandBufferUpdateWaitEventsExp(hCommand, 2, hNewWaitEvents);

API#

ur_exp_command_buffer_desc_t
ur_exp_command_buffer_update_kernel_launch_desc_t
ur_exp_command_buffer_update_memobj_arg_desc_t
ur_exp_command_buffer_update_pointer_arg_desc_t
ur_exp_command_buffer_update_value_arg_desc_t
ur_exp_command_buffer_sync_point_t
ur_exp_command_buffer_handle_t
ur_exp_command_buffer_command_handle_t

Changelog#

Revision	Changes
1.0	Initial Draft
1.1	Add function definitions for buffer read and write
1.2	Add function definitions for fill commands
1.3	Add function definitions for Prefetch and Advise commands
1.4	Add function definitions for kernel command update
1.5	Add support for updating kernel handles.
1.6	Command level synchronization with event objects
1.7	Remove command handle reference counting and querying
1.8	Change Kernel command update API to take a list
1.9	Rename enqueue API to urEnqueueCommandBufferExp
1.10	Remove extension string macro, make device info enum primary mechanism for reporting support.
1.11	Support native commands.
1.12	Strengthen in-order property such that sync-points parameters to append APIs are ignored.

Contributors#

Ben Tracy ben.tracy@codeplay.com
Ewan Crawford ewan@codeplay.com
Maxime France-Pillois maxime.francepillois@codeplay.com
Aaron Greig aaron.greig@codeplay.com
Fábio Mestre fabio.mestre@codeplay.com
Konrad Kusiak konrad.kusiak@codeplay.com

Unified Runtime Specification documentation
