SPIR-V Programming Guide#

Introduction#

SPIR-V is an open, royalty-free, standard intermediate language capable of representing parallel compute kernels. SPIR-V is adaptable to multiple execution environments: a SPIR-V module is consumed by an execution environment, as specified by a client API. This document describes the SPIR-V execution environment for the ‘oneAPI’ Level-Zero API. The SPIR-V execution environment describes required support for some SPIR-V capabilities, additional semantics for some SPIR-V instructions, and additional validation rules that a SPIR-V binary module must adhere to in order to be considered valid.

This document is written for compiler developers who are generating SPIR-V modules intended to be consumed by the ‘oneAPI’ Level-Zero API, for implementors of the ‘oneAPI’ Level-Zero API, and for software developers who are using SPIR-V modules with the ‘oneAPI’ Level-Zero API.

Common Properties#

This section describes common properties of all ‘oneAPI’ Level-Zero environments that consume SPIR-V modules.

A SPIR-V module is interpreted as a series of 32-bit words in host endianness, with literal strings packed as described in the SPIR-V specification. The first few words of the SPIR-V module must be a magic number and a SPIR-V version number, as described in the SPIR-V specification.
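As a minimal sketch of the header rule above, the following Python snippet decodes the first two words of a module in host byte order and extracts the version. The magic number and the version word layout come from the SPIR-V specification; the function name `read_spirv_header` is hypothetical and not part of any API.

```python
import struct

SPIRV_MAGIC = 0x07230203  # magic number defined by the SPIR-V specification


def read_spirv_header(data: bytes) -> tuple:
    """Return (major, minor) decoded from the first two words of a module.

    "=2I" decodes two 32-bit words in host (native) byte order.
    """
    if len(data) < 8:
        raise ValueError("module too short to hold magic and version words")
    magic, version = struct.unpack_from("=2I", data, 0)
    if magic != SPIRV_MAGIC:
        raise ValueError(f"bad magic number: {magic:#010x}")
    # Version word layout: 0x00MMmm00 (major in bits 16-23, minor in bits 8-15).
    return (version >> 16) & 0xFF, (version >> 8) & 0xFF


# Build a two-word header for SPIR-V 1.2 in host byte order and parse it back.
header = struct.pack("=2I", SPIRV_MAGIC, 0x00010200)
print(read_spirv_header(header))  # (1, 2)
```

A module whose magic number reads back byte-swapped was produced for the other endianness and is rejected by this sketch rather than converted.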

Supported SPIR-V Versions#

The maximum SPIR-V version supported by a device is described by ze_device_module_properties_t.spirvVersionSupported.
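A rough sketch of how a module version might be compared against this field is shown below. It assumes the value packs the major version in the upper 16 bits and the minor version in the lower 16 bits (the layout used by the ZE_MAJOR_VERSION and ZE_MINOR_VERSION macros) and that zero means SPIR-V is not supported; both points should be checked against the Level-Zero headers. All function names are hypothetical.

```python
def ze_major(version: int) -> int:
    # Assumed layout: major version in the upper 16 bits.
    return version >> 16


def ze_minor(version: int) -> int:
    # Assumed layout: minor version in the lower 16 bits.
    return version & 0xFFFF


def device_accepts(module_version: tuple, spirv_version_supported: int) -> bool:
    """Return True if a (major, minor) module version does not exceed the device maximum."""
    if spirv_version_supported == 0:  # assumption: zero means no SPIR-V support
        return False
    device = (ze_major(spirv_version_supported), ze_minor(spirv_version_supported))
    return module_version <= device


device_max = (1 << 16) | 2  # a device reporting SPIR-V 1.2
print(device_accepts((1, 0), device_max), device_accepts((1, 3), device_max))
```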

Extended Instruction Sets#

The OpenCL.std extended instruction set for OpenCL is supported.

Source Language Encoding#

The source language version is purely informational and has no semantic meaning.

Numerical Type Formats#

Floating-point types are represented and stored using IEEE-754 semantics. All integer formats are represented and stored using 2’s-complement format.
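For illustration only, the snippet below shows how one 32-bit pattern reads under two's-complement rules and how a value is encoded at the three IEEE-754 widths. It uses Python's standard `struct` module and is not tied to any Level-Zero interface.

```python
import struct

# The same 32 bits can be read as unsigned or as two's-complement signed;
# the consuming instruction decides which reading applies.
bits = 0xFFFFFFFF
as_unsigned = bits
as_signed = struct.unpack("<i", struct.pack("<I", bits))[0]

# IEEE-754 encodings of 1.5 at 16, 32, and 64 bits.
half_bits = struct.unpack("<H", struct.pack("<e", 1.5))[0]
single_bits = struct.unpack("<I", struct.pack("<f", 1.5))[0]
double_bits = struct.unpack("<Q", struct.pack("<d", 1.5))[0]

print(as_unsigned, as_signed)
print(hex(half_bits), hex(single_bits), hex(double_bits))
```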

Supported Types#

The following types are supported. Note that some types may require additional capabilities, and may not be supported by all environments.

Basic Scalar and Vector Types#

OpTypeVoid is supported.

The following scalar types are supported:

OpTypeBool
OpTypeInt, with Width equal to 8, 16, 32, or 64, and with Signedness equal to zero, indicating no signedness semantics.
OpTypeFloat, with Width equal to 16, 32, or 64.

OpTypeVector vector types are supported. The vector Component Type may be any of the scalar types described above. Supported vector Component Counts are 2, 3, 4, 8, or 16.

OpTypeArray array types are supported, OpTypeStruct struct types are supported, OpTypeFunction functions are supported, and OpTypePointer pointer types are supported.
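The scalar and vector rules above could be checked with a small table-driven helper such as the sketch below. The string type tags and function names are hypothetical, and the sketch ignores whether optional capabilities such as Float16 or Float64 are available on a given device.

```python
INT_WIDTHS = {8, 16, 32, 64}
FLOAT_WIDTHS = {16, 32, 64}
VECTOR_COMPONENT_COUNTS = {2, 3, 4, 8, 16}


def scalar_supported(kind: str, width: int = 0, signedness: int = 0) -> bool:
    """Check a scalar type described as ("bool" | "int" | "float", width, signedness)."""
    if kind == "bool":
        return True
    if kind == "int":
        return width in INT_WIDTHS and signedness == 0
    if kind == "float":
        return width in FLOAT_WIDTHS
    return False


def vector_supported(kind: str, count: int, width: int = 0, signedness: int = 0) -> bool:
    """A vector is supported when its count and its component type both are."""
    return count in VECTOR_COMPONENT_COUNTS and scalar_supported(kind, width, signedness)


print(vector_supported("float", 3, 32), vector_supported("int", 5, 32))
```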

Image-Related Data Types#

The following table describes the supported OpTypeImage image types:

Dim	Depth	Arrayed	Description
1D	`0`	`0`	A 1D image.
1D	`0`	`1`	A 1D image array.
2D	`0`	`0`	A 2D image.
2D	`1`	`0`	A 2D depth image.
2D	`0`	`1`	A 2D image array.
2D	`1`	`1`	A 2D depth image array.
3D	`0`	`0`	A 3D image.
Buffer	`0`	`0`	A 1D buffer image.

OpTypeSampler sampler types are supported.

Kernels#

An OpFunction in a SPIR-V module that is identified with OpEntryPoint defines a kernel that may be launched using host API interfaces.

Kernel Return Types#

The Result Type for an OpFunction identified with OpEntryPoint must be OpTypeVoid.

Kernel Arguments#

An OpFunctionParameter for an OpFunction that is identified with OpEntryPoint defines a kernel argument. Allowed types for kernel arguments are:

OpTypeInt
OpTypeFloat
OpTypeStruct
OpTypeVector
OpTypePointer
OpTypeSampler
OpTypeImage

For OpTypeInt parameters, supported Widths are 8, 16, 32, and 64, and must have no signedness semantics.

For OpTypeFloat parameters, supported Widths are 16 and 32.

For OpTypeStruct parameters, supported structure Member Types are:

OpTypeInt
OpTypeFloat
OpTypeStruct
OpTypeVector
OpTypePointer

For OpTypePointer parameters, supported Storage Classes are:

CrossWorkgroup
Workgroup
UniformConstant

Environments that support extensions or optional features may allow additional types in an entry point’s parameter list.
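The baseline argument rules above can be sketched as a recursive check. The `Ty` record and its string tags are hypothetical stand-ins for parsed SPIR-V types. Applying the width and storage-class limits to struct members as well is an assumption of this sketch; the text states those limits only for parameters.

```python
from dataclasses import dataclass, field

ARG_KINDS = {"int", "float", "struct", "vector", "pointer", "sampler", "image"}
STRUCT_MEMBER_KINDS = {"int", "float", "struct", "vector", "pointer"}
POINTER_STORAGE_CLASSES = {"CrossWorkgroup", "Workgroup", "UniformConstant"}


@dataclass
class Ty:
    kind: str                 # one of the tags in ARG_KINDS
    width: int = 0            # for "int" and "float"
    signedness: int = 0       # for "int"
    storage_class: str = ""   # for "pointer"
    members: list = field(default_factory=list)  # for "struct"


def kernel_arg_allowed(ty: Ty, in_struct: bool = False) -> bool:
    allowed = STRUCT_MEMBER_KINDS if in_struct else ARG_KINDS
    if ty.kind not in allowed:
        return False
    if ty.kind == "int":
        return ty.width in {8, 16, 32, 64} and ty.signedness == 0
    if ty.kind == "float":
        return ty.width in {16, 32}
    if ty.kind == "pointer":
        return ty.storage_class in POINTER_STORAGE_CLASSES
    if ty.kind == "struct":
        # Assumption: members are checked with the same per-type limits.
        return all(kernel_arg_allowed(m, in_struct=True) for m in ty.members)
    return True


params = Ty("struct", members=[Ty("int", 32), Ty("pointer", storage_class="CrossWorkgroup")])
print(kernel_arg_allowed(params), kernel_arg_allowed(Ty("float", 64)))
```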

Required Capabilities#

SPIR-V 1.0#

An environment that supports SPIR-V 1.0 must support SPIR-V 1.0 modules that declare the following capabilities:

Addresses
Float16Buffer
Int64
Int16
Int8
Kernel
Linkage
Vector16
GenericPointer
Groups
ImageBasic (for devices supporting ze_device_image_properties_t.supported)
Float16 (for devices supporting ZE_DEVICE_MODULE_FLAG_FP16)
Float64 (for devices supporting ZE_DEVICE_MODULE_FLAG_FP64)
Int64Atomics (for devices supporting ZE_DEVICE_MODULE_FLAG_INT64_ATOMICS)

If the ‘oneAPI’ environment supports the ImageBasic capability, then the following capabilities must also be supported:

LiteralSampler
Sampled1D
Image1D
SampledBuffer
ImageBuffer
ImageReadWrite
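One way to read the lists above is as a function from device features to the set of capabilities a SPIR-V 1.0 environment must accept. The sketch below models the device queries as plain booleans; the parameter names are hypothetical and stand in for the ze_device_image_properties_t and ZE_DEVICE_MODULE_FLAG_* checks named in the text.

```python
BASE_CAPABILITIES = frozenset({
    "Addresses", "Float16Buffer", "Int64", "Int16", "Int8",
    "Kernel", "Linkage", "Vector16", "GenericPointer", "Groups",
})

IMAGE_CAPABILITIES = frozenset({
    "ImageBasic", "LiteralSampler", "Sampled1D", "Image1D",
    "SampledBuffer", "ImageBuffer", "ImageReadWrite",
})


def required_capabilities(images=False, fp16=False, fp64=False, int64_atomics=False):
    """Return the capabilities that must be accepted for the given device features."""
    caps = set(BASE_CAPABILITIES)
    if images:
        caps |= IMAGE_CAPABILITIES
    if fp16:
        caps.add("Float16")
    if fp64:
        caps.add("Float64")
    if int64_atomics:
        caps.add("Int64Atomics")
    return caps


print(sorted(required_capabilities(fp64=True) - BASE_CAPABILITIES))
```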

SPIR-V 1.1#

An environment supporting SPIR-V 1.1 must support SPIR-V 1.1 modules that declare the capabilities required for SPIR-V 1.0 modules, above.

SPIR-V 1.1 does not add any new required capabilities.

SPIR-V 1.2#

An environment supporting SPIR-V 1.2 must support SPIR-V 1.2 modules that declare the capabilities required for SPIR-V 1.1 modules, above.

SPIR-V 1.2 does not add any new required capabilities.

Validation Rules#

The following validation rules apply to SPIR-V modules executing in all ‘oneAPI’ Level-Zero environments:

The Execution Model declared in OpEntryPoint must be Kernel.

The Addressing Model declared in OpMemoryModel must be Physical64, indicating that device pointers are 64 bits wide.

The Memory Model declared in OpMemoryModel must be OpenCL.

For all OpTypeInt integer type-declaration instructions:

Signedness must be 0, indicating no signedness semantics.

For all OpTypeImage type-declaration instructions:

Sampled Type must be OpTypeVoid.
Sampled must be 0, indicating that the image usage will be known at run time, not at compile time.
MS must be 0, indicating single-sampled content.
Arrayed may only be set to 1, indicating arrayed content, when Dim is set to 1D or 2D.
Image Format must be Unknown, indicating that the image does not have a specified format.
The optional image Access Qualifier must be present.

The image write instruction OpImageWrite must not include any optional Image Operands.

The image read instructions OpImageRead and OpImageSampleExplicitLod must not include the optional Image Operand ConstOffset.

For all Atomic Instructions:

32-bit integer types are supported for the Result Type and/or type of Value. 64-bit integer types are optionally supported for the Result Type and/or type of Value for devices supporting ZE_DEVICE_MODULE_FLAG_INT64_ATOMICS.
The Pointer operand must be a pointer to the Function, Workgroup, CrossWorkgroup, or Generic Storage Classes.

Recursion is not supported. The static function call graph for an entry point must not contain cycles.

Whether irreducible control flow is legal is implementation defined.

For the instructions OpGroupAsyncCopy and OpGroupWaitEvents, Scope for Execution must be:

Workgroup

For all other instructions, Scope for Execution must be one of:

Workgroup
Subgroup

Scope for Memory must be one of:

CrossDevice
Device
Workgroup
Invocation
Subgroup
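A few of the rules in this section can be checked by walking the instruction stream. The sketch below covers only the Execution Model, Addressing Model, and Memory Model rules; it is not a full validator. The opcode and enumerant values are taken from the SPIR-V specification, the module is assumed to be already decoded into a list of 32-bit words, and the function name is hypothetical.

```python
OP_MEMORY_MODEL = 14
OP_ENTRY_POINT = 15
ADDRESSING_PHYSICAL64 = 2
MEMORY_MODEL_OPENCL = 2
EXECUTION_MODEL_KERNEL = 6
HEADER_WORDS = 5  # magic, version, generator, bound, schema


def check_module(words):
    """Return a list of rule violations found in a decoded SPIR-V word stream."""
    errors = []
    i = HEADER_WORDS
    while i < len(words):
        # Each instruction starts with (word count << 16) | opcode.
        word_count, opcode = words[i] >> 16, words[i] & 0xFFFF
        if word_count == 0:
            raise ValueError("malformed instruction with zero word count")
        if opcode == OP_MEMORY_MODEL:
            if words[i + 1] != ADDRESSING_PHYSICAL64:
                errors.append("Addressing Model must be Physical64")
            if words[i + 2] != MEMORY_MODEL_OPENCL:
                errors.append("Memory Model must be OpenCL")
        elif opcode == OP_ENTRY_POINT and words[i + 1] != EXECUTION_MODEL_KERNEL:
            errors.append("Execution Model must be Kernel")
        i += word_count
    return errors


header = [0x07230203, 0x00010200, 0, 2, 0]
# OpMemoryModel Physical32 GLSL450: violates two of the rules.
print(check_module(header + [(3 << 16) | OP_MEMORY_MODEL, 1, 1]))
```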

Extensions#

Intel Subgroups#

‘oneAPI’ Level-Zero API environments must accept SPIR-V modules that declare use of the SPV_INTEL_subgroups extension via OpExtension.

When use of the SPV_INTEL_subgroups extension is declared in the module via OpExtension, the environment must accept modules that declare the following SPIR-V capabilities:

SubgroupShuffleINTEL
SubgroupBufferBlockIOINTEL
SubgroupImageBlockIOINTEL

The environment must accept the following types for Data for the SubgroupShuffleINTEL instructions:

Scalars and OpTypeVectors with 2, 4, 8, or 16 Component Count components of the following Component Type types:
- OpTypeFloat with a Width of 32 bits (float)
- OpTypeInt with a Width of 8 bits and Signedness of 0 (char and uchar)
- OpTypeInt with a Width of 16 bits and Signedness of 0 (short and ushort)
- OpTypeInt with a Width of 32 bits and Signedness of 0 (int and uint)
Scalars of OpTypeInt with a Width of 64 bits and Signedness of 0 (long and ulong)

Additionally, if the Float16 capability is declared and supported:

Scalars of OpTypeFloat with a Width of 16 bits (half)

Additionally, if the Float64 capability is declared and supported:

Scalars of OpTypeFloat with a Width of 64 bits (double)

The environment must accept the following types for Result and Data for the SubgroupBufferBlockIOINTEL and SubgroupImageBlockIOINTEL instructions:

Scalars and OpTypeVectors with 2, 4, or 8 Component Count components of the following Component Type types:
- OpTypeInt with a Width of 32 bits and Signedness of 0 (int and uint)
- OpTypeInt with a Width of 16 bits and Signedness of 0 (short and ushort)

For Ptr, valid Storage Classes are:

CrossWorkgroup (global)

For Image:

Dim must be 2D
Depth must be 0 (not a depth image)
Arrayed must be 0 (non-arrayed content)
MS must be 0 (single-sampled content)

For Coordinate, the following types are supported:

OpTypeVectors with two Component Count components of Component Type OpTypeInt with a Width of 32 bits and Signedness of 0 (int2)

Notes and Restrictions#

The SubgroupShuffleINTEL instructions may be placed within non-uniform control flow, so they do not have to be encountered by all invocations in the subgroup. However, Data may only be shuffled among invocations that encounter the SubgroupShuffleINTEL instruction. Shuffling Data from an invocation that does not encounter the instruction produces undefined results.

There is no defined behavior for out-of-range shuffle indices for the SubgroupShuffleINTEL instructions.

The SubgroupBufferBlockIOINTEL and SubgroupImageBlockIOINTEL instructions are only guaranteed to work correctly if placed strictly within uniform control flow within the subgroup. This ensures that if any invocation executes it, all invocations will execute it. If placed elsewhere, behavior is undefined.

There is no defined out-of-range behavior for the SubgroupBufferBlockIOINTEL instructions.

The SubgroupImageBlockIOINTEL instructions do support bounds checking; however, they bounds-check to the image width in units of uints, not in units of image elements. This means:

If the image has an Image Format size equal to the size of a uint (four bytes, for example Rgba8), the image will be correctly bounds-checked. In this case, out-of-bounds reads will return the edge image element (the equivalent of ClampToEdge), and out-of-bounds writes will be ignored.
If the image has an Image Format size less than the size of a uint (such as R8), the entire image is addressable, but bounds checking will occur too late. Extra care should therefore be taken to avoid out-of-bounds reads and writes: out-of-bounds reads may return invalid data, and out-of-bounds writes may corrupt other images or buffers unpredictably.

The following restrictions apply to the SubgroupBufferBlockIOINTEL instructions:

The pointer Ptr must be 32-bit (4-byte) aligned for reads, and must be 128-bit (16-byte) aligned for writes.

The following restrictions apply to the SubgroupImageBlockIOINTEL instructions:

The behavior of the SubgroupImageBlockIOINTEL instructions is undefined for images with an element size greater than four bytes (such as Rgba32f).

The following restrictions apply to the OpSubgroupImageBlockWriteINTEL instruction:

Unlike the image block read instruction, which may read from any arbitrary byte offset, the x-component of the byte coordinate for the image block write instruction must be a multiple of four; in other words, the write must begin at a 32-bit boundary. There is no restriction on the y-component of the coordinate.
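The alignment restrictions above reduce to simple modulo checks, as in the following sketch. The helper names are hypothetical; a real check would run wherever the pointer or coordinate value is known.

```python
def buffer_block_ptr_ok(address: int, is_write: bool) -> bool:
    """Buffer block reads need 4-byte alignment; writes need 16-byte alignment."""
    return address % (16 if is_write else 4) == 0


def image_block_write_x_ok(x_byte_coordinate: int) -> bool:
    """Image block writes must begin at a 32-bit boundary in x; y is unrestricted."""
    return x_byte_coordinate % 4 == 0


print(buffer_block_ptr_ok(0x1004, is_write=False))  # True
print(buffer_block_ptr_ok(0x1004, is_write=True))   # False
print(image_block_write_x_ok(6))                    # False
```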

Floating-Point Atomics#

‘oneAPI’ Level-Zero API environments supporting the extension ZE_extension_float_atomics must support additional atomic instructions, capabilities, and types.

Atomic Load, Store, and Exchange#

If the ‘oneAPI’ Level-Zero API environment supports the extension ZE_extension_float_atomics and ze_device_fp_atomic_ext_flags_t.fp16Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_LOAD_STORE or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_LOAD_STORE, then for the Atomic Instructions OpAtomicLoad, OpAtomicStore, and OpAtomicExchange:

16-bit floating-point types are supported for the Result Type and type of Value.
When ze_device_fp_atomic_ext_flags_t.fp16Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_LOAD_STORE, the Pointer operand may be a pointer to the CrossWorkgroup Storage Class.
When ze_device_fp_atomic_ext_flags_t.fp16Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_LOAD_STORE, the Pointer operand may be a pointer to the Workgroup Storage Class.
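The mapping from fp16Flags to permitted Storage Classes might be sketched as follows. The `FpAtomicFlag` enumeration is a hypothetical stand-in whose bit values are placeholders; real code should use the ZE_DEVICE_FP_ATOMIC_EXT_FLAG_* values from the Level-Zero headers.

```python
from enum import IntFlag, auto


class FpAtomicFlag(IntFlag):
    # Placeholder bit values for illustration only.
    GLOBAL_LOAD_STORE = auto()
    LOCAL_LOAD_STORE = auto()


def fp16_load_store_storage_classes(fp16_flags: FpAtomicFlag) -> set:
    """Storage Classes allowed for 16-bit float OpAtomicLoad/Store/Exchange pointers."""
    classes = set()
    if fp16_flags & FpAtomicFlag.GLOBAL_LOAD_STORE:
        classes.add("CrossWorkgroup")
    if fp16_flags & FpAtomicFlag.LOCAL_LOAD_STORE:
        classes.add("Workgroup")
    return classes


print(fp16_load_store_storage_classes(FpAtomicFlag.LOCAL_LOAD_STORE))
```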

Atomic Add and Subtract#

If the ‘oneAPI’ Level-Zero API environment supports the extension ZE_extension_float_atomics and ze_device_fp_atomic_ext_flags_t.fp16Flags, ze_device_fp_atomic_ext_flags_t.fp32Flags, or ze_device_fp_atomic_ext_flags_t.fp64Flags include ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_ADD or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_ADD, then the environment must accept modules that declare use of the extensions SPV_EXT_shader_atomic_float_add and SPV_EXT_shader_atomic_float16_add. Additionally:

When ze_device_fp_atomic_ext_flags_t.fp16Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_ADD or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_ADD, the AtomicFloat16AddEXT capability must be supported.
When ze_device_fp_atomic_ext_flags_t.fp32Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_ADD or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_ADD, the AtomicFloat32AddEXT capability must be supported.
When ze_device_fp_atomic_ext_flags_t.fp64Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_ADD or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_ADD, the AtomicFloat64AddEXT capability must be supported.
For the Atomic Instruction OpAtomicFAddEXT added by these extensions:

Atomic Min and Max#

If the ‘oneAPI’ Level-Zero API environment supports the extension ZE_extension_float_atomics and the ze_device_fp_atomic_ext_flags_t.fp32Flags, ze_device_fp_atomic_ext_flags_t.fp64Flags, or ze_device_fp_atomic_ext_flags_t.fp16Flags bitfields include ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_MIN_MAX or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_MIN_MAX, then the environment must accept modules that declare use of the extension SPV_EXT_shader_atomic_float_min_max. Additionally:

When ze_device_fp_atomic_ext_flags_t.fp32Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_MIN_MAX or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_MIN_MAX, the AtomicFloat32MinMaxEXT capability must be supported.
When ze_device_fp_atomic_ext_flags_t.fp64Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_MIN_MAX or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_MIN_MAX, the AtomicFloat64MinMaxEXT capability must be supported.
When ze_device_fp_atomic_ext_flags_t.fp16Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_MIN_MAX or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_MIN_MAX, the AtomicFloat16MinMaxEXT capability must be supported.
For the Atomic Instructions OpAtomicFMinEXT and OpAtomicFMaxEXT added by this extension:
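The flag-to-capability rules stated for atomic add and for atomic min and max follow the same pattern, which the sketch below summarizes. As a simplification it uses a hypothetical `FpAtomicFlag` enumeration with placeholder bit values instead of the real ZE_DEVICE_FP_ATOMIC_EXT_FLAG_* constants.

```python
from enum import IntFlag, auto


class FpAtomicFlag(IntFlag):
    # Placeholder bit values for illustration only.
    GLOBAL_ADD = auto()
    GLOBAL_MIN_MAX = auto()
    LOCAL_ADD = auto()
    LOCAL_MIN_MAX = auto()


ADD = FpAtomicFlag.GLOBAL_ADD | FpAtomicFlag.LOCAL_ADD
MIN_MAX = FpAtomicFlag.GLOBAL_MIN_MAX | FpAtomicFlag.LOCAL_MIN_MAX


def float_atomic_capabilities(fp16_flags, fp32_flags, fp64_flags) -> set:
    """Capabilities that must be supported, given the per-width flag bitfields."""
    caps = set()
    for width, flags in ((16, fp16_flags), (32, fp32_flags), (64, fp64_flags)):
        if flags & ADD:
            caps.add(f"AtomicFloat{width}AddEXT")
        if flags & MIN_MAX:
            caps.add(f"AtomicFloat{width}MinMaxEXT")
    return caps


print(sorted(float_atomic_capabilities(
    FpAtomicFlag(0), FpAtomicFlag.GLOBAL_ADD, FpAtomicFlag.LOCAL_MIN_MAX)))
```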

Extended Subgroups#

‘oneAPI’ Level-Zero API environments supporting the extension ZE_extension_subgroups must support additional subgroup instructions, capabilities, and types.

Extended Types#

The following Groups instructions must be supported with Scope for Execution equal to Subgroup:

OpGroupBroadcast
OpGroupIAdd, OpGroupFAdd
OpGroupSMin, OpGroupUMin, OpGroupFMin
OpGroupSMax, OpGroupUMax, OpGroupFMax

For these instructions, valid types for Value are:

Scalars of supported types:

Additionally, for OpGroupBroadcast, valid types for Value are:

OpTypeVectors with 2, 3, 4, 8, or 16 Component Count components of supported types:

Vote#

The following capabilities must be supported:

GroupNonUniform
GroupNonUniformVote

For instructions requiring these capabilities, Scope for Execution may be:

Subgroup

For the instruction OpGroupNonUniformAllEqual, valid types for Value are:

Scalars of supported types:

Ballot#

The following capabilities must be supported:

GroupNonUniformBallot

For instructions requiring these capabilities, Scope for Execution may be:

Subgroup

For the non-uniform broadcast instruction OpGroupNonUniformBroadcast, valid types for Value are:

Scalars of supported types:

OpTypeVectors with 2, 3, 4, 8, or 16 Component Count components of supported types:

For the instruction OpGroupNonUniformBroadcastFirst, valid types for Value are:

Scalars of supported types:

For the instruction OpGroupNonUniformBallot, the valid Result Type is an OpTypeVector with four Component Count components of OpTypeInt, with Width equal to 32 and Signedness equal to 0 (equivalent to uint4).

For the instructions OpGroupNonUniformInverseBallot, OpGroupNonUniformBallotBitExtract, OpGroupNonUniformBallotBitCount, OpGroupNonUniformBallotFindLSB, and OpGroupNonUniformBallotFindMSB, the valid type for Value is an OpTypeVector with four Component Count components of OpTypeInt, with Width equal to 32 and Signedness equal to 0 (equivalent to uint4).

For built-in variables decorated with SubgroupEqMask, SubgroupGeMask, SubgroupGtMask, SubgroupLeMask, or SubgroupLtMask, the supported variable type is an OpTypeVector with four Component Count components of OpTypeInt, with Width equal to 32 and Signedness equal to 0 (equivalent to uint4).
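To make the uint4 layout concrete, the sketch below models a ballot on the host as four 32-bit words, with invocation i stored in bit i % 32 of word i // 32. It is a host-side illustration of the bit layout, not device code; the function names are hypothetical, and masking to the actual subgroup size is ignored.

```python
def make_ballot(predicates):
    """Pack per-invocation booleans into a tuple of four 32-bit words (a uint4)."""
    if len(predicates) > 128:
        raise ValueError("a uint4 ballot holds at most 128 invocations")
    words = [0, 0, 0, 0]
    for i, predicate in enumerate(predicates):
        if predicate:
            words[i // 32] |= 1 << (i % 32)
    return tuple(words)


def ballot_bit_extract(ballot, index):
    return bool((ballot[index // 32] >> (index % 32)) & 1)


def ballot_bit_count(ballot):
    return sum(bin(word).count("1") for word in ballot)


def ballot_find_lsb(ballot):
    # Returns None when no bit is set; SPIR-V leaves that case undefined.
    return next((i for i in range(128) if ballot_bit_extract(ballot, i)), None)


def ballot_find_msb(ballot):
    return next((i for i in reversed(range(128)) if ballot_bit_extract(ballot, i)), None)


# Invocations 1 and 33 vote true.
ballot = make_ballot([False, True] + [False] * 31 + [True])
print(ballot, ballot_bit_count(ballot), ballot_find_lsb(ballot), ballot_find_msb(ballot))
```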

Non-Uniform Arithmetic#

The following capabilities must be supported:

GroupNonUniformArithmetic

For instructions requiring these capabilities, Scope for Execution may be:

Subgroup

For the instructions OpGroupNonUniformLogicalAnd, OpGroupNonUniformLogicalOr, and OpGroupNonUniformLogicalXor, the valid type for Value is OpTypeBool.

Otherwise, for the GroupNonUniformArithmetic scan and reduction instructions, valid types for Value are:

Scalars of supported types:

For the GroupNonUniformArithmetic scan and reduction instructions, the optional ClusterSize operand must not be present.

Shuffles#

The following capabilities must be supported:

GroupNonUniformShuffle

For instructions requiring these capabilities, Scope for Execution may be:

Subgroup

For the instructions OpGroupNonUniformShuffle and OpGroupNonUniformShuffleXor requiring these capabilities, valid types for Value are:

Scalars of supported types:
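As an informal model of the two shuffle instructions, a subgroup can be represented as a list holding one Value per invocation. The sketch below follows the SPIR-V definitions (OpGroupNonUniformShuffle reads from the invocation named by Id, OpGroupNonUniformShuffleXor from the current id XOR Mask). Raising IndexError for an out-of-range source is a choice of this sketch; SPIR-V leaves that result undefined.

```python
def shuffle(values, ids):
    """Each invocation i receives the Value held by invocation ids[i]."""
    return [values[ids[i]] for i in range(len(values))]


def shuffle_xor(values, mask):
    """Each invocation i receives the Value held by invocation i ^ mask."""
    return [values[i ^ mask] for i in range(len(values))]


subgroup = [10, 11, 12, 13]
print(shuffle(subgroup, [3, 3, 0, 1]))  # [13, 13, 10, 11]
print(shuffle_xor(subgroup, 1))         # [11, 10, 13, 12]
```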

Relative Shuffles#

The following capabilities must be supported:

GroupNonUniformShuffleRelative

For instructions requiring these capabilities, Scope for Execution may be:

Subgroup

For the GroupNonUniformShuffleRelative instructions, valid types for Value are:

Scalars of supported types:
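The relative shuffles can be modeled on the host in the same list-per-subgroup style. In this sketch, OpGroupNonUniformShuffleUp reads from the invocation at the current id minus Delta and OpGroupNonUniformShuffleDown from the current id plus Delta, as defined by SPIR-V. Positions whose source falls outside the subgroup are marked None here because SPIR-V leaves them undefined.

```python
def shuffle_up(values, delta):
    """Invocation i receives the Value of invocation i - delta."""
    return [values[i - delta] if i - delta >= 0 else None
            for i in range(len(values))]


def shuffle_down(values, delta):
    """Invocation i receives the Value of invocation i + delta."""
    n = len(values)
    return [values[i + delta] if i + delta < n else None for i in range(n)]


subgroup = [10, 11, 12, 13]
print(shuffle_up(subgroup, 1))    # [None, 10, 11, 12]
print(shuffle_down(subgroup, 1))  # [11, 12, 13, None]
```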

Clustered Reductions#

The following capabilities must be supported:

GroupNonUniformClustered

For instructions requiring these capabilities, Scope for Execution may be:

Subgroup

When the GroupNonUniformClustered capability is declared, the GroupNonUniformArithmetic scan and reduction instructions may include the optional ClusterSize operand.
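The effect of ClusterSize can be illustrated with a host-side model of a clustered integer add: the subgroup is split into consecutive clusters and every invocation receives the sum over its own cluster. The function name is hypothetical. The power-of-two requirement comes from the SPIR-V specification.

```python
def clustered_iadd(values, cluster_size):
    """Model a clustered reduction of OpGroupNonUniformIAdd over one subgroup."""
    if cluster_size < 1 or cluster_size & (cluster_size - 1):
        raise ValueError("ClusterSize must be a power of two and at least 1")
    result = []
    for start in range(0, len(values), cluster_size):
        cluster = values[start:start + cluster_size]
        # Every invocation in the cluster sees the same reduced value.
        result.extend([sum(cluster)] * len(cluster))
    return result


print(clustered_iadd([1, 2, 3, 4, 5, 6, 7, 8], 4))  # [10, 10, 10, 10, 26, 26, 26, 26]
```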

Linkonce ODR#

‘oneAPI’ Level-Zero API environments supporting the extension ZE_extension_linkonce_odr must accept SPIR-V modules that declare use of the SPV_KHR_linkonce_odr extension via OpExtension.

When use of the SPV_KHR_linkonce_odr extension is declared in the module via OpExtension, the environment must accept modules that include the LinkOnceODR linkage type.

Bfloat16 Conversions#

‘oneAPI’ Level-Zero API environments supporting the extension ZE_extension_bfloat16_conversions must accept SPIR-V modules that declare use of the SPV_INTEL_bfloat16_conversion extension via OpExtension.

When use of the SPV_INTEL_bfloat16_conversion extension is declared in the module via OpExtension, the environment must accept modules that declare the Bfloat16ConversionINTEL capability.

For the instructions OpConvertFToBF16INTEL and OpConvertBF16ToFINTEL added by the extension:

Valid types for Result Type, Float Value, and Bfloat16 Value are Scalars and OpTypeVectors with 2, 3, 4, 8, or 16 Component Count components.
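For readers unfamiliar with the format, bfloat16 keeps the sign, the 8-bit exponent, and the top 7 mantissa bits of a 32-bit float. The following host-side sketch converts in both directions and assumes round-to-nearest-even when narrowing; the rounding behavior of the device instructions should be confirmed against the extension specification. The function names are hypothetical.

```python
import struct


def float_to_bf16(x: float) -> int:
    """Narrow a 32-bit float to a 16-bit bfloat16 pattern (round-to-nearest-even assumed)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7FFFFFFF) > 0x7F800000:
        return (bits >> 16) | 0x0040  # NaN: truncate and force a quiet NaN
    # Add half of the discarded range, plus one more when the kept LSB is odd.
    bits += 0x7FFF + ((bits >> 16) & 1)
    return (bits >> 16) & 0xFFFF


def bf16_to_float(pattern: int) -> float:
    """Widen a bfloat16 pattern to a float by appending sixteen zero bits."""
    return struct.unpack("<f", struct.pack("<I", (pattern & 0xFFFF) << 16))[0]


print(hex(float_to_bf16(1.0)), bf16_to_float(0x3F80))
```

Widening is exact, while narrowing discards 16 mantissa bits, so a round trip through bfloat16 generally loses precision.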

Global Variables#

‘oneAPI’ Level-Zero API environments must accept SPIR-V modules that declare use of the SPV_INTEL_global_variable_host_access extension via OpExtension.

When use of the SPV_INTEL_global_variable_host_access extension is declared in the module via OpExtension, the environment must accept modules that declare the GlobalVariableHostAccessINTEL SPIR-V capability.

The function zeModuleGetGlobalPointer can be used to retrieve a pointer to a global variable.

zeModuleGetGlobalPointer takes a pGlobalName parameter which identifies the variable. For a ze_module_handle_t created from SPIR-V this parameter is interpreted as follows:

The implementation first looks for an OpVariable that is decorated with HostAccessINTEL where the Name operand is the same as pGlobalName.
If no such variable is found, the implementation then looks for an OpVariable that is decorated with LinkageAttributes where the Name operand is the same as pGlobalName. The implementation considers both exported and imported variables as candidates.

If the module was created from native code that came from a previous call to zeModuleGetNativeBinary, and that other module was created from SPIR-V, then pGlobalName is interpreted as in the SPIR-V case.

If pGlobalName identifies an imported SPIR-V variable, the module must be dynamically linked before the variable’s pointer may be queried.
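The two-step name lookup described above can be sketched as follows. The `GlobalVar` record is a hypothetical model of an OpVariable together with the Name operands of its decorations. The sketch shows only the search order, not how an implementation stores or links variables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GlobalVar:
    result_id: int
    host_access_name: Optional[str] = None  # Name operand of HostAccessINTEL, if decorated
    linkage_name: Optional[str] = None      # Name operand of LinkageAttributes, if decorated


def resolve_global(variables, global_name):
    """Return the variable selected for global_name, or None if nothing matches."""
    # Step 1: variables decorated with HostAccessINTEL take priority.
    for var in variables:
        if var.host_access_name == global_name:
            return var
    # Step 2: fall back to LinkageAttributes, exported or imported.
    for var in variables:
        if var.linkage_name == global_name:
            return var
    return None


module_globals = [
    GlobalVar(7, linkage_name="counter"),
    GlobalVar(9, host_access_name="counter"),
]
print(resolve_global(module_globals, "counter").result_id)  # 9
```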

Numerical Compliance#

The ‘oneAPI’ Level-Zero environment will meet or exceed the numerical compliance requirements defined in the OpenCL SPIR-V Environment Specification. See: Numerical Compliance.

Image Addressing and Filtering#

The ‘oneAPI’ Level-Zero environment image addressing and filtering behavior is compatible with the behavior defined in the OpenCL SPIR-V Environment Specification. See: Image Addressing and Filtering.
