Ray Tracing Acceleration Structure Extension#
API#
Enumerations
Structures
Functions
Ray Tracing Acceleration Structure#
The Ray Tracing Acceleration Structure extension provides the functionality to build ray tracing acceleration structures (RTAS) for 3D scenes on the host for use with GPU devices.
It is the user’s responsibility to manage the acceleration structure buffer and scratch buffer resources. The required sizes may be queried via zeRTASBuilderGetBuildPropertiesExt. Once built, an acceleration structure is a self-contained entity; any input resources may be released after the successful construction.
Scene Data#
To build an acceleration structure, first set up a scene that consists of one or more geometry infos of the following types:
ze_rtas_builder_triangles_geometry_info_ext_t for triangle meshes,
ze_rtas_builder_quads_geometry_info_ext_t for quad meshes,
ze_rtas_builder_procedural_geometry_info_ext_t for procedural primitives with an attached axis-aligned bounding box, and
ze_rtas_builder_instance_geometry_info_ext_t for instances of other acceleration structures.
The following example creates a ze_rtas_builder_triangles_geometry_info_ext_t to specify a triangle mesh:
    std::vector<ze_rtas_triangle_indices_uint32_ext_t> triangleIndexBuffer;
    std::vector<ze_rtas_float3_ext_t> triangleVertexBuffer;

    // Populate vertex and index buffers
    {
        // ...
    }

    ze_rtas_builder_triangles_geometry_info_ext_t mesh;
    memset(&mesh, 0, sizeof(mesh));
    mesh.geometryType = ZE_RTAS_BUILDER_GEOMETRY_TYPE_EXT_TRIANGLES;
    mesh.geometryFlags = 0;
    mesh.geometryMask = 0xFF;

    mesh.triangleFormat = ZE_RTAS_BUILDER_INPUT_DATA_FORMAT_EXT_TRIANGLE_INDICES_UINT32;
    mesh.triangleCount = triangleIndexBuffer.size();
    mesh.triangleStride = sizeof(ze_rtas_triangle_indices_uint32_ext_t);
    mesh.pTriangleBuffer = triangleIndexBuffer.data();

    mesh.vertexFormat = ZE_RTAS_BUILDER_INPUT_DATA_FORMAT_EXT_FLOAT3;
    mesh.vertexCount = triangleVertexBuffer.size();
    mesh.vertexStride = sizeof(ze_rtas_float3_ext_t);
    mesh.pVertexBuffer = triangleVertexBuffer.data();
The example specifies the data formats of the triangle index and vertex buffers, their strides, and a pointer to the first element of each buffer. Geometry is considered to be opaque by default, enabling a fast mode where traversal does not return to the caller of ray tracing for each triangle or quad hit. To process each triangle or quad hit with an any-hit shader, the geometryFlags member of the geometry info must include the ZE_RTAS_BUILDER_GEOMETRY_EXT_FLAG_NON_OPAQUE flag.
To refer to multiple geometries that make a scene, pointers to geometry info structures can be put into an array as follows:
    std::vector<ze_rtas_builder_geometry_info_ext_t*> geometries;
    geometries.push_back((ze_rtas_builder_geometry_info_ext_t*)&mesh0);
    geometries.push_back((ze_rtas_builder_geometry_info_ext_t*)&mesh1);
    ...
This completes the definition of the geometry for the scene for which to construct the acceleration structure.
Device Properties#
The next step is to query the target device for acceleration structure properties.
    ze_rtas_device_ext_properties_t rtasDeviceProps;
    rtasDeviceProps.stype = ZE_STRUCTURE_TYPE_RTAS_DEVICE_EXT_PROPERTIES;
    rtasDeviceProps.pNext = nullptr;

    ze_device_properties_t deviceProps;
    deviceProps.stype = ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES;
    deviceProps.pNext = &rtasDeviceProps;

    zeDeviceGetProperties(hDevice, &deviceProps);
The device properties contain information (a device-specific ray tracing acceleration structure format) that is required to complete an RTAS build operation.
Acceleration Structure Builder#
With the scene data prepared and relevant device properties known, create a ray tracing acceleration structure builder object and query for the necessary build properties.
    ze_rtas_builder_ext_desc_t desc;
    desc.stype = ZE_STRUCTURE_TYPE_RTAS_BUILDER_EXT_DESC;
    desc.pNext = nullptr;
    desc.builderVersion = ZE_RTAS_BUILDER_EXT_VERSION_CURRENT;

    ze_rtas_builder_ext_handle_t hBuilder = nullptr;
    ze_result_t result = zeRTASBuilderCreateExt(hDriver, &desc, &hBuilder);
    assert(result == ZE_RESULT_SUCCESS);

    ze_rtas_builder_ext_properties_t builderProps;
    builderProps.stype = ZE_STRUCTURE_TYPE_RTAS_BUILDER_EXT_PROPERTIES;
    builderProps.pNext = nullptr;

    ze_rtas_builder_build_op_ext_desc_t buildOpDesc;
    buildOpDesc.stype = ZE_STRUCTURE_TYPE_RTAS_BUILDER_BUILD_OP_EXT_DESC;
    buildOpDesc.pNext = nullptr;
    buildOpDesc.rtasFormat = rtasDeviceProps.rtasFormat;
    buildOpDesc.buildQuality = ZE_RTAS_BUILDER_BUILD_QUALITY_HINT_EXT_MEDIUM;
    buildOpDesc.buildFlags = 0;
    buildOpDesc.ppGeometries = geometries.data();
    buildOpDesc.numGeometries = geometries.size();

    result = zeRTASBuilderGetBuildPropertiesExt(hBuilder, &buildOpDesc, &builderProps);
    assert(result == ZE_RESULT_SUCCESS);
Note that the parameters of the build operation descriptor, such as the acceleration structure build quality, affect the buffer requirements reported in the builder properties.
An application may create and use a single RTAS builder object, as multiple concurrent build operations may be performed with a single such object.
Buffers#
With the builder properties along with everything else known at this point, the resources for the acceleration structure may be allocated.
Scratch Buffer#
A system memory scratch buffer is required to perform the build operation. It is used by the implementation for intermediate storage.
void* pScratchBuffer = malloc(builderProps.scratchBufferSizeBytes);
Acceleration Structure Buffer#
The acceleration structure buffer holds the built ray tracing acceleration structure. Typically, the acceleration structure is built into a host allocation and later copied into a device buffer using the zeRTASBuilderCommandListAppendCopyExt function.
    // Allocate host memory to build the acceleration structure into
    void* pRtasBufferHost = aligned_alloc(rtasDeviceProps.rtasBufferAlignment,
                                          builderProps.rtasBufferSizeBytesMaxRequired);

    // Create a device allocation to later copy the built acceleration structure into.
    // The ray tracing allocation descriptor is chained through pNext and uses its own
    // structure type.
    ze_raytracing_mem_alloc_ext_desc_t rtasMemAllocDesc;
    rtasMemAllocDesc.stype = ZE_STRUCTURE_TYPE_RAYTRACING_MEM_ALLOC_EXT_DESC;
    rtasMemAllocDesc.pNext = nullptr;
    rtasMemAllocDesc.flags = 0;

    ze_device_mem_alloc_desc_t deviceMemAllocDesc;
    deviceMemAllocDesc.stype = ZE_STRUCTURE_TYPE_DEVICE_MEM_ALLOC_DESC;
    deviceMemAllocDesc.pNext = &rtasMemAllocDesc;
    deviceMemAllocDesc.flags = ZE_DEVICE_MEM_ALLOC_FLAG_BIAS_CACHED;
    deviceMemAllocDesc.ordinal = 0;

    void* pRtasBufferDevice = nullptr;
    result = zeMemAllocDevice(hContext, &deviceMemAllocDesc,
                              builderProps.rtasBufferSizeBytesMaxRequired,
                              rtasDeviceProps.rtasBufferAlignment,
                              hDevice, &pRtasBufferDevice);
    assert(result == ZE_RESULT_SUCCESS);
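The host allocation above passes the builder-reported size straight to aligned_alloc. Some C library implementations require the size to be an integral multiple of the alignment and return a null pointer otherwise. The following is a minimal sketch of a guard, assuming C++17 and a platform that provides std::aligned_alloc. The names roundUpToMultiple and allocateAlignedHostBuffer are hypothetical helpers for illustration and are not part of the Level Zero API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: round `size` up to the next multiple of `alignment`.
inline std::size_t roundUpToMultiple(std::size_t size, std::size_t alignment) {
    assert(alignment != 0);
    return ((size + alignment - 1) / alignment) * alignment;
}

// Hypothetical helper: allocate a host buffer whose size satisfies the
// "multiple of alignment" requirement of std::aligned_alloc.
// The caller releases the buffer with std::free.
inline void* allocateAlignedHostBuffer(std::size_t alignment, std::size_t sizeBytes) {
    return std::aligned_alloc(alignment, roundUpToMultiple(sizeBytes, alignment));
}
```

With such a helper, the host buffer could be obtained as allocateAlignedHostBuffer(rtasDeviceProps.rtasBufferAlignment, builderProps.rtasBufferSizeBytesMaxRequired), while the original, unrounded size is still the one passed to the build call.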
Executing an Acceleration Structure Build#
Single-Threaded Build#
A single-threaded acceleration structure build on the host is initiated using zeRTASBuilderBuildExt.
    result = zeRTASBuilderBuildExt(hBuilder, &buildOpDesc,
                                   pScratchBuffer, builderProps.scratchBufferSizeBytes,
                                   pRtasBufferHost, builderProps.rtasBufferSizeBytesMaxRequired,
                                   nullptr, nullptr, nullptr, nullptr);
    assert(result == ZE_RESULT_SUCCESS);
When the build completes successfully, the acceleration structure buffer is ready for use by the ray tracing API.
Parallel Build#
In order to speed up the build operation using multiple worker threads, a parallel operation object can be associated with the build operation and joined with the application-provided worker threads as in the following example:
Note: The following example uses oneTBB to dispatch worker threads, but this is not a requirement.
    ze_rtas_parallel_operation_ext_handle_t hParallelOperation = nullptr;
    result = zeRTASParallelOperationCreateExt(hDriver, &hParallelOperation);
    assert(result == ZE_RESULT_SUCCESS);

    // Initiate the acceleration structure build operation with a handle
    // of a parallel operation object. This causes the parallel operation to be
    // bound to the build operation and the function returns immediately without
    // building any acceleration structure yet.
    result = zeRTASBuilderBuildExt(hBuilder, &buildOpDesc,
                                   pScratchBuffer, builderProps.scratchBufferSizeBytes,
                                   pRtasBufferHost, builderProps.rtasBufferSizeBytesMaxRequired,
                                   hParallelOperation, nullptr, nullptr, nullptr);
    assert(result == ZE_RESULT_EXT_RTAS_BUILD_DEFERRED);

    // Once the parallel operation is bound to the build operation the number
    // of worker threads to join the parallel operation can be queried.
    ze_rtas_parallel_operation_ext_properties_t parallelOpProps;
    parallelOpProps.stype = ZE_STRUCTURE_TYPE_RTAS_PARALLEL_OPERATION_EXT_PROPERTIES;
    parallelOpProps.pNext = nullptr;
    result = zeRTASParallelOperationGetPropertiesExt(hParallelOperation, &parallelOpProps);
    assert(result == ZE_RESULT_SUCCESS);

    // Now worker threads can join the build operation to perform the actual build
    // of the acceleration structure. The index arguments are given the same type
    // as maxConcurrency so that the index type of tbb::parallel_for can be deduced.
    tbb::parallel_for(uint32_t(0), parallelOpProps.maxConcurrency, uint32_t(1), [&](uint32_t) {
        ze_result_t buildResult = zeRTASParallelOperationJoinExt(hParallelOperation);
        assert(buildResult == ZE_RESULT_SUCCESS);
    });

    // With the parallel operation complete, the parallel operation object can be released.
    result = zeRTASParallelOperationDestroyExt(hParallelOperation);
    assert(result == ZE_RESULT_SUCCESS);
Note that the number of worker threads to be used can only be queried from the parallel operation object after it is bound to the build operation by the call to zeRTASBuilderBuildExt.
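Because oneTBB is not a requirement, the worker dispatch can also be written with plain standard-library threads. The following is a minimal sketch under stated assumptions: joinWithThreads is a hypothetical helper name, and the joinBuild callable is a stand-in for a call to zeRTASParallelOperationJoinExt that reports success as a bool. The calling thread takes part in the work, so only workerCount - 1 additional threads are started.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: invokes `joinBuild` once on each of `workerCount`
// threads (workerCount - 1 spawned threads plus the calling thread) and
// returns true only if every invocation reported success.
inline bool joinWithThreads(uint32_t workerCount,
                            const std::function<bool()>& joinBuild) {
    assert(workerCount > 0);
    std::atomic<bool> allSucceeded{true};
    auto worker = [&] {
        if (!joinBuild()) {
            allSucceeded.store(false);
        }
    };

    std::vector<std::thread> threads;
    threads.reserve(workerCount - 1);
    for (uint32_t i = 1; i < workerCount; ++i) {
        threads.emplace_back(worker);
    }
    worker();  // the calling thread joins the operation as well
    for (auto& t : threads) {
        t.join();
    }
    return allSucceeded.load();
}
```

In the example above, the helper would be called with parallelOpProps.maxConcurrency as the worker count and a lambda that returns whether zeRTASParallelOperationJoinExt(hParallelOperation) equals ZE_RESULT_SUCCESS.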
Acceleration Structure Copy#
Once the acceleration structure has been built into the host buffer, the zeRTASBuilderCommandListAppendCopyExt function can be used to copy the acceleration structure to the device. The acceleration structure is generally not copyable using a standard copy operation, so this special copy function must be used.
    zeRTASBuilderCommandListAppendCopyExt(hCommandList, pRtasBufferDevice, pRtasBufferHost,
                                          builderProps.rtasBufferSizeBytesMaxRequired,
                                          nullptr, 0, nullptr);
As soon as the copy is finished, the acceleration structure is ready to be used on the device. Alternatively, the acceleration structure can be built directly into a shared USM allocation, which makes the explicit copy unnecessary.
Conservative Acceleration Structure Buffer Size#
Sizing the acceleration structure buffer using the rtasBufferSizeBytesMaxRequired member of ze_rtas_builder_ext_properties_t guarantees that the build operation will not fail due to an out-of-memory condition. However, this size represents the memory requirement for the worst-case scenario and is larger than is typically needed. To reduce memory usage, the application may attempt to execute a build using an acceleration structure buffer sized to the rtasBufferSizeBytesExpected member of ze_rtas_builder_ext_properties_t. When using the expected size, however, it is possible for the build operation to fail with ZE_RESULT_EXT_RTAS_BUILD_RETRY. If this occurs, the application may resize the acceleration structure buffer with an updated size estimate provided by the RTAS build.
    ze_result_t result;
    void* pRtasBufferHost = nullptr;
    size_t rtasBufferSizeBytes = builderProps.rtasBufferSizeBytesExpected;

    while (true)
    {
        pRtasBufferHost = aligned_alloc(rtasDeviceProps.rtasBufferAlignment, rtasBufferSizeBytes);

        result = zeRTASBuilderBuildExt(hBuilder, &buildOpDesc,
                                       pScratchBuffer, builderProps.scratchBufferSizeBytes,
                                       pRtasBufferHost, rtasBufferSizeBytes,
                                       nullptr, nullptr, nullptr, &rtasBufferSizeBytes);
        if (result == ZE_RESULT_SUCCESS) {
            break;
        }

        assert(result == ZE_RESULT_EXT_RTAS_BUILD_RETRY);
        free(pRtasBufferHost);
    }
The loop starts with the expected acceleration structure buffer size, for which the build will most likely succeed. If the build runs out of memory, ZE_RESULT_EXT_RTAS_BUILD_RETRY is returned and the build is retried with a larger acceleration structure buffer.
The example above passes a pointer to the rtasBufferSizeBytes variable to the build API. If the build operation fails, the API updates this variable with a larger acceleration structure buffer size estimate to be used in the next attempt. Alternatively, the application could increase the acceleration structure buffer size for the next attempt by some percentage, which could fail again, or just use the maximum size from the builder properties for the second attempt.
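One way to combine these options is to bound the number of attempts: try the expected size, then the estimate reported by the failed build, then the maximum size. The sketch below models only the sizing policy and is not Level Zero code. BuildStatus stands in for ZE_RESULT_SUCCESS and ZE_RESULT_EXT_RTAS_BUILD_RETRY, TryBuild stands in for an allocate-and-build step around zeRTASBuilderBuildExt, and buildWithFallback is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Stand-ins for ZE_RESULT_SUCCESS and ZE_RESULT_EXT_RTAS_BUILD_RETRY.
enum class BuildStatus { Success, Retry };

// Stand-in for one build attempt: receives the buffer size to use and, on
// Retry, may write an updated size estimate through the second parameter.
using TryBuild = std::function<BuildStatus(std::size_t, std::size_t*)>;

// Hypothetical policy: expected size first, then the builder's estimate if it
// lies strictly between the current size and the maximum, then the maximum.
// Returns the size that succeeded, or 0 if the maximum size also failed.
inline std::size_t buildWithFallback(std::size_t expectedSize,
                                     std::size_t maxSize,
                                     const TryBuild& tryBuild) {
    std::size_t size = expectedSize;
    for (int attempt = 0; attempt < 3; ++attempt) {
        std::size_t suggested = 0;
        if (tryBuild(size, &suggested) == BuildStatus::Success) {
            return size;
        }
        if (size >= maxSize) {
            break;  // even the worst-case size failed; give up
        }
        const bool useSuggestion =
            attempt == 0 && suggested > size && suggested < maxSize;
        size = useSuggestion ? suggested : maxSize;
    }
    return 0;
}
```

This policy makes at most three build attempts, which trades some memory in the rare fallback case for a predictable upper bound on build time.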
Cleaning Up#
Once the acceleration structure has been built, any resources associated with the build may be released. Additionally, any parallel operation objects should be destroyed as well as any builder objects.
    // Free the scratch buffer
    free(pScratchBuffer);

    // Free host version of acceleration structure
    free(pRtasBufferHost);

    // Destroy the builder object
    zeRTASBuilderDestroyExt(hBuilder);

    // Use the acceleration structure buffer with the ray tracing API
    {
        // ...
    }

    // Release the device acceleration structure buffer once it is no longer needed
    zeMemFree(hContext, pRtasBufferDevice);
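In C++ applications, the host-side buffers can be tied to a scope so that they are released on every exit path, including early returns after a failed build. The following is a minimal sketch, assuming the buffers were obtained from malloc or aligned_alloc and can therefore be released with free. FreeDeleter, HostBuffer, and makeScratchBuffer are hypothetical names for illustration and are not part of the Level Zero API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Deleter that releases memory obtained from malloc or aligned_alloc.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Owning handle for a host buffer; the memory is freed when it goes out of scope.
using HostBuffer = std::unique_ptr<void, FreeDeleter>;

// Hypothetical helper: allocate a scratch buffer of the given size.
// The result is empty if the allocation failed.
inline HostBuffer makeScratchBuffer(std::size_t sizeBytes) {
    return HostBuffer(std::malloc(sizeBytes));
}
```

The raw pointer expected by the build call is available through get(), for example scratch.get() for a buffer created with makeScratchBuffer(builderProps.scratchBufferSizeBytes).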