HELP
What is this?
The Intel GPU Occupancy tool computes the theoretical GPU Occupancy for Intel GPU devices based on GPU Kernel code parameters. The Occupancy value indicates how efficiently the kernel utilizes the available GPU hardware resources. The tool also generates graphs that help optimize GPU Occupancy.
How to use?
- Input GPU Device type
- Input GPU Kernel code parameters
- Analyze GPU Occupancy output values and graphs
Detailed Instructions
- Input GPU Device type
- From the dropdown, select the GPU family and change "XVE Count" for your GPU (XVE Count: the number of Xe Vector Engines in the GPU). To find the XVE Count for your GPU, check the GPU hardware specification, or run "clinfo" in a terminal and look for the "Max compute units" value.
- Alternatively, you can select the "PCI ID of GPU Device" option to find your GPU.
- Run "lspci | grep Display" in a terminal to find your GPU devices and their PCI IDs.
- Find your GPU name and the corresponding PCI ID in this Intel GPU Hardware List.
- Note: If your GPU is not found using the PCI ID, please report a bug and use the other method instead, selecting your GPU family.
- Input GPU Kernel code parameters
Once the GPU is selected, the tool loads default values for the GPU Kernel code's Global size, Work-Group size, Sub-Group size and Local Memory size, then computes the GPU Occupancy values and generates graphs.
Change the values of input parameters based on your GPU Kernel code to recalculate the Occupancy:
- Global size: Workload size of the GPU Kernel, i.e. the total number of work-items executed by the GPU Kernel code. Change this to your Kernel workload size. The value can be 1-, 2- or 3-dimensional. Examples: 4096 | 1024,1024 | 256,256,256
- Work-Group size: Group of work-items that executes within an Xe-Core; set in GPU Kernel code using sycl::nd_range. Setting this value too low results in dispatching too many Work-Groups to the GPU, which causes dispatch and synchronization overhead and may affect performance negatively.
- Sub-Group size: The compiler automatically sets the SIMD/sub-group size, unless it is explicitly set in GPU Kernel code using [[intel::reqd_sub_group_size(SIMD_SIZE)]]. [More Details]
- Local Memory size: Total local memory allocated by GPU Kernel code using sycl::local_accessor. Local Memory is shared by all Work-Groups dispatched to an Xe-Core. Allocating too much local memory may limit the number of Work-Groups that can execute concurrently on an Xe-Core. This may reduce Xe-Core Occupancy; however, performance may or may not be negatively affected.
- Barrier Usage: Select "YES" if the kernel code uses sycl::group_barrier, either explicitly or implicitly within a group algorithm or library function. The GPU has a fixed number of barrier registers per Xe-Core, so the number of Work-Groups that can execute concurrently on an Xe-Core may be limited if the GPU Kernel code uses barriers.
- Analyze GPU Occupancy output values and graphs
Focus on optimizing the Xe-Core Occupancy value by changing the input parameters of the GPU Kernel code. The Peak and Average GPU Occupancy values simply scale with the number of Xe-Cores in the GPU and the global size of the workload.
- Xe-Core Occupancy: Theoretical Occupancy of a GPU's Xe-Core, determined by the usage of the available thread contexts in an Xe-Core. A Work-Group is dispatched to an Xe-Core on the GPU; multiple Work-Groups can be dispatched to the same Xe-Core if there are sufficient resources. The parameters affecting Xe-Core Occupancy are Work-Group size, SIMD sub-group size and local memory usage. [More Details]
- GPU Occupancy: Theoretical Occupancy of the GPU hardware when executing the entire GPU Kernel workload on all Xe-Cores of the GPU. This depends on the Xe-Core Occupancy, the number of Xe-Cores in the GPU and the workload size. Depending on the GPU Kernel workload size, the GPU may require multiple waves of execution. Both a Peak GPU Occupancy value and an Average GPU Occupancy value are calculated.
- Graph - Impact of varying Work-Group: This graph shows how the Xe-Core Occupancy varies for different Work-Group sizes with all other parameters constant. This information can be used to set an appropriate Work-Group size in GPU Kernel code to maximize Occupancy.
- Graph - Impact of varying Local Memory: This graph shows how the Xe-Core Occupancy varies for different Local Memory sizes with all other parameters constant. This information can be used to allocate an appropriate amount of Local Memory in GPU Kernel code to maximize Occupancy.
- Graph - GPU Occupancy: This graph shows how the GPU Kernel executes the entire workload. Execution may happen in multiple waves depending on the number of Work-Groups and the available GPU resources. The graph shows the Peak GPU Occupancy value and the Average GPU Occupancy value.
Acronyms/Definitions
GPU | Graphics Processing Unit
HW | Hardware
Xe | Intel GPU architecture name
XVE | Xe Vector Engine (formerly known as EU, or Execution Unit); the smallest thread-level building block of the GPU, consisting of Arithmetic Logic Units
SLM | Shared Local Memory
Xe-Core | Compute building block of the GPU, consisting of XVEs, SLM and other HW resources
SYCL | Open-source, cross-platform programming model
WG | Work-Group
SG | SIMD/Sub-Group
Disclaimer
- The GPU Occupancy Calculator gives a theoretical estimate of GPU Occupancy; actual occupancy on the hardware may differ slightly due to factors such as kernel scheduling variations, cache hits/misses and other hardware conditions.
- Note that higher occupancy does not always translate to higher performance (for example in some GPU kernels, higher local memory usage may reduce occupancy but may give better performance).
- Occupancy reported by tools like Intel VTune Profiler and Intel Advisor Roofline is actual Occupancy and may differ from the theoretical value computed by the GPU Occupancy Calculator.