Constant Tensor Cache

The oneDNN Graph component supports a constant tensor cache, which stores processed constant tensors, such as reordered constant weights and folded constant scales, to reduce redundant computation and improve performance. The feature is disabled by default. Users can set the cache capacity for each engine kind (CPU and GPU) through the graph API or an environment variable, and query the current capacity through the graph API.

Build-Time Controls

There are no build-time controls for enabling or disabling the constant tensor cache feature. The feature is controlled only at run time, through the graph API or an environment variable, as described in the following section.

Run-Time Controls

Constant Tensor Cache Capacity Control API

oneDNN Graph provides a pair of APIs to control the constant tensor cache feature. To enable the constant tensor cache and set the capacity for a specific engine kind, call the setter API. The capacity is specified in megabytes (MB). Once the capacity is reached, new tensors are not cached. To query the current capacity for a specific engine kind, call the getter API.

// setter API
@ref dnnl_graph_set_constant_tensor_cache_capacity

// getter API
@ref dnnl_graph_get_constant_tensor_cache_capacity
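
The capacity rule described above can be pictured with a small stand-alone model. This is only a sketch of the documented behavior, not oneDNN code: `EngineKind`, `ConstantCacheModel`, and its methods are hypothetical names, and the sketch assumes that 1 MB is 1024 * 1024 bytes and that a capacity of 0 leaves the cache disabled.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the engine kinds named in this document.
enum class EngineKind { cpu, gpu };

// Hypothetical model of a per-engine-kind, capacity-limited cache.
class ConstantCacheModel {
public:
    // Plays the role of the setter API: capacity is given in megabytes.
    // Assumption: 1 MB = 1024 * 1024 bytes. Shrinking below the amount
    // already in use is not modeled.
    void set_capacity(EngineKind kind, std::size_t megabytes) {
        caches_[kind].capacity_bytes = megabytes * 1024 * 1024;
    }

    // Plays the role of the getter API: returns the capacity in megabytes.
    // An engine kind that was never configured reports 0 (cache disabled).
    std::size_t get_capacity(EngineKind kind) const {
        auto it = caches_.find(kind);
        return it == caches_.end() ? 0
                                   : it->second.capacity_bytes / (1024 * 1024);
    }

    // Tries to cache a processed constant tensor of the given size.
    // Returns false, and caches nothing, if the tensor does not fit.
    bool try_cache(EngineKind kind, const std::string &key, std::size_t bytes) {
        PerEngine &cache = caches_[kind];
        if (cache.entries.count(key)) return true; // already cached
        if (cache.used_bytes + bytes > cache.capacity_bytes) return false;
        cache.entries[key] = bytes;
        cache.used_bytes += bytes;
        return true;
    }

    bool is_cached(EngineKind kind, const std::string &key) const {
        auto it = caches_.find(kind);
        return it != caches_.end() && it->second.entries.count(key) > 0;
    }

private:
    struct PerEngine {
        std::size_t capacity_bytes = 0; // 0 means nothing can be cached
        std::size_t used_bytes = 0;
        std::unordered_map<std::string, std::size_t> entries;
    };
    std::map<EngineKind, PerEngine> caches_;
};
```

In this model, `try_cache` fails for every engine kind until `set_capacity` is called with a nonzero value. After that, tensors are accepted until the next one would exceed the capacity, and existing entries are never evicted to make room. Whether the library applies exactly this accounting is not stated in this document.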

Environment Variable

In addition to the programmable API, oneDNN Graph provides an environment variable named ONEDNN_GRAPH_CONSTANT_TENSOR_CACHE_CAPACITY to control the capacity. It accepts values in the form engine_kind:size or engine_kind1:size1;engine_kind2:size2. The first form sets the capacity for a single engine kind, for example cpu. The second form sets the capacities of several engine kinds at once: for example, cpu:1024;gpu:2048 sets the capacity of cpu to 1024 MB and the capacity of gpu to 2048 MB.

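Environment variable | Value | Description
ONEDNN_GRAPH_CONSTANT_TENSOR_CACHE_CAPACITY | cpu:size | Set cpu constant cache capacity size to size
ONEDNN_GRAPH_CONSTANT_TENSOR_CACHE_CAPACITY | cpu:size1;gpu:size2 | Set cpu constant cache capacity size to size1 and gpu to size2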
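
To make the value format concrete, here is a minimal parsing sketch. The function `parse_capacity_spec` is hypothetical and is not part of oneDNN. How the library itself treats malformed values is not described in this document, so throwing on invalid input is purely an assumption of this example.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: parses "cpu:1024" or "cpu:1024;gpu:2048" into a map
// from engine kind name to capacity in megabytes.
std::map<std::string, std::size_t> parse_capacity_spec(const std::string &spec) {
    std::map<std::string, std::size_t> result;
    std::istringstream stream(spec);
    std::string item;
    // Entries for different engine kinds are separated by ';'.
    while (std::getline(stream, item, ';')) {
        if (item.empty()) continue; // tolerate a trailing ';'
        // Each entry has the form engine_kind:size.
        const std::size_t colon = item.find(':');
        if (colon == std::string::npos || colon == 0
                || colon + 1 == item.size()) {
            throw std::invalid_argument(
                    "expected engine_kind:size, got '" + item + "'");
        }
        const std::string kind = item.substr(0, colon);
        const std::string size_text = item.substr(colon + 1);
        // Accept decimal digits only (assumption of this sketch).
        if (size_text.find_first_not_of("0123456789") != std::string::npos) {
            throw std::invalid_argument(
                    "size must be a non-negative integer, got '" + size_text
                    + "'");
        }
        result[kind] = std::stoull(size_text);
    }
    return result;
}
```

When setting the variable from a POSIX shell, quote any value that contains a semicolon, for example `"cpu:1024;gpu:2048"`. Otherwise the shell treats `;` as a command separator.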



Set the environment variable once, before the application starts. The library reads the variable and caches the parsed value internally to reduce string parsing overhead, so changing the variable at run time has no effect. The functional API has higher priority than the environment variable: a call to the setter API overrides the capacity values specified through the environment variable.
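
The precedence rule can be sketched as a small resolver. `CapacitySettings` is a hypothetical class, not a oneDNN type. The sketch assumes that the override applies per engine kind and that an engine kind with no setting reports a capacity of 0.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical model of how the two run-time controls combine.
class CapacitySettings {
public:
    // env_values is a snapshot of the capacities (in MB) parsed from the
    // environment variable at startup. It is never re-read afterwards,
    // which models "changing the variable at run time has no effect".
    explicit CapacitySettings(std::map<std::string, std::size_t> env_values)
        : env_values_(std::move(env_values)) {}

    // Models a call to the setter API for one engine kind.
    void set_capacity(const std::string &kind, std::size_t megabytes) {
        api_values_[kind] = megabytes;
    }

    // Effective capacity: a value set through the API wins, then the
    // environment snapshot, then 0 (assumed default, cache disabled).
    std::size_t get_capacity(const std::string &kind) const {
        auto api = api_values_.find(kind);
        if (api != api_values_.end()) return api->second;
        auto env = env_values_.find(kind);
        if (env != env_values_.end()) return env->second;
        return 0;
    }

private:
    std::map<std::string, std::size_t> env_values_;
    std::map<std::string, std::size_t> api_values_;
};
```

With a snapshot of cpu at 1024 MB and gpu at 2048 MB, calling `set_capacity("cpu", 512)` in this model makes `get_capacity("cpu")` return 512, while `get_capacity("gpu")` still returns the environment value of 2048.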