.. index:: pair: page; Bfloat16 Training
.. _doxid-dev_guide_training_bf16:

Bfloat16 Training
=================

:target:`doxid-dev_guide_training_bf16_1dev_guide_training_bf16_introduction`

Introduction
~~~~~~~~~~~~

On the path to better performance, a recent proposal introduces the idea of
working with a bfloat16 (``bf16``) 16-bit floating point data type based on
the IEEE 32-bit single-precision floating point data type (``f32``). Both
``bf16`` and ``f32`` have an 8-bit exponent. However, while ``f32`` has a
23-bit mantissa, ``bf16`` has only a 7-bit one, keeping only the most
significant bits. As a result, while these data types support a very close
numerical range of values, ``bf16`` has significantly reduced precision.
Therefore, ``bf16`` occupies a spot between ``f32`` and the IEEE 16-bit
half-precision floating point data type, ``f16``. Compared directly to
``f16``, which has a 5-bit exponent and a 10-bit mantissa, ``bf16`` trades
increased range for reduced precision.

.. image:: img_bf16_diagram.png
    :alt: Diagram depicting the bit-wise layout of f32, bf16, and f16 floating point data types.

More details of the bfloat16 data type can be found at `Intel's site `__ and
`TensorFlow's documentation `__.

One of the advantages of using ``bf16`` versus ``f32`` is its reduced memory
footprint and, hence, increased memory access throughput. Additionally, when
executing on hardware that supports
`Intel DL Boost bfloat16 instructions `__, ``bf16`` may offer an increase in
computational throughput.

bfloat16 Support
~~~~~~~~~~~~~~~~

Most of the primitives have been updated to support the ``bf16`` data type
for source and weights tensors. Destination tensors can be specified to have
either the ``bf16`` or ``f32`` data type. The latter is intended for cases in
which the output is to be fed to operations that do not support bfloat16 or
that require better precision.

bfloat16 Workflow
~~~~~~~~~~~~~~~~~

The main difference between implementing training with the ``f32`` data type
and with the ``bf16`` data type is the way the weights updates are treated.
With the ``f32`` data type, the weights gradients have the same data type as
the weights themselves. This is not necessarily the case with the ``bf16``
data type, as oneDNN allows some flexibility here. For example, one could
maintain a master copy of all the weights, computing weights gradients in
``f32`` and converting the result to ``bf16`` afterwards.

Example
~~~~~~~

The :ref:`CNN bf16 training example ` shows how to use ``bf16`` to train CNNs.
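
To make the support described above concrete, the following is a minimal
sketch, assuming the oneDNN 3.x C++ API, of a forward convolution whose
source and weights tensors use ``bf16`` while the destination stays in
``f32``. The tensor shapes, stride, and padding are arbitrary placeholders,
and the sketch assumes hardware and an implementation that support ``bf16``.

.. code-block:: cpp

    #include "oneapi/dnnl/dnnl.hpp"
    using namespace dnnl;

    int main() {
        engine eng(engine::kind::cpu, 0);
        stream s(eng);

        // Placeholder shapes: N x IC x H x W input, OC output channels, 3x3 kernel.
        const memory::dim N = 32, IC = 64, OC = 128, H = 56, W = 56, K = 3;

        // Source and weights are bf16; the destination is kept in f32 so that it
        // can feed operations that do not support bf16 or need better precision.
        auto src_md = memory::desc({N, IC, H, W}, memory::data_type::bf16, memory::format_tag::any);
        auto wei_md = memory::desc({OC, IC, K, K}, memory::data_type::bf16, memory::format_tag::any);
        auto dst_md = memory::desc({N, OC, H, W}, memory::data_type::f32, memory::format_tag::any);

        // 3x3 convolution with stride 1 and padding 1 keeps the spatial size.
        auto conv_pd = convolution_forward::primitive_desc(eng,
                prop_kind::forward_training, algorithm::convolution_direct,
                src_md, wei_md, dst_md,
                /*strides=*/{1, 1}, /*padding_l=*/{1, 1}, /*padding_r=*/{1, 1});

        // Allocate memory with the layouts the implementation picked and execute.
        // (Buffers are left uninitialized; a real application would fill them.)
        memory src(conv_pd.src_desc(), eng), wei(conv_pd.weights_desc(), eng),
                dst(conv_pd.dst_desc(), eng);
        convolution_forward(conv_pd).execute(s,
                {{DNNL_ARG_SRC, src}, {DNNL_ARG_WEIGHTS, wei}, {DNNL_ARG_DST, dst}});
        s.wait();
        return 0;
    }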
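The master-weights flow mentioned in the workflow section can likewise be
sketched with a reorder: the optimizer updates an ``f32`` master copy of the
weights, and a ``reorder`` primitive down-converts that copy to the ``bf16``
weights consumed by the next forward and backward passes. This is a minimal
sketch under the same oneDNN 3.x C++ API assumption; the shapes, the plain
``oihw`` layout, the ``update_bf16_weights`` helper, and the naive SGD update
are illustrative placeholders, not part of the library.

.. code-block:: cpp

    #include "oneapi/dnnl/dnnl.hpp"
    #include <cstring>
    using namespace dnnl;

    // Hypothetical helper: down-convert the f32 master weights to the bf16 copy
    // used for compute. Both memories share the same shape and layout.
    void update_bf16_weights(stream &s, memory &wei_f32_master, memory &wei_bf16) {
        reorder(wei_f32_master, wei_bf16).execute(s, wei_f32_master, wei_bf16);
        s.wait();
    }

    int main() {
        engine eng(engine::kind::cpu, 0);
        stream s(eng);

        const memory::dims wei_dims = {128, 64, 3, 3}; // placeholder OC x IC x KH x KW

        // The master copy stays in f32; the compute copy is bf16. Gradients are
        // also kept in f32 in this flow.
        memory wei_f32({wei_dims, memory::data_type::f32, memory::format_tag::oihw}, eng);
        memory wei_bf16({wei_dims, memory::data_type::bf16, memory::format_tag::oihw}, eng);
        memory diff_wei_f32({wei_dims, memory::data_type::f32, memory::format_tag::oihw}, eng);

        // Placeholder data: zero-fill the f32 buffers.
        std::memset(wei_f32.get_data_handle(), 0, wei_f32.get_desc().get_size());
        std::memset(diff_wei_f32.get_data_handle(), 0, diff_wei_f32.get_desc().get_size());

        // One training step: apply a plain SGD update to the f32 master weights,
        // then down-convert so the next iteration computes with fresh bf16 weights.
        const float lr = 0.1f;
        float *w = static_cast<float *>(wei_f32.get_data_handle());
        const float *dw = static_cast<const float *>(diff_wei_f32.get_data_handle());
        const size_t nelems = 128 * 64 * 3 * 3;
        for (size_t i = 0; i < nelems; ++i) w[i] -= lr * dw[i];

        update_bf16_weights(s, wei_f32, wei_bf16);
        return 0;
    }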