Even if BFLOAT16 is enabled, fp16 is still used for communication unless the communication datatype is also explicitly set to bfp16: https://github.com/microsoft/DeepSpeed/blob/9b70ce56e7af89d5226f9b06ebe1137407f371dc/deepspeed/runtime/engine.py#L702-L711

Moreover, in ZeRO, even when the communication datatype is set to bfp16, the default value in at least one place is still fp16: https://github.com/microsoft/DeepSpeed/blob/9b70ce56e7af89d5226f9b06ebe1137407f371dc/deepspeed/runtime/zero/stage_1_and_2.py#L1334-L1338
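
For reference, a minimal sketch of a config that sets both options explicitly. The key names (`bf16`, `communication_data_type`) and the `"bfp16"` value are my reading of the DeepSpeed config schema and are assumptions here; older versions may expect `bfloat16` instead of `bf16`, so check them against the commit linked above. The batch size and ZeRO stage are placeholders.

```python
import json

# Assumed key names; verify against the DeepSpeed version in use.
ds_config = {
    "train_batch_size": 8,                # placeholder value
    "bf16": {"enabled": True},            # older versions may expect "bfloat16"
    "communication_data_type": "bfp16",   # set explicitly, not left to the default
    "zero_optimization": {"stage": 2},    # placeholder stage
}

# Serialize to the JSON form that would be passed as the DeepSpeed config file.
ds_config_json = json.dumps(ds_config, indent=2)
print(ds_config_json)
```

Per the second link, this may still not be enough inside ZeRO, since the fp16 default there does not appear to come from this setting.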