[benchmarks] Default to bfloat16 (inference) and AMP (training) precision. #6518

Conversation
- Pick data type based on `test`.
- Create `cast_to_dtype` function.
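The `cast_to_dtype` helper named in the commit message could look roughly like the following. This is a hypothetical, torch-free sketch: the real helper presumably operates on torch tensors and modules, but anything exposing a `.to(dtype)` method works the same way here.

```python
# Hypothetical sketch of a `cast_to_dtype` helper (only the name comes from
# the commit message; the structure below is illustrative).
def cast_to_dtype(value, dtype):
    """Recursively cast anything exposing a .to(dtype) method (e.g. torch
    tensors or modules), descending into lists, tuples, and dicts; leave
    other values untouched."""
    if hasattr(value, "to"):
        return value.to(dtype)
    if isinstance(value, (list, tuple)):
        return type(value)(cast_to_dtype(v, dtype) for v in value)
    if isinstance(value, dict):
        return {k: cast_to_dtype(v, dtype) for k, v in value.items()}
    return value
```

Recursing through containers matters because benchmark inputs are often nested structures (tuples of tensors, dicts of keyword arguments) rather than a single tensor.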
I'm still running the benchmarks to get a sense of the regressions this PR introduces.
    "hf_T5_generate",
}

FORCE_AMP_FOR_FP16_BF16_MODELS = {
Do you know how this is configured on the PyTorch HUD? Maintaining a list like this feels prone to divergence.
These lists are taken from the scripts in the PyTorch main repo. They are used for generating the PyTorch HUD results.
I will leave a comment.
Could you share a link to the scripts in the PyTorch main repo that are used to generate the HUD results?
golechwierowicz left a comment
Can you post the BERT_pytorch kernel profile after the change?
The profiling results (in the following posts) were generated with the following command:

    python xla/benchmarks/experiment_runner.py \
        --suite-name torchbench --accelerator cuda --dump-pytorch-profiles \
        --xla PJRT --dynamo openxla --test eval --repeat 8 --iterations-per-run 1 \
        -k BERT_pytorch
BERT_pytorch (before)

BERT_pytorch (after)
Thank you! Looks good to push forward.
Fix: #6483
This PR makes `bfloat16` the default data type for inference, and AMP the default execution mode for training. This follows the configuration used for the PyTorch HUD results.

cc @miladm
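The defaults described above, together with the force-AMP override list discussed in the review, can be sketched as a small selection function. This is illustrative only: the set name `FORCE_AMP_FOR_FP16_BF16_MODELS` and the `--test eval`/`train` values appear in this PR, but `pick_precision` and its exact return values are hypothetical.

```python
# Models that must always run under AMP regardless of the requested test
# mode (one entry taken from the diff context; the full list lives in the
# PyTorch main repo scripts).
FORCE_AMP_FOR_FP16_BF16_MODELS = {"hf_T5_generate"}

def pick_precision(test: str, model_name: str) -> str:
    """Hypothetical sketch: pick the default precision based on `test`.

    eval  -> bfloat16 (inference default)
    train -> AMP (mixed-precision training default)
    Models in FORCE_AMP_FOR_FP16_BF16_MODELS are forced to AMP.
    """
    if model_name in FORCE_AMP_FOR_FP16_BF16_MODELS:
        return "AMP"
    if test == "eval":
        return "bfloat16"
    if test == "train":
        return "AMP"
    raise ValueError(f"unknown test mode: {test!r}")
```

Keeping the override set separate from the mode logic mirrors how the PyTorch HUD scripts special-case models that misbehave under plain fp16/bf16 casting.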