Fix gpt2 fp16 training when tracing is enabled #20656
sgugger merged 3 commits into huggingface:main from
A little more context on the issue: I previously fixed the tracing issue in #18017, but that fix hurt performance because of host<->device synchronization. #20061 addressed the performance problem, but it caused tracing to fail again. It seems that, in PyTorch, we can't guarantee both tracing correctness and inference performance with the same line of code, which is why in this PR I distinguish two cases to solve it:
Also, @michaelbenayoun, I saw this: #18017 (comment). Will the current modeling code still have an issue with mixed-precision training under torch.fx?
sgugger left a comment:
This is the kind of if/else we try to avoid in the modeling code, as it would become completely unreadable if we added support for every optimization/export like this. Let's forgo the optimized path here and only do what works for ONNX/tracing.
I feel the same; if/else removed!
sgugger left a comment:
Thanks! Let's just wait for @michaelbenayoun and then we can merge!
* ONNX tracing fix
* Remove conditional
What does this PR do?
Since #20061, tracing fails during mixed-precision training because the two value inputs of a `where` node no longer share the same dtype. This is invalid when the exported ONNX model is reused for inference.
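To illustrate the dtype constraint, here is a minimal sketch of a dtype-consistent causal-masking pattern. It is not necessarily the exact code merged in this PR. It assumes the causal mask is a boolean tensor broadcastable to the attention weights; `apply_causal_mask` is a hypothetical helper name. The idea is to create the fill value as a 0-dim tensor with the same dtype and device as the weights, so that under fp16 both value inputs of the traced `where` node are float16.

```python
import torch


def apply_causal_mask(attn_weights: torch.Tensor, causal_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: mask future positions without mixing dtypes in `where`."""
    # Most negative finite value representable in the weights' dtype
    # (e.g. -65504.0 for float16).
    min_value = torch.finfo(attn_weights.dtype).min
    # Build the fill value as a 0-dim tensor with the SAME dtype and device as the
    # weights. A bare `torch.tensor(min_value)` would default to float32, so under
    # fp16 the two value inputs of `where` would disagree in dtype.
    mask_value = torch.full([], min_value, dtype=attn_weights.dtype, device=attn_weights.device)
    return torch.where(causal_mask, attn_weights, mask_value)


# Usage: fp16 attention scores for a sequence of length 4.
seq_len = 4
weights = torch.randn(1, 1, seq_len, seq_len).to(torch.float16)
# Lower-triangular boolean mask: position i may attend to positions j <= i.
mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool)).view(1, 1, seq_len, seq_len)
out = apply_causal_mask(weights, mask)
print(out.dtype)  # torch.float16
```

Eager PyTorch may tolerate mixed dtypes here through type promotion, which is why the problem only surfaces once the graph is exported. The ONNX `Where` operator requires both value inputs to have the same type.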
The node:
transformers/src/transformers/models/gpt2/modeling_gpt2.py, line 201 in 3ac040b
Error message: