FX tracing improvement #14321
Conversation
1f08935 to 474aa54
Hey, thanks for your PR @michaelbenayoun! It seems there are a few failing tests (1096 😄), could you take a look at them?

Currently looking into it!

Fixed!
sgugger left a comment

I'm not too comfortable with some of the changes in the models, especially XLNet; apart from that, the PR looks good.

In the tests, fx_ready_model_classes seems to always be set to all_model_classes, so maybe it's time to use a boolean flag instead of a list of classes, if we always test all classes?
src/transformers/modeling_utils.py (Outdated)

```diff
- seq_ids = torch.arange(seq_length, device=device)
- causal_mask = seq_ids[None, None, :].repeat(batch_size, seq_length, 1) <= seq_ids[None, :, None]
+ causal_mask = torch.tril(torch.ones(batch_size, seq_length, seq_length, dtype=torch.bool, device=device))
```

Unrelated to this PR, but constructing a triangular matrix should be a bit simpler IMO (unless I'm missing something)...
Would be nice if we keep the code as is for now to make sure not to break anything here accidentally. Could you also run T5's and Bart's SLOW tests to be sure nothing is broken with the attention mask?
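As a quick sanity check (a minimal standalone sketch, not part of the PR), the two constructions produce the same boolean causal mask:

```python
import torch

batch_size, seq_length = 2, 4

# Current construction: broadcasted comparison of position ids
seq_ids = torch.arange(seq_length)
mask_repeat = seq_ids[None, None, :].repeat(batch_size, seq_length, 1) <= seq_ids[None, :, None]

# Suggested construction: lower-triangular matrix of ones
mask_tril = torch.tril(torch.ones(batch_size, seq_length, seq_length, dtype=torch.bool))

assert torch.equal(mask_repeat, mask_tril)
```

Both masks allow position i to attend to positions 0..i, which is why the reviewer suggests the shorter form; keeping the original is purely about not risking a behavior change.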
src/transformers/utils/fx.py (Outdated)

This shouldn't be true, no?
src/transformers/utils/fx.py (Outdated)

```diff
- return super().__len__(self)
+ return super().__len__(self.cache)
```

Shouldn't that be something along these lines?
src/transformers/utils/fx.py (Outdated)

I'm not sure why this is here?
src/transformers/utils/fx.py (Outdated)

I'm not sure I understand how that does what it says it does?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Unstale comment
5d694b0 to 83aedfc
I am planning to try another approach to make both the code easier and the tracing process cleaner; this will allow adding other models as well as limiting the number of bugs.
```python
    if "tokenization" not in str(f) and "processor" not in str(f) and "feature_extraction" not in str(f)
]


def disable_fx_test(filename: Path):
```
What do you think of this @sgugger?

The reason I added that is because symbolic_trace checks the model class before trying to trace the model, to make sure it is supported.

Because the tests are copied, if a new model is created from a model supported for symbolic tracing, the test file will contain something like fx_ready = True, which will trigger the torch.fx tests, all of them failing because the model class is not in the list of supported models.

I do not think automatically adding the new model class to the supported models is a good approach, because the model implementation can be changed, so I thought that disabling the test and printing a message was a better option.
```python
with open(filename) as fp:
    content = fp.read()
with open(filename, "w") as fp:
    new_content = re.sub(r"fx_ready\s*=\s*True", "fx_ready = False", content)
```

Nit, this line should go before the second with.
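To illustrate what the substitution does in isolation (a standalone sketch; the test-file contents below are made up, only the regex comes from the snippet above):

```python
import re

# Hypothetical contents of a copied model test file
content = (
    "class BertModelTest(ModelTesterMixin):\n"
    "    fx_ready = True\n"
)

# Flip the flag so the copied torch.fx tests are disabled by default
new_content = re.sub(r"fx_ready\s*=\s*True", "fx_ready = False", content)

assert "fx_ready = False" in new_content
assert "fx_ready = True" not in new_content
```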
LysandreJik left a comment

This looks good to me as long as it's 100% backwards compatible.

Pinging @patrickvonplaten and @patil-suraj for a quick look as it touches a lot of different models.
```diff
  # This is the version of torch required to run torch.fx features and torch.onnx with dictionary inputs.
- TORCH_FX_REQUIRED_VERSION = version.parse("1.9")
+ TORCH_FX_REQUIRED_VERSION = version.parse("1.10")
```
Out of curiosity, is it possible to support many different versions, or are there breaking changes in torch.fx such that we have to support one version at a time?

I can check for torch 1.9; the plan from now on is to support torch 1.10+ as fx became stable starting with this version (still need to validate that with the pytorch team).

And you probably need to change this line from == to >=.
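The difference matters in practice; a small sketch with packaging.version (assuming the check is a plain comparison against TORCH_FX_REQUIRED_VERSION):

```python
from packaging import version

TORCH_FX_REQUIRED_VERSION = version.parse("1.10")

# Hypothetical installed torch version, newer than the minimum
installed = version.parse("1.11.0")

# An equality check wrongly rejects any newer release...
assert not (installed == TORCH_FX_REQUIRED_VERSION)
# ...while ">=" accepts 1.10 and everything after it.
assert installed >= TORCH_FX_REQUIRED_VERSION
# Older versions are still rejected either way.
assert version.parse("1.9") < TORCH_FX_REQUIRED_VERSION
```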
```python
print(
    "The tests for symbolic tracing with torch.fx were disabled, you can add those once symbolic tracing works "
    "for your new model."
)
```

Ideally this would use the logger.

I followed what was done in the script, but can definitely change that to the logger if needed.
```diff
  if self.scale_attn_weights:
-     attn_weights = attn_weights / (float(value.size(-1)) ** 0.5)
+     attn_weights = attn_weights / (value.size(-1) ** 0.5)
```

Is this backwards compatible?

In my opinion, this doesn't cause any problems. When we do tracing, Python values cause several problems. I don't think there is any reason to change this value to a Python value.

This change seems to cause the failure of mixed-precision training of gpt-2 with the ONNX Runtime backend. Link to the reported issue: #11279.
patil-suraj left a comment

Went through all the modeling changes and it looks good to me!
```diff
- pooled_logits = logits[range(batch_size), sequence_lengths]
+ pooled_logits = logits[torch.arange(batch_size), sequence_lengths]
```

Suggested change:

```diff
- pooled_logits = logits[torch.arange(batch_size), sequence_lengths]
+ pooled_logits = logits[torch.arange(batch_size, device=self.device), sequence_lengths]
```

We need to make sure the tensor is on the same device, no?
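What the advanced indexing does (a standalone sketch with made-up shapes): for each example in the batch, it picks the logits at that example's last-token position:

```python
import torch

batch_size, seq_len, num_labels = 3, 5, 2
logits = torch.randn(batch_size, seq_len, num_labels)
sequence_lengths = torch.tensor([4, 1, 3])  # last non-padding position per example

# Pairwise indexing: row i of the result is logits[i, sequence_lengths[i]]
pooled_logits = logits[torch.arange(batch_size), sequence_lengths]

assert pooled_logits.shape == (batch_size, num_labels)
assert torch.equal(pooled_logits[1], logits[1, 1])
```

On GPU, the index tensor created by torch.arange lives on the CPU by default, which is the device concern raised above.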
```diff
- pooled_logits = logits[range(batch_size), sequence_lengths]
+ pooled_logits = logits[torch.arange(batch_size), sequence_lengths]
```
tests/test_modeling_bert.py (Outdated)

```diff
  all_generative_model_classes = (BertLMHeadModel,) if is_torch_available() else ()
- fx_ready_model_classes = all_model_classes
- fx_dynamic_ready_model_classes = all_model_classes
+ fx_ready = True
```

(nit) not a huge fan of the name fx_ready - does that mean fx_compatible?
patrickvonplaten left a comment

Left some comments, but in general this looks good to me as well.
* Change the way tracing happens, enabling dynamic axes out of the box
* Update the tests and modeling xlnet
* Add the non-recording of leaf modules to avoid recording more values for the methods to record than what will be seen at tracing time (which would otherwise desynchronize the recorded values and the values that need to be given to the proxies during tracing, causing errors)
* Comments and making tracing work for gpt-j and xlnet
* Refactor things related to num_choices (and batch_size, sequence_length)
* Update fx to work on PyTorch 1.10
* Postpone autowrap_function feature usage for later
* Add copyrights
* Remove unnecessary file
* Fix issue with add_new_model_like
* Apply suggestions
What does this PR do?

This PR significantly improves the way transformers models are traced by the HFTracer (torch.fx). This has 2 major consequences:

Because of these changes, the symbolic_trace signature becomes simpler:

```python
symbolic_trace(model: PreTrainedModel, input_names: Optional[List[str]] = None) -> GraphModule
```

There is no need to specify the batch size, the sequence length, or the number of choices (for multiple-choice) anymore.

The same thing can be said about the HFTracer, which can be instantiated exactly the same way as the regular torch.fx.Tracer.
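A hypothetical usage sketch of the simplified signature (the model and config values are illustrative; assumes a transformers version containing these changes and a compatible torch):

```python
import torch
from transformers import BertConfig, BertModel
from transformers.utils.fx import symbolic_trace

# A tiny randomly initialized model keeps the sketch fast; any supported
# architecture would do.
config = BertConfig(
    num_hidden_layers=1, hidden_size=32, num_attention_heads=2, intermediate_size=64
)
model = BertModel(config)
model.eval()

# No batch size / sequence length / num_choices arguments anymore:
# only the model and the input names are needed.
traced = symbolic_trace(model, input_names=["input_ids", "attention_mask"])

assert isinstance(traced, torch.fx.GraphModule)
```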