Skip to content

[dynamo] disable dynamo recursively on compiled code if fullgraph=True#172295

Closed
williamwen42 wants to merge 3 commits intogh/williamwen42/379/basefrom
gh/williamwen42/379/head
Closed

[dynamo] disable dynamo recursively on compiled code if fullgraph=True#172295
williamwen42 wants to merge 3 commits intogh/williamwen42/379/basefrom
gh/williamwen42/379/head

Conversation

@williamwen42
Copy link
Member

@williamwen42 williamwen42 commented Jan 13, 2026

Stack from ghstack (oldest at bottom):

I tried a few 1-line changes but none were sufficient to get the behavior we wanted. The issue is that we want to control the recursive eval_frame behavior depending on current top-level compilation call (i.e. we need different recursive_action for each CacheEntry). This was not supported, so we implement per-CacheEntry recursive_action here.

There are some rules we introduced for dealing with ExtraState (code-level) vs. CacheEntry (frame-level) recursive_action:

  1. If the current (NOT recursive) action for the code object (i.e. ExtraState) is SKIP, then use ExtraState's recursive_action. This is because we don't even perform a cache lookup, so there is no CacheEntry recursive_action to use.
  2. Otherwise, we will get a CacheEntry. If it has a recursive_action, use that.
  3. Otherwise, default to the ExtraState's recursive_action.

This is potentially BC-breaking since torch.compile will not be invoked again on code compiled with fullgraph=True, but it is unlikely for anyone to really be depending on this behavior (see the added test, which does some really weird behavior in order to invoke a nested torch.compile when fullgraph=True).

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @chauhang @amjames @Lucaskabela @jataylo @mlazos

@pytorch-bot
Copy link

pytorch-bot bot commented Jan 13, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/172295

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ad2a003 with merge base 8f775e9 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

williamwen42 added a commit that referenced this pull request Jan 13, 2026
@williamwen42 williamwen42 added the topic: not user facing topic category label Jan 13, 2026
@williamwen42 williamwen42 requested review from ezyang and removed request for ezyang January 13, 2026 01:54
@williamwen42 williamwen42 marked this pull request as draft January 13, 2026 21:10
…llgraph=True"

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx kadeng chauhang amjames Lucaskabela jataylo

[ghstack-poisoned]
williamwen42 added a commit that referenced this pull request Jan 16, 2026
@williamwen42 williamwen42 marked this pull request as ready for review January 16, 2026 18:43
…llgraph=True"

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx kadeng chauhang amjames Lucaskabela jataylo

[ghstack-poisoned]
@ezyang
Copy link
Contributor

ezyang commented Jan 20, 2026

I am wondering if we can fix this more simply by simply bypassing the eval frame handler entirely for fullgraph. We need the eval frame handler to conveniently reenter Dynamo when graph break occurs, but this definitionally cannot happen for fullgraph. So an eval frame handler isn't necessary; we can jump straight to the cache lookup code. Perhaps... fullgraph should also imply you never go to eager? (Not sure what you're going to do when you hit the recompile limit; it's not clear what you do in this PR either.)

@williamwen42
Copy link
Member Author

Recompile limit in fullgraph=True would result in an error being raised in convert_frame.py (no frame actions set).

I needed to do this per-frame action thing for the case that we wrap the same code object with both a torch.compile(fullgraph=False) and torch.compile(fullgraph=True) decorator at different points in the same scope. Testing files that wrap over external_utils.wrap_inline are prone to doing this.

I could possibly simplify this handling by introducing a new fullgraph=True eval frame handler that will not set a custom eval frame handler recursively - we could fallback to eager or even error out if we're not expecting to handle additional frames. This simplifies logic since fullgraph setting is determined in eval_frame.py. In the current implementation, we read the fullgraph value in convert_frame.py and we need to "tell" eval_frame_cpp.cpp what the fullgraph setting was.

@ezyang
Copy link
Contributor

ezyang commented Jan 20, 2026

I needed to do this per-frame action thing for the case that we wrap the same code object with both a torch.compile(fullgraph=False) and torch.compile(fullgraph=True) decorator at different points in the same scope.

Blah, this is annoying.

@ezyang ezyang requested a review from anijain2305 January 20, 2026 15:01
@ezyang
Copy link
Contributor

ezyang commented Jan 20, 2026

Don't block on me, I'm going to rely on Animesh's judgment on this

williamwen42 added a commit that referenced this pull request Jan 22, 2026
…llgraph=True using eval frame overrides"


This is attempt #2 at #172295. 

Instead of using frame/CacheEntry recursive actions, we override the recursive eval_frame callback in eval_frame_cpp.cpp if we run custom code. The override is set in eval_frame.py only if fullgraph=True. Right now, the possible overrides are:
- skip tracing frames
- error if tracing is successful. We can't just error on entry since convert_frame might end up skipping the frame, which should be allowed. It might be better to simply patch the entrypoint of tracing e.g. InstructionTranslatorBase, with an error though.
Currently, the default behavior is the latter, since we should loudly fail if torch.compile gets unintentionally re-invoked when fullgraph=True.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx kadeng chauhang amjames Lucaskabela jataylo mlazos

[ghstack-poisoned]
williamwen42 added a commit that referenced this pull request Jan 22, 2026
…llgraph=True using eval frame overrides"


This is attempt #2 at #172295. 

Instead of using frame/CacheEntry recursive actions, we override the recursive eval_frame callback in eval_frame_cpp.cpp if we run custom code. The override is set in eval_frame.py only if fullgraph=True. Right now, the possible overrides are:
- skip tracing frames
- error if tracing is successful. We can't just error on entry since convert_frame might end up skipping the frame, which should be allowed. It might be better to simply patch the entrypoint of tracing e.g. InstructionTranslatorBase, with an error though.
Currently, the default behavior is the latter, since we should loudly fail if torch.compile gets unintentionally re-invoked when fullgraph=True.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx kadeng chauhang amjames Lucaskabela jataylo mlazos

[ghstack-poisoned]
williamwen42 added a commit that referenced this pull request Jan 23, 2026
…llgraph=True using eval frame overrides"


This is attempt #2 at #172295. 

Instead of using frame/CacheEntry recursive actions, we override the recursive eval_frame callback in eval_frame_cpp.cpp if we run custom code. The override is set in eval_frame.py only if fullgraph=True. Right now, the possible overrides are:
- skip tracing frames
- error if tracing is successful. We can't just error on entry since convert_frame might end up skipping the frame, which should be allowed. It might be better to simply patch the entrypoint of tracing e.g. InstructionTranslatorBase, with an error though.
Currently, the default behavior is the latter, since we should loudly fail if torch.compile gets unintentionally re-invoked when fullgraph=True.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx kadeng chauhang amjames Lucaskabela jataylo mlazos

[ghstack-poisoned]
@williamwen42
Copy link
Member Author

We are preferring the 2nd approach

williamwen42 added a commit that referenced this pull request Jan 28, 2026
…piled code if fullgraph=True using eval frame overrides"


This is attempt #2 at #172295. 

Instead of using frame/CacheEntry recursive actions, we override the recursive eval_frame callback in eval_frame_cpp.cpp if we run custom code. The override is set in eval_frame.py only if fullgraph=True. Right now, the possible overrides are:
- skip tracing frames
- error if tracing is successful. We can't just error on entry since convert_frame might end up skipping the frame, which should be allowed. It might be better to simply patch the entrypoint of tracing e.g. InstructionTranslatorBase, with an error though.
Currently, the default behavior is the latter, since we should loudly fail if torch.compile gets unintentionally re-invoked when fullgraph=True.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx kadeng chauhang amjames Lucaskabela jataylo mlazos

[ghstack-poisoned]
williamwen42 added a commit that referenced this pull request Jan 28, 2026
…llgraph=True using eval frame overrides"


This is attempt #2 at #172295. 

Instead of using frame/CacheEntry recursive actions, we override the recursive eval_frame callback in eval_frame_cpp.cpp if we run custom code. The override is set in eval_frame.py only if fullgraph=True. Right now, the possible overrides are:
- skip tracing frames
- error if tracing is successful. We can't just error on entry since convert_frame might end up skipping the frame, which should be allowed. It might be better to simply patch the entrypoint of tracing e.g. InstructionTranslatorBase, with an error though.
Currently, the default behavior is the latter, since we should loudly fail if torch.compile gets unintentionally re-invoked when fullgraph=True.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx kadeng chauhang amjames Lucaskabela jataylo mlazos

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Feb 2, 2026
…e using eval frame overrides (#173080)

This is attempt #2 at #172295.

Instead of using frame/CacheEntry recursive actions, we override the recursive eval_frame callback in eval_frame_cpp.cpp if we run custom code. The override is set in eval_frame.py only if fullgraph=True. Right now, the possible overrides are:
- skip tracing frames
- error if tracing is successful. We can't just error on entry since convert_frame might end up skipping the frame, which should be allowed. It might be better to simply patch the entrypoint of tracing e.g. InstructionTranslatorBase, with an error though.
Currently, the default behavior is the latter, since we should loudly fail if torch.compile gets unintentionally re-invoked when fullgraph=True.

Pull Request resolved: #173080
Approved by: https://github.com/anijain2305
radeksm pushed a commit to radeksm/pytorch that referenced this pull request Feb 20, 2026
…e using eval frame overrides (pytorch#173080)

This is attempt pytorch#2 at pytorch#172295.

Instead of using frame/CacheEntry recursive actions, we override the recursive eval_frame callback in eval_frame_cpp.cpp if we run custom code. The override is set in eval_frame.py only if fullgraph=True. Right now, the possible overrides are:
- skip tracing frames
- error if tracing is successful. We can't just error on entry since convert_frame might end up skipping the frame, which should be allowed. It might be better to simply patch the entrypoint of tracing e.g. InstructionTranslatorBase, with an error though.
Currently, the default behavior is the latter, since we should loudly fail if torch.compile gets unintentionally re-invoked when fullgraph=True.

Pull Request resolved: pytorch#173080
Approved by: https://github.com/anijain2305
@github-actions github-actions bot deleted the gh/williamwen42/379/head branch February 28, 2026 02:20
libohao1201 pushed a commit to libohao1201/pytorch that referenced this pull request Mar 2, 2026
…e using eval frame overrides (pytorch#173080)

This is attempt pytorch#2 at pytorch#172295.

Instead of using frame/CacheEntry recursive actions, we override the recursive eval_frame callback in eval_frame_cpp.cpp if we run custom code. The override is set in eval_frame.py only if fullgraph=True. Right now, the possible overrides are:
- skip tracing frames
- error if tracing is successful. We can't just error on entry since convert_frame might end up skipping the frame, which should be allowed. It might be better to simply patch the entrypoint of tracing e.g. InstructionTranslatorBase, with an error though.
Currently, the default behavior is the latter, since we should loudly fail if torch.compile gets unintentionally re-invoked when fullgraph=True.

Pull Request resolved: pytorch#173080
Approved by: https://github.com/anijain2305
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants