Use codegen for the boxed interpreters #164573

ezyang wants to merge 4 commits into gh/ezyang/3170/base from
Conversation
🔗 Helpful Links: See artifacts and rendered test results at hud.pytorch.org/pr/164573. Note: links to docs will display an error until the docs builds have been completed. ❗ There is 1 currently active SEV; if your PR is affected, please view it. ⏳ No failures, 1 pending as of commit 3c7374a with merge base 801e282. This comment was automatically generated by Dr. CI and updates every 15 minutes.
albanD left a comment:
Small nit: I think the input unpacking will be terrible one way or the other haha. Sounds good if perf is actually where you need it to be!
torch/fx/graph.py (outdated)

```diff
-__all__ = ["PythonCode", "CodeGen", "Graph"]
+__all__ = ["PythonCode", "CodeGen", "Graph", "_BoxedCodeGen"]
```
You can't put private API in there.
You can just remove this as it doesn't do anything.
I BLAME CLAUDE sorry for not reviewing carefully enough lol
torch/fx/graph.py (outdated)

```python
def process_inputs(self, args_list: Any) -> Any:
    """
    Process boxed inputs. Expects a list of arguments.
    """
    return args_list

def process_outputs(self, out: Any) -> Any:
    """
    Process outputs - just return as-is for boxed calling convention.
    """
    return out
```
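For readers unfamiliar with the terminology, here is a minimal sketch of the boxed versus unboxed calling conventions. The names are hypothetical and this is not the actual FX codegen output; it only illustrates why the codegen'd function takes a single `args_list` parameter:

```python
# Unboxed: arguments are passed positionally, so any wrapper must know
# the function's arity up front.
def forward_unboxed(x, y):
    return x + y

# Boxed: the caller passes one list; the callee unpacks it (and may clear
# it, so inputs can be freed as soon as they are consumed).
def forward_boxed(args_list):
    x, y = args_list
    args_list.clear()  # drop references to the inputs early
    return x + y

args = [1, 2]
result = forward_boxed(args)  # result is 3; args is now []
```

The boxed form is what interpreter-style wrappers want: they can forward an arbitrary argument list without re-packing it per call.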
torch/fx/graph.py (outdated)

```python
# Generate the function signature with args_list parameter
fn_def = f"def {self._func_name}(self, args_list){maybe_return_annotation}:"

if len(free_vars) > 0:
```
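As a quick sketch of what that f-string produces, assuming (hypothetically) a function name of `"forward"` and no return annotation:

```python
# Hypothetical stand-ins for the codegen state:
_func_name = "forward"
maybe_return_annotation = ""

fn_def = f"def {_func_name}(self, args_list){maybe_return_annotation}:"
# fn_def == "def forward(self, args_list):"
```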
empty containers are falsey, non-empty containers are truthy
(why don't people ever rely on that anymore?)
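The reviewer's point, illustrated with generic Python (not the PR's code): an explicit length check on a container is always equivalent to testing the container itself.

```python
# Containers define truthiness by emptiness, so the length check is redundant.
for free_vars in ([], ["x"], ["x", "y"]):
    assert bool(free_vars) == (len(free_vars) > 0)

# Idiomatic form:
free_vars = ["x"]
if free_vars:  # preferred over: if len(free_vars) > 0:
    has_free_vars = True
```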
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 job has failed; the first few are: trunk / linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 5, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu). Details for Dev Infra team: raised by workflow job.

test fail is real
@pytorchbot merge

Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

@pytorchbot merge

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.

Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 job has failed; the first few are: trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable). Details for Dev Infra team: raised by workflow job.

@pytorchbot merge

Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…'t have decomposed (#164939)

This fixes AOTAutograd rms_norm not being bitwise equivalent to eager, because it avoids a decomposition. You can force the decomposition by having the decomposition in the dispatch table, but if eager mode wouldn't have decomposed (because it went to the fused one), we now default to preserving the fused call.

This largely reverts #103275 for view ops. This means that in inference mode we could hit the wrong C++ kernel; if this occurs we should just SymInt'ify the C++ kernel.

Another neat side effect of this change is that Inductor's generated kernels for rms_norm now have rms_norm in their name.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: #164939
Approved by: https://github.com/bdhirsh
ghstack dependencies: #164573
Authored with claude code. The arg parsing is kind of horrible, open to more suggestions.

Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: pytorch#164573
Approved by: https://github.com/albanD, https://github.com/jansel
Stack from ghstack (oldest at bottom):
Authored with claude code. The arg parsing is kind of horrible, open
to more suggestions.
Signed-off-by: Edward Yang ezyang@meta.com
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @Lucaskabela