Guard torch.compile on ROCm when triton_key is missing#3923
Conversation
Summary of ChangesHello @hnxnq7, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request implements a targeted solution to prevent crashes in ROCm environments when using Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a safeguard for ROCm environments where an incomplete Triton installation (missing triton_key) can cause torch.compile to crash. The fix correctly identifies this scenario and disables torch.compile and Inductor by setting environment variables, which is a clean and effective approach. The changes are well-isolated. My review includes a suggestion to improve the robustness of the import checks in the new fix_rocm_triton_key_error function by using more specific exception handling.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
The function (introduced in unslothai#3923) assumed that the absence of `triton.runtime.triton_key` on ROCm means torch.compile will crash. Investigation shows this is incorrect: 1. `triton.runtime.triton_key` was renamed/removed in the ROCm Triton fork — it does not exist at that path. However, `triton.compiler.compiler.triton_key` (the path torch._inductor actually imports) EXISTS and works correctly on ROCm. 2. Both call-sites in torch._inductor (codecache.py and async_compile.py) already wrap the import in try/except, so even a genuinely missing triton_key would be handled gracefully. 3. Comprehensive testing on ROCm 7.1 + Triton 3.4.0 + gfx1100 confirms torch.compile works correctly for matmul, cross-entropy, RMSNorm, multi-layer transformer forward+backward, and LoRA — all without triton.runtime.triton_key. The original code was also ineffective (environment variables set after torch import have no effect on torch._dynamo config), so removing it has zero behavioral change on existing installations. Supersedes the compile-disable portion of unslothai#3923.
unslothai#4125) The function (introduced in unslothai#3923) assumed that the absence of `triton.runtime.triton_key` on ROCm means torch.compile will crash. Investigation shows this is incorrect: 1. `triton.runtime.triton_key` was renamed/removed in the ROCm Triton fork — it does not exist at that path. However, `triton.compiler.compiler.triton_key` (the path torch._inductor actually imports) EXISTS and works correctly on ROCm. 2. Both call-sites in torch._inductor (codecache.py and async_compile.py) already wrap the import in try/except, so even a genuinely missing triton_key would be handled gracefully. 3. Comprehensive testing on ROCm 7.1 + Triton 3.4.0 + gfx1100 confirms torch.compile works correctly for matmul, cross-entropy, RMSNorm, multi-layer transformer forward+backward, and LoRA — all without triton.runtime.triton_key. The original code was also ineffective (environment variables set after torch import have no effect on torch._dynamo config), so removing it has zero behavioral change on existing installations. Supersedes the compile-disable portion of unslothai#3923.
* Guard torch.compile on ROCm when triton_key missing * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update unsloth/import_fixes.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Tighten ROCm Triton import handling * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Rachel Li <rachelliqx07@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
unslothai#4125) The function (introduced in unslothai#3923) assumed that the absence of `triton.runtime.triton_key` on ROCm means torch.compile will crash. Investigation shows this is incorrect: 1. `triton.runtime.triton_key` was renamed/removed in the ROCm Triton fork — it does not exist at that path. However, `triton.compiler.compiler.triton_key` (the path torch._inductor actually imports) EXISTS and works correctly on ROCm. 2. Both call-sites in torch._inductor (codecache.py and async_compile.py) already wrap the import in try/except, so even a genuinely missing triton_key would be handled gracefully. 3. Comprehensive testing on ROCm 7.1 + Triton 3.4.0 + gfx1100 confirms torch.compile works correctly for matmul, cross-entropy, RMSNorm, multi-layer transformer forward+backward, and LoRA — all without triton.runtime.triton_key. The original code was also ineffective (environment variables set after torch import have no effect on torch._dynamo config), so removing it has zero behavioral change on existing installations. Supersedes the compile-disable portion of unslothai#3923.
Summary
Changes
Behavior