Fix gpt temporary patch for grpo to happen after compile by mmathew23 · Pull Request #536 · unslothai/unsloth-zoo

mmathew23 · 2026-03-08T04:02:01Z

The gpt-oss GRPO patch in temporary_patches runs three times. Because it runs before the compiler, the compiler doesn't match the CausalLM pattern, so we lose the connection to the Unsloth fused loss. This PR proposes phases for temporary patches so that each patch can determine when it runs.

As currently set up, this defaults to the existing behavior unless the corresponding unsloth PR is merged as well.
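
To illustrate the phase idea described above, here is a minimal sketch of how a caller might drive phased patches while staying backward compatible. Everything in it is an assumption made for illustration: the `TEMPORARY_PATCHES` list, the `run_patches` helper, the `"pre_compile"` phase name, and the two example patches are hypothetical and are not taken from the unsloth or unsloth-zoo code.

```python
import inspect

# Hypothetical registry of patch functions (assumption: the real project's
# registry name and layout are not shown in this PR).
TEMPORARY_PATCHES = []


def run_patches(phase=None):
    """Call every registered patch and return the names of those that ran.

    `phase` is forwarded only to patches whose signature accepts it, so
    older patches without a `phase` parameter keep working unchanged.
    """
    ran = []
    for patch in TEMPORARY_PATCHES:
        accepts_phase = "phase" in inspect.signature(patch).parameters
        if phase is not None and accepts_phase:
            applied = patch(phase=phase)
        else:
            # No phase supplied (old caller) or patch is not phase-aware.
            applied = patch()
        if applied:
            ran.append(patch.__name__)
    return ran


def legacy_patch():
    # A patch that knows nothing about phases; it always applies.
    return True


def grpo_like_patch(phase="post_compile"):
    # A phase-aware patch: it only applies in the post-compile phase.
    if phase != "post_compile":
        return False
    return True


TEMPORARY_PATCHES.extend([legacy_patch, grpo_like_patch])
```

In this sketch, a caller that never passes a phase gets the old behavior because the phase-aware patch falls back to its default, while a phase-aware caller can skip the patch before compilation and apply it afterwards.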

gemini-code-assist · 2026-03-08T04:02:13Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where a GPT OSS GRPO patch was executing prematurely, hindering the compiler's ability to apply crucial fused loss optimizations. By implementing a new phasing mechanism, the patch can now be precisely timed to run after compilation, ensuring compatibility with compiler optimizations and maintaining efficient memory usage during training.

Highlights

- Phased Patching: Introduced a phase argument to the patch_gpt_oss_for_grpo function, allowing the patch to be conditionally applied based on the specified execution phase.
- Post-Compile Execution: Ensured the GPT OSS GRPO patch only runs post_compile to allow the compiler to correctly pattern-match cross-entropy and apply fused loss, preventing potential out-of-memory issues.
- Prevent Duplicate Patching: Added a mechanism to check if GptOssForCausalLM has already been patched for GRPO, preventing the patch from being applied multiple times.

Changelog

unsloth_zoo/temporary_patches/gpt_oss.py
- Modified the patch_gpt_oss_for_grpo function signature to accept a phase argument, defaulting to "post_compile".
- Implemented a conditional return at the beginning of patch_gpt_oss_for_grpo to ensure the patch only proceeds if the current phase matches "post_compile".
- Added a check using hasattr to prevent GptOssForCausalLM from being patched for GRPO more than once.
- Set a _unsloth_grpo_patched attribute on GptOssForCausalLM to True after the GRPO patch is successfully applied.
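
The changelog items above can be sketched roughly as follows. This is not the code from the PR: `GptOssForCausalLM` here is a stand-in class so the example runs on its own, the wrapped `forward` is a placeholder for whatever the real GRPO patch installs, and the boolean return values are an assumption added for illustration. Only the function name, the `phase` default, the `hasattr` check, and the `_unsloth_grpo_patched` attribute come from the summary.

```python
class GptOssForCausalLM:
    """Stand-in for the real model class (assumption, for illustration only)."""

    def forward(self, x):
        return x


def patch_gpt_oss_for_grpo(phase="post_compile"):
    # Phase guard: do nothing unless we are in the post-compile phase.
    if phase != "post_compile":
        return False

    # Idempotency guard: skip if the class was already patched.
    if hasattr(GptOssForCausalLM, "_unsloth_grpo_patched"):
        return False

    original_forward = GptOssForCausalLM.forward

    def forward(self, x):
        # Placeholder body; the real GRPO-specific forward is not shown
        # in this PR summary.
        return original_forward(self, x)

    GptOssForCausalLM.forward = forward
    # Mark the class so later calls become no-ops.
    GptOssForCausalLM._unsloth_grpo_patched = True
    return True
```

Calling the function with a different phase leaves the class untouched, and calling it repeatedly in the post-compile phase replaces `forward` only once.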

Activity

No specific activity (comments, reviews, progress) has been recorded for this pull request yet.


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a mechanism to control the timing of the gpt_oss GRPO patch application by introducing a phase parameter. The patch is now set to run only during the post_compile phase, which allows the compiler to perform optimizations like fused loss before the model's forward method is altered. Additionally, an idempotency check has been added to prevent the patch from being applied multiple times, addressing an issue where it was reportedly running three times. The changes are clear, well-implemented, and align with the goal of improving compiler optimization compatibility.

Fix gpt temporary patch for grpo to happen after compile

01ce386

gemini-code-assist Bot reviewed Mar 8, 2026


mmathew23 mentioned this pull request Mar 8, 2026

Fix gpt temporary patch for grpo to happen after compile unslothai/unsloth#4180

Merged

danielhanchen merged commit 834c035 into unslothai:main Mar 8, 2026

danielhanchen mentioned this pull request Mar 8, 2026

[Bug] Official 500K gpt-oss-20b notebook now fails with CUDA OOM unslothai/unsloth#4175

Closed

danielhanchen merged 1 commit into unslothai:main from mmathew23:fix/gptpatch
