Fix gpt temporary patch for grpo to happen after compile #536

Merged
danielhanchen merged 1 commit into unslothai:main from mmathew23:fix/gptpatch on Mar 8, 2026

Conversation

@mmathew23

Collaborator

The gpt oss grpo patch in temporary_patches runs 3 times. Since it runs before the compiler, the compiler doesn't match the CausalLM pattern, so we lose the connection to the unsloth fused loss. This PR proposes phases for temporary patches so that each patch can determine when it wants to run.

As currently set up, it defaults to the existing behavior unless the corresponding unsloth PR is merged as well.
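A minimal sketch of how phased patches could stay backward compatible, assuming a runner that passes a phase only to patches that accept one. The names `run_patches`, `legacy_patch`, `grpo_patch`, and the `"pre_compile"` phase label are hypothetical and do not come from this PR; only the `phase` argument and its `"post_compile"` default are taken from the description.

```python
import inspect


def legacy_patch():
    # Hypothetical patch written before phases existed: takes no arguments.
    return "legacy"


def grpo_patch(phase="post_compile"):
    # Phase-aware patch: does nothing unless asked to run after compilation.
    # The default keeps the old behavior for callers that pass no phase.
    if phase != "post_compile":
        return None
    return "grpo"


def run_patches(patches, phase=None):
    """Run each patch, passing `phase` only to patches that accept it."""
    applied = []
    for patch in patches:
        accepts_phase = "phase" in inspect.signature(patch).parameters
        if phase is not None and accepts_phase:
            result = patch(phase=phase)
        else:
            result = patch()
        if result is not None:
            applied.append(result)
    return applied


PATCHES = [legacy_patch, grpo_patch]
print(run_patches(PATCHES))                  # caller that knows nothing about phases
print(run_patches(PATCHES, "pre_compile"))   # phase-aware caller, before compile
print(run_patches(PATCHES, "post_compile"))  # phase-aware caller, after compile
```

In this sketch a caller that never passes a phase still triggers the GRPO patch, which mirrors the "defaults to the existing behavior" note above; only a phase-aware caller can defer it.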

@gemini-code-assist

Contributor

Summary of Changes

This pull request resolves an issue where a GPT OSS GRPO patch was executing prematurely, hindering the compiler's ability to apply crucial fused loss optimizations. By implementing a new phasing mechanism, the patch can now be precisely timed to run after compilation, ensuring compatibility with compiler optimizations and maintaining efficient memory usage during training.

Highlights

  • Phased Patching: Introduced a phase argument to the patch_gpt_oss_for_grpo function, allowing the patch to be conditionally applied based on the specified execution phase.
  • Post-Compile Execution: Ensured the GPT OSS GRPO patch runs only in the post_compile phase, so the compiler can pattern-match the cross-entropy computation and apply the fused loss, preventing potential out-of-memory issues.
  • Prevent Duplicate Patching: Added a mechanism to check if GptOssForCausalLM has already been patched for GRPO, preventing the patch from being applied multiple times.

Changelog

  • unsloth_zoo/temporary_patches/gpt_oss.py
    • Modified the patch_gpt_oss_for_grpo function signature to accept a phase argument, defaulting to "post_compile".
    • Implemented a conditional return at the beginning of patch_gpt_oss_for_grpo to ensure the patch only proceeds if the current phase matches "post_compile".
    • Added a check using hasattr to prevent GptOssForCausalLM from being patched for GRPO more than once.
    • Set a _unsloth_grpo_patched attribute on GptOssForCausalLM to True after the GRPO patch is successfully applied.
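The changelog entries above can be sketched as follows. This is an illustration under stated assumptions, not the code merged in the PR: `GptOssForCausalLM` here is a stand-in class rather than the real model class, the replacement `forward` is a placeholder, and the boolean return value is added only to make the behavior observable.

```python
class GptOssForCausalLM:
    # Stand-in for the real model class; only `forward` matters here.
    def forward(self, x):
        return "original"


def patch_gpt_oss_for_grpo(phase="post_compile"):
    """Apply the GRPO patch once, and only in the post_compile phase.

    Returns True if the patch was applied by this call (illustrative only).
    """
    # Phase guard: skip when called in any other phase.
    if phase != "post_compile":
        return False

    # Idempotency guard: skip if the class was already patched.
    if hasattr(GptOssForCausalLM, "_unsloth_grpo_patched"):
        return False

    def grpo_forward(self, x):
        # Placeholder for the real GRPO-specific forward.
        return "grpo"

    GptOssForCausalLM.forward = grpo_forward
    GptOssForCausalLM._unsloth_grpo_patched = True
    return True


print(patch_gpt_oss_for_grpo("pre_compile"))  # skipped by the phase guard
print(patch_gpt_oss_for_grpo())               # applied
print(patch_gpt_oss_for_grpo())               # skipped by the hasattr guard
print(GptOssForCausalLM().forward(None))
```

Storing the marker on the class rather than in a module-level flag means the check travels with the object being patched, so repeated calls from different entry points still see it.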

Activity

  • No specific activity (comments, reviews, progress) has been recorded for this pull request yet.

@gemini-code-assist (bot) left a comment

Contributor

Code Review

This pull request introduces a mechanism to control the timing of the gpt_oss GRPO patch application by introducing a phase parameter. The patch is now set to run only during the post_compile phase, which allows the compiler to perform optimizations like fused loss before the model's forward method is altered. Additionally, an idempotency check has been added to prevent the patch from being applied multiple times, addressing an issue where it was reportedly running three times. The changes are clear, well-implemented, and align with the goal of improving compiler optimization compatibility.
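To make the ordering issue concrete, here is a toy model of it. How the actual compiler matches the CausalLM pattern is not shown in this thread, so the match is reduced to an identity check on `forward`; `ToyCausalLM`, `toy_compile`, `toy_grpo_patch`, and `run` are all hypothetical names used only for illustration.

```python
class ToyCausalLM:
    def forward(self, x):
        return "logits"


# Remember the unpatched forward so the toy "compiler" can recognize it.
ORIGINAL_FORWARD = ToyCausalLM.forward


def toy_compile(cls):
    """Toy compile step: the fused-loss rewrite applies only if forward
    still looks like the original (assumption: identity check)."""
    cls.uses_fused_loss = cls.forward is ORIGINAL_FORWARD


def toy_grpo_patch(cls):
    """Toy patch step: replaces forward, as the GRPO patch is said to do."""
    def grpo_forward(self, x):
        return "grpo"
    cls.forward = grpo_forward


def run(order):
    """Reset the class, run the steps in the given order, report the match."""
    ToyCausalLM.forward = ORIGINAL_FORWARD
    ToyCausalLM.uses_fused_loss = False
    steps = {"patch": toy_grpo_patch, "compile": toy_compile}
    for name in order:
        steps[name](ToyCausalLM)
    return ToyCausalLM.uses_fused_loss


print(run(["patch", "compile"]))  # patch first: the toy compiler no longer matches
print(run(["compile", "patch"]))  # compile first: the match happens before the patch
```

In the toy version, patching first hides the original `forward` from the compile step, while compiling first lets the match happen before `forward` is replaced.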
